Hi Zhitao,

Since this is something that could potentially be handled by the executor and/or framework, I was wondering if you could speak to the advantages of making this a TaskInfo primitive vs having the executor (or even the framework) handle it.

-Renan

On Fri, Mar 23, 2018 at 9:19 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Thanks James. I'll update the JIRA with our names and start with some
> prototype.
>
> On Thu, Mar 22, 2018 at 9:07 PM, James Peach <jpe...@apache.org> wrote:
>
>> > On Mar 22, 2018, at 10:06 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>> >
>> > In our environment, we run a lot of batch jobs, some of which have
>> > tight timeline. If any tasks in the job runs longer than x hours, it does
>> > not make sense to run it anymore.
>> >
>> > For instance, a team would submit a job which builds a weekly index and
>> > repeats every Monday. If the job does not finish before next Monday for
>> > whatever reason, there is no point to keep any task running.
>> >
>> > We believe that implementing deadline tracking distributed across our
>> > cluster makes more sense as it makes the system more scalable and also
>> > makes our centralized state machine simpler.
>> >
>> > One idea I have right now is to add an optional TimeInfo deadline to
>> > TaskInfo field, and all default executors in Mesos can simply terminate the
>> > task and send a proper StatusUpdate.
>> >
>> > I summarized above idea in MESOS-8725.
>> >
>> > Please let me know what you think. Thanks!
>>
>> This sounds both useful and simple to implement. I’m happy to shepherd if
>> you’d like
>>
>> J
>
> --
> Cheers,
>
> Zhitao Li