Thanks, James. I'll update the JIRA with our names and start on a prototype.
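
To make the idea concrete, here is a rough sketch of the executor-side check, written in Python only for illustration. Nothing here is an actual Mesos API: `Task`, `deadline_ns`, `run_with_deadline` and `send_status_update` are hypothetical stand-ins. The deadline is assumed to be nanoseconds since the epoch (mirroring the `nanoseconds` field of TimeInfo), and reporting TASK_FAILED on a missed deadline is just a placeholder choice, not a decided design.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for TaskInfo; not the real Mesos protobuf.
    task_id: str
    command: List[str]
    # Proposed optional deadline, assumed to be nanoseconds since the epoch.
    deadline_ns: Optional[int] = None


def run_with_deadline(
    task: Task,
    send_status_update: Callable[[str, str, str], None],
    poll_interval: float = 0.05,
) -> str:
    """Run the task; kill it and report if the deadline passes first."""
    proc = subprocess.Popen(task.command)
    while True:
        # Task exited on its own: report based on the exit code.
        if proc.poll() is not None:
            state = "TASK_FINISHED" if proc.returncode == 0 else "TASK_FAILED"
            send_status_update(task.task_id, state, "exited")
            return state
        # Deadline reached while the task is still running: terminate it.
        if task.deadline_ns is not None and time.time_ns() >= task.deadline_ns:
            proc.kill()
            proc.wait()
            # Assumption: which state and reason to report is still open.
            send_status_update(task.task_id, "TASK_FAILED", "deadline exceeded")
            return "TASK_FAILED"
        time.sleep(poll_interval)
```

A task without a deadline behaves exactly as it does today, which is why the field would be optional. The check runs locally in each executor, so the scheduler does not need to track timers for every task.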

On Thu, Mar 22, 2018 at 9:07 PM, James Peach <jpe...@apache.org> wrote:

> > On Mar 22, 2018, at 10:06 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> >
> > In our environment, we run a lot of batch jobs, some of which have tight
> > timelines. If any task in the job runs longer than x hours, it does not
> > make sense to run it anymore.
> >
> > For instance, a team would submit a job which builds a weekly index and
> > repeats every Monday. If the job does not finish before next Monday for
> > whatever reason, there is no point in keeping any task running.
> >
> > We believe that implementing deadline tracking distributed across our
> > cluster makes more sense as it makes the system more scalable and also
> > makes our centralized state machine simpler.
> >
> > One idea I have right now is to add an optional TimeInfo deadline field
> > to TaskInfo, and all default executors in Mesos can simply terminate the
> > task and send a proper StatusUpdate.
> >
> > I summarized the above idea in MESOS-8725.
> >
> > Please let me know what you think. Thanks!
>
> This sounds both useful and simple to implement. I’m happy to shepherd if
> you’d like
>
> J

--
Cheers,
Zhitao Li