> On Mar 22, 2018, at 10:06 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>
> In our environment, we run a lot of batch jobs, some of which have tight
> timelines. If any task in the job runs longer than x hours, it does not make
> sense to run it anymore.
>
> For instance, a team would submit a job which builds a weekly index and
> repeats every Monday. If the job does not finish before next Monday for
> whatever reason, there is no point in keeping any task running.
>
> We believe that implementing deadline tracking distributed across our cluster
> makes more sense, as it makes the system more scalable and also makes our
> centralized state machine simpler.
>
> One idea I have right now is to add an optional TimeInfo deadline field to
> TaskInfo, and all default executors in Mesos can simply terminate the
> task and send a proper StatusUpdate.
>
> I summarized the above idea in MESOS-8725.
>
> Please let me know what you think. Thanks!

This sounds both useful and simple to implement. I’m happy to shepherd if you’d like :)
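
For illustration only, here is a rough sketch of the executor-side behavior the proposal describes: launch the task, and if it is still running once its deadline passes, terminate it and report a terminal status. This is not Mesos code. The function name `run_with_deadline`, the `deadline_secs` parameter (standing in for the proposed deadline field), and the "deadline exceeded" message are all assumptions made for the example; only the `TASK_FINISHED` / `TASK_FAILED` state names are borrowed from Mesos.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline_secs):
    """Run argv as a child process and enforce a wall-clock deadline.

    Hypothetical helper: returns a (state, message) tuple that stands in
    for the status update an executor would send.
    """
    proc = subprocess.Popen(argv)
    try:
        # Block until the task exits or the deadline elapses.
        returncode = proc.wait(timeout=deadline_secs)
    except subprocess.TimeoutExpired:
        # Deadline passed: terminate the task and reap the child.
        proc.kill()
        proc.wait()
        return ("TASK_FAILED", "deadline exceeded")
    if returncode == 0:
        return ("TASK_FINISHED", "")
    return ("TASK_FAILED", "exit code %d" % returncode)


if __name__ == "__main__":
    # A task that would sleep for 30 seconds, given a 0.2 second deadline.
    print(run_with_deadline(
        [sys.executable, "-c", "import time; time.sleep(30)"], 0.2))
```

Because the check lives next to the task itself, the scheduler does not need to track a timer per task, which is the scalability argument made in the proposal.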