Re: Support deadline for tasks

2018-03-26 Thread David Morrison
Hi, Benjamin,

Usually, for us, a task running longer than a certain period of time means
that something has gone wrong, and we should just abort and try again.

David (also at Yelp)

On Fri, Mar 23, 2018 at 7:14 PM, Benjamin Mahler wrote:

> Ah, I was more curious about why they need to be killed after a timeout.
> E.g. After a particular deadline the work is useless (in Zhitao's case).
>
> On Fri, Mar 23, 2018 at 6:22 PM Sagar Sadashiv Patwardhan 
> wrote:
>
>> Hi Benjamin,
>> We have a few tasks that should be killed after
>> some timeout. We currently have some logic in our scheduler to kill these
>> tasks. Would be nice to delegate this to the executor.
>>
>> - Sagar
>>
>> On Fri, Mar 23, 2018 at 3:29 PM, Benjamin Mahler 
>> wrote:
>>
>> > Sagar, could you share your use case? Or is it exactly the same as
>> > Zhitao's?
>> >
>> > On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan <
>> > sag...@yelp.com>
>> > wrote:
>> >
>> > > +1
>> > >
>> > > This will be useful for us(Yelp) as well.
>> > >
>> > > On Fri, Mar 23, 2018 at 1:31 PM, Benjamin Mahler 
>> > > wrote:
>> > >
>> > > > Also, it's advantageous for mesos to be aware of a hard deadline
>> when
>> > it
>> > > > comes to resource allocation. We know that some resources will free
>> up
>> > > and
>> > > > can make better decisions when it comes to pre-emption, for example.
>> > > > Currently, mesos doesn't know if a task will run forever or will
>> run to
>> > > > completion.
>> > > >
>> > > > On Fri, Mar 23, 2018 at 10:07 AM, James Peach 
>> > wrote:
>> > > >
>> > > > >
>> > > > >
>> > > > > > On Mar 23, 2018, at 9:57 AM, Renan DelValle <
>> > > renanidelva...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > Hi Zhitao,
>> > > > > >
>> > > > > > Since this is something that could potentially be handled by the
>> > > > > executor and/or framework, I was wondering if you could speak to
>> the
>> > > > > advantages of making this a TaskInfo primitive vs having the
>> executor
>> > > (or
>> > > > > even the framework) handle it.
>> > > > >
>> > > > > There's some discussion around this on
>> > > > > https://issues.apache.org/jira/browse/MESOS-8725.
>> > > > >
>> > > > > My take is that delegating too much to the scheduler makes
>> schedulers
>> > > > > harder to write and exacerbates the complexity of the system. If 4
>> > > > > different schedulers implement this feature, operators are likely
>> to
>> > > need
>> > > > > to understand 4 different ways of doing the same thing, which
>> would
>> > be
>> > > > > unfortunate.
>> > > > >
>> > > > > J
>> > > >
>> > >
>> >
>>
>


Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Ah, I was more curious about why they need to be killed after a timeout.
E.g. After a particular deadline the work is useless (in Zhitao's case).

On Fri, Mar 23, 2018 at 6:22 PM Sagar Sadashiv Patwardhan wrote:

> Hi Benjamin,
> We have a few tasks that should be killed after
> some timeout. We currently have some logic in our scheduler to kill these
> tasks. Would be nice to delegate this to the executor.
>
> - Sagar
>
> On Fri, Mar 23, 2018 at 3:29 PM, Benjamin Mahler 
> wrote:
>
> > Sagar, could you share your use case? Or is it exactly the same as
> > Zhitao's?
> >
> > On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan <
> > sag...@yelp.com>
> > wrote:
> >
> > > +1
> > >
> > > This will be useful for us(Yelp) as well.
> > >
> > > On Fri, Mar 23, 2018 at 1:31 PM, Benjamin Mahler 
> > > wrote:
> > >
> > > > Also, it's advantageous for mesos to be aware of a hard deadline when
> > it
> > > > comes to resource allocation. We know that some resources will free
> up
> > > and
> > > > can make better decisions when it comes to pre-emption, for example.
> > > > Currently, mesos doesn't know if a task will run forever or will run
> to
> > > > completion.
> > > >
> > > > On Fri, Mar 23, 2018 at 10:07 AM, James Peach 
> > wrote:
> > > >
> > > > >
> > > > >
> > > > > > On Mar 23, 2018, at 9:57 AM, Renan DelValle <
> > > renanidelva...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi Zhitao,
> > > > > >
> > > > > > Since this is something that could potentially be handled by the
> > > > > executor and/or framework, I was wondering if you could speak to
> the
> > > > > advantages of making this a TaskInfo primitive vs having the
> executor
> > > (or
> > > > > even the framework) handle it.
> > > > >
> > > > > > There's some discussion around this on
> > > > > > https://issues.apache.org/jira/browse/MESOS-8725.
> > > > >
> > > > > My take is that delegating too much to the scheduler makes
> schedulers
> > > > > harder to write and exacerbates the complexity of the system. If 4
> > > > > different schedulers implement this feature, operators are likely
> to
> > > need
> > > > > to understand 4 different ways of doing the same thing, which would
> > be
> > > > > unfortunate.
> > > > >
> > > > > J
> > > >
> > >
> >
>


Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Sagar, could you share your use case? Or is it exactly the same as Zhitao's?

On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan wrote:

> +1
>
> This will be useful for us(Yelp) as well.
>
> On Fri, Mar 23, 2018 at 1:31 PM, Benjamin Mahler 
> wrote:
>
> > Also, it's advantageous for mesos to be aware of a hard deadline when it
> > comes to resource allocation. We know that some resources will free up
> and
> > can make better decisions when it comes to pre-emption, for example.
> > Currently, mesos doesn't know if a task will run forever or will run to
> > completion.
> >
> > On Fri, Mar 23, 2018 at 10:07 AM, James Peach  wrote:
> >
> > >
> > >
> > > > On Mar 23, 2018, at 9:57 AM, Renan DelValle <
> renanidelva...@gmail.com>
> > > wrote:
> > > >
> > > > Hi Zhitao,
> > > >
> > > > Since this is something that could potentially be handled by the
> > > executor and/or framework, I was wondering if you could speak to the
> > > advantages of making this a TaskInfo primitive vs having the executor
> (or
> > > even the framework) handle it.
> > >
> > > There's some discussion around this on
> > > https://issues.apache.org/jira/browse/MESOS-8725.
> > >
> > > My take is that delegating too much to the scheduler makes schedulers
> > > harder to write and exacerbates the complexity of the system. If 4
> > > different schedulers implement this feature, operators are likely to
> need
> > > to understand 4 different ways of doing the same thing, which would be
> > > unfortunate.
> > >
> > > J
> >
>


Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Also, it's advantageous for Mesos to be aware of a hard deadline when it
comes to resource allocation. With a deadline, we know that some resources
will free up and can make better decisions about preemption, for example.
Currently, Mesos doesn't know whether a task will run forever or will run to
completion.

On Fri, Mar 23, 2018 at 10:07 AM, James Peach wrote:

>
>
> > On Mar 23, 2018, at 9:57 AM, Renan DelValle 
> wrote:
> >
> > Hi Zhitao,
> >
> > Since this is something that could potentially be handled by the
> executor and/or framework, I was wondering if you could speak to the
> advantages of making this a TaskInfo primitive vs having the executor (or
> even the framework) handle it.
>
> There's some discussion around this on
> https://issues.apache.org/jira/browse/MESOS-8725.
>
> My take is that delegating too much to the scheduler makes schedulers
> harder to write and exacerbates the complexity of the system. If 4
> different schedulers implement this feature, operators are likely to need
> to understand 4 different ways of doing the same thing, which would be
> unfortunate.
>
> J
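
To make the resource allocation point above concrete, here is a rough sketch of how an allocator could use per-task deadlines to estimate how many CPUs are guaranteed to free up by a given time. Everything in it is an assumption for illustration: `RunningTask`, its `deadline` field (epoch seconds, `None` meaning no deadline), and `cpus_free_by` are hypothetical names, not Mesos APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    """Hypothetical view of a running task as an allocator might see it."""
    name: str
    cpus: float
    deadline: Optional[float]  # assumed: epoch seconds; None means no deadline


def cpus_free_by(tasks: List[RunningTask], when: float) -> float:
    """Sum the CPUs held by tasks whose deadline falls at or before `when`.

    Tasks without a deadline might run forever, so they contribute nothing.
    """
    return sum(
        t.cpus for t in tasks
        if t.deadline is not None and t.deadline <= when
    )


tasks = [
    RunningTask("weekly-index", cpus=4.0, deadline=1000.0),
    RunningTask("web-server", cpus=2.0, deadline=None),
    RunningTask("report", cpus=1.5, deadline=5000.0),
]
```

With an estimate like this, an allocator could in principle wait for the deadline instead of preempting, but whether that is the right policy is outside what this sketch shows.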


Re: Support deadline for tasks

2018-03-23 Thread James Peach


> On Mar 23, 2018, at 9:57 AM, Renan DelValle  wrote:
> 
> Hi Zhitao,
> 
> Since this is something that could potentially be handled by the executor 
> and/or framework, I was wondering if you could speak to the advantages of 
> making this a TaskInfo primitive vs having the executor (or even the 
> framework) handle it.

There's some discussion around this on 
https://issues.apache.org/jira/browse/MESOS-8725.

My take is that delegating too much to the scheduler makes schedulers harder to 
write and exacerbates the complexity of the system. If 4 different schedulers 
implement this feature, operators are likely to need to understand 4 different 
ways of doing the same thing, which would be unfortunate. 

J

Re: Support deadline for tasks

2018-03-23 Thread Renan DelValle
Hi Zhitao,

Since this is something that could potentially be handled by the executor
and/or framework, I was wondering if you could speak to the advantages of
making this a TaskInfo primitive vs having the executor (or even the
framework) handle it.

-Renan


On Fri, Mar 23, 2018 at 9:19 AM, Zhitao Li wrote:

> Thanks James. I'll update the JIRA with our names and start with some
> prototype.
>
> On Thu, Mar 22, 2018 at 9:07 PM, James Peach  wrote:
>
>>
>>
>> > On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
>> >
>> > In our environment, we run a lot of batch jobs, some of which have
>> tight timeline. If any tasks in the job runs longer than x hours, it does
>> not make sense to run it anymore.
>> >
>> > For instance, a team would submit a job which builds a weekly index and
>> repeats every Monday. If the job does not finish before next Monday for
>> whatever reason, there is no point to keep any task running.
>> >
>> > We believe that implementing deadline tracking distributed across our
>> cluster makes more sense as it makes the system more scalable and also
>> makes our centralized state machine simpler.
>> >
>> > One idea I have right now is to add an  optional TimeInfo deadline to
>> TaskInfo field, and all default executors in Mesos can simply terminate the
>> task and send a proper StatusUpdate.
>> >
>> > I summarized above idea in MESOS-8725.
>> >
>> > Please let me know what you think. Thanks!
>>
>> This sounds both useful and simple to implement. I’m happy to shepherd if
>> you’d like
>>
>> J
>
>
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: Support deadline for tasks

2018-03-23 Thread Zhitao Li
Thanks, James. I'll update the JIRA with our names and start on a
prototype.

On Thu, Mar 22, 2018 at 9:07 PM, James Peach wrote:

>
>
> > On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
> >
> > In our environment, we run a lot of batch jobs, some of which have tight
> timeline. If any tasks in the job runs longer than x hours, it does not
> make sense to run it anymore.
> >
> > For instance, a team would submit a job which builds a weekly index and
> repeats every Monday. If the job does not finish before next Monday for
> whatever reason, there is no point to keep any task running.
> >
> > We believe that implementing deadline tracking distributed across our
> cluster makes more sense as it makes the system more scalable and also
> makes our centralized state machine simpler.
> >
> > One idea I have right now is to add an  optional TimeInfo deadline to
> TaskInfo field, and all default executors in Mesos can simply terminate the
> task and send a proper StatusUpdate.
> >
> > I summarized above idea in MESOS-8725.
> >
> > Please let me know what you think. Thanks!
>
> This sounds both useful and simple to implement. I’m happy to shepherd if
> you’d like
>
> J




-- 
Cheers,

Zhitao Li


Re: Support deadline for tasks

2018-03-22 Thread James Peach


> On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
> 
> In our environment, we run a lot of batch jobs, some of which have tight 
> timeline. If any tasks in the job runs longer than x hours, it does not make 
> sense to run it anymore. 
>  
> For instance, a team would submit a job which builds a weekly index and 
> repeats every Monday. If the job does not finish before next Monday for 
> whatever reason, there is no point to keep any task running.
>  
> We believe that implementing deadline tracking distributed across our cluster 
> makes more sense as it makes the system more scalable and also makes our 
> centralized state machine simpler.
>  
> One idea I have right now is to add an  optional TimeInfo deadline to 
> TaskInfo field, and all default executors in Mesos can simply terminate the 
> task and send a proper StatusUpdate.
> 
> I summarized above idea in MESOS-8725.
> 
> Please let me know what you think. Thanks! 

This sounds both useful and simple to implement. I’m happy to shepherd if you’d
like.

J

Support deadline for tasks

2018-03-22 Thread Zhitao Li
In our environment, we run a lot of batch jobs, some of which have tight
timelines. If any task in a job runs longer than x hours, it no longer
makes sense to run it.

For instance, a team would submit a job which builds a weekly index and
repeats every Monday. If the job does not finish before the next Monday for
whatever reason, there is no point in keeping any of its tasks running.

We believe that distributing deadline tracking across our cluster makes
more sense than centralizing it, as it makes the system more scalable and
also makes our centralized state machine simpler.

One idea I have right now is to add an *optional* *TimeInfo deadline* field
to TaskInfo, so that all default executors in Mesos can simply terminate the
task once the deadline passes and send a proper *StatusUpdate*.

I summarized the above idea in MESOS-8725.

Please let me know what you think. Thanks!

-- 
Cheers,

Zhitao Li
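
For readers following the thread, the executor-side behavior proposed above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Mesos code: `run_with_deadline` is a hypothetical helper, the deadline is assumed to be an absolute time in epoch seconds, and the returned strings borrow Mesos task state names only as labels for the status update an executor might send.

```python
import subprocess
import sys
import time
from typing import List, Optional


def run_with_deadline(argv: List[str], deadline: Optional[float]) -> str:
    """Run `argv` and kill it if it is still running at `deadline`.

    `deadline` is assumed to be epoch seconds; None means no deadline.
    Returns a label for the terminal status the executor would report.
    """
    proc = subprocess.Popen(argv)
    # Convert the absolute deadline into a wait budget, never negative.
    timeout = None if deadline is None else max(0.0, deadline - time.time())
    try:
        code = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deadline passed: terminate the task and reap the process.
        proc.kill()
        proc.wait()
        return "TASK_KILLED"
    return "TASK_FINISHED" if code == 0 else "TASK_FAILED"
```

A scheduler would then only set the deadline when launching the task, for example `run_with_deadline([sys.executable, "job.py"], time.time() + 3600)`, where `job.py` is a placeholder script name, and would not need its own timer logic.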