Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-02 Thread Meng Zhu
CORRECTION:

This is a new behavior that appears only in the current 1.5.x branch. In
1.5.0, the Mesos agent still has the old behavior: any reordered tasks
(to the same executor) are launched regardless.



Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-02 Thread Chun-Hung Hsiao
Gilbert, I think you're right. The code path doesn't exist in 1.5.0.



Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-02 Thread Chun-Hung Hsiao
This is a new behavior introduced by the fix for MESOS-1720, and thus a
new problem only in 1.5.x. Prior to 1.5, reordered tasks (to the same
executor) were all launched, because whichever came first launched the
executor. Since 1.5, one might be dropped.



Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Gilbert Song
Meng,

Could you double-check whether this is really an issue in the Mesos 1.5.0
release?

MESOS-1720 was resolved after the 1.5 release (rc-2), and it seems that
the change is only on the master branch and the 1.5.x branch (not in 1.5.0).

Did I miss anything?

- Gilbert



Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Benjamin Mahler
Put another way, we currently don't guarantee in-order task delivery to the
executor. Due to the changes for MESOS-1720, one special case of task
re-ordering now leads to the re-ordered task being dropped (rather than
delivered out-of-order as before). Technically, this is strictly better.

However, we'd like to start guaranteeing in-order task delivery.

On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu wrote:

> Hi all:
>
> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
> if all three conditions are met:
> (1) Several `LAUNCH` or `LAUNCH_GROUP` calls use the same executor.
> (2) The executor does not currently exist on the agent.
> (3) Due to a race condition, these tasks reach the launch step on the
> agent in a different order from their original launch order.
>
> In this case, tasks that try to launch on the agent before the first
> task in the original order will be explicitly dropped by the agent
> (`TASK_DROPPED` or `TASK_LOST` will be sent).
>
> This bug will be fixed in 1.5.1. It is tracked in
> https://issues.apache.org/jira/browse/MESOS-8624
>
> 
>
> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` calls to a new
> executor. The master specifies, through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`, that the first call is the one
> to launch a new executor, and that the second one should use the
> existing executor launched by the first one.
>
> On the agent side, running a task/task group goes through a series of
> continuations. One is a `collect()` on the futures that unschedule
> frameworks from being GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> Another is a `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since these `collect()` calls run on individual actors, the futures
> returned by the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls
> may complete out of order, even if the futures that the two `collect()`
> calls wait for are satisfied in order (which is true in these two cases).
>
> As a result, under some race conditions (probably under heavy load),
> tasks that rely on a previous task to launch the executor may get
> processed before that task, resulting in those tasks being explicitly
> dropped by the agent.
>
> -Meng
>
>
>
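
For readers who want the failure mode in concrete terms, the quoted
description can be sketched as a small Python simulation. This is an
illustration only, not Mesos code: `ToyAgent`, `deliver`, and the
`collect_delay` numbers are hypothetical, the `RunTaskMessage` class here
is a toy stand-in that keeps only the `launch_executor` flag, and the
"DROPPED" label stands for whichever of `TASK_DROPPED`/`TASK_LOST` the
agent would send. The simulated delay plays the role of the per-call
`collect()` continuations finishing at different times.

```python
from dataclasses import dataclass, field


@dataclass
class RunTaskMessage:
    """Toy stand-in for the real message; only the fields needed here."""
    task_id: str
    executor_id: str
    launch_executor: bool  # True: this call is the one that creates the executor


@dataclass
class ToyAgent:
    """Hypothetical model of the agent-side check described in the thread."""
    executors: set = field(default_factory=set)

    def run(self, msg: RunTaskMessage) -> str:
        if msg.launch_executor:
            # The master marked this call as the one that creates the executor.
            self.executors.add(msg.executor_id)
            return "LAUNCHED"
        if msg.executor_id in self.executors:
            # The executor was already created by an earlier call.
            return "LAUNCHED"
        # The call expected an existing executor, but none is there yet.
        return "DROPPED"


def deliver(messages, collect_delay):
    """Process messages in the order their simulated collect() steps finish.

    collect_delay maps task_id -> a made-up completion time. sorted() is
    stable, so equal delays preserve the original send order.
    """
    agent = ToyAgent()
    arrival = sorted(messages, key=lambda m: collect_delay[m.task_id])
    return {m.task_id: agent.run(m) for m in arrival}


sent = [
    RunTaskMessage("task-1", "exec-A", launch_executor=True),
    RunTaskMessage("task-2", "exec-A", launch_executor=False),
]

# Same delay for both calls: the original order is kept, both tasks launch.
in_order = deliver(sent, {"task-1": 1, "task-2": 1})

# The first call's continuation finishes late: task-2 arrives first.
reordered = deliver(sent, {"task-1": 5, "task-2": 1})

print(in_order)
print(reordered)
```

In the reordered run, `task-2` reaches the agent before the executor
exists and is dropped, while `task-1` still launches. Under the pre-1.5.x
behavior described earlier in the thread, whichever call arrived first
would simply have launched the executor.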