Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Gilbert Song
Meng,

Could you double-check whether this is really an issue in the Mesos 1.5.0
release?

MESOS-1720 was resolved after the 1.5 release (rc-2), and it seems like it
is only on the master branch and the 1.5.x branch (not in 1.5.0).

Did I miss anything?

- Gilbert

On Thu, Mar 1, 2018 at 4:22 PM, Benjamin Mahler wrote:



Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Benjamin Mahler
Put another way, we currently don't guarantee in-order task delivery to the
executor. Due to the changes for MESOS-1720, one special case of task
re-ordering now leads to the re-ordered task being dropped (rather than
delivered out-of-order as before). Technically, this is strictly better.

However, we'd like to start guaranteeing in-order task delivery.
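One generic way to guarantee in-order delivery is a reorder buffer: number the launches as they arrive, let the asynchronous steps complete in any order, and release a launch only once every earlier launch has been released. This is a hypothetical sketch (`ReorderBuffer` is not Mesos code), and it is not necessarily how MESOS-8624 was fixed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reorder buffer: holds back completed launches until all
// launches with smaller sequence numbers have been released.
class ReorderBuffer {
 public:
  // Called when the asynchronous steps for launch `seq` finish, in any
  // order. Returns the launches that may now be delivered, in sequence
  // order. Assumes each sequence number (0, 1, 2, ...) completes exactly once.
  std::vector<std::string> complete(std::uint64_t seq, std::string launch) {
    pending_.emplace(seq, std::move(launch));
    std::vector<std::string> ready;
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

 private:
  std::uint64_t next_ = 0;                        // Next sequence number to release.
  std::map<std::uint64_t, std::string> pending_;  // Completed but held back.
};
```

If launch 1 finishes first, `complete(1, ...)` returns nothing; the later `complete(0, ...)` then releases launch 0 followed by launch 1, so the executor-launching task is always delivered first.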

On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu wrote:



Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Meng Zhu
Hi all:

TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
if all three of the following conditions are met:
(1) Several `LAUNCH` or `LAUNCH_GROUP` calls use the same executor.
(2) The executor does not currently exist on the agent.
(3) Due to a race condition, these tasks are launched on the agent in a
different order from their original launch order.

In this case, tasks that arrive at the agent before the first task in the
original order (the one that is supposed to launch the executor) will be
explicitly dropped by the agent (`TASK_DROPPED` or `TASK_LOST` will be sent).
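To make the drop condition concrete, here is a minimal C++ sketch of the agent-side decision. `Launch`, `Outcome`, `ToyAgent` and `runInOrder` are hypothetical names, not Mesos code, and the model assumes the agent only looks at whether the executor already exists and whether this launch carries the `launch_executor` flag.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified launch as seen by the agent after the
// asynchronous continuations have run.
struct Launch {
  std::string taskId;
  std::string executorId;
  bool launchExecutor;  // Plays the role of the `launch_executor` field.
};

enum class Outcome { LaunchedExecutor, UsedExistingExecutor, Dropped };

class ToyAgent {
 public:
  Outcome run(const Launch& launch) {
    if (executors_.count(launch.executorId) > 0) {
      // Simplification: other flag/executor combinations are not modelled.
      return Outcome::UsedExistingExecutor;
    }
    if (launch.launchExecutor) {
      executors_.insert(launch.executorId);
      return Outcome::LaunchedExecutor;
    }
    // The executor does not exist and this launch may not create it, so
    // the task is dropped (TASK_DROPPED or TASK_LOST is reported).
    return Outcome::Dropped;
  }

 private:
  std::set<std::string> executors_;  // Executors currently on this agent.
};

// Feeds launches to a fresh agent in the given arrival order.
std::vector<Outcome> runInOrder(const std::vector<Launch>& arrivals) {
  ToyAgent agent;
  std::vector<Outcome> outcomes;
  for (const Launch& launch : arrivals) outcomes.push_back(agent.run(launch));
  return outcomes;
}
```

With t1 (flag set), t2 and t3 all using one executor, the original order t1, t2, t3 launches the executor once and reuses it; the arrival order t2, t1, t3 drops t2, which is exactly the "arrived before the first task" case.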

This bug will be fixed in 1.5.1. It is tracked in
https://issues.apache.org/jira/browse/MESOS-8624



In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` calls to a new
executor. The master specifies, through the `launch_executor` field in
`RunTaskMessage`/`RunTaskGroupMessage`, that the first call is the one that
launches the new executor, and that the second call should use the existing
executor launched by the first one.
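A rough sketch of the master-side bookkeeping described above, assuming the master marks only the first call it sees for an executor that is not yet known on the agent. `LaunchCall`, `RunTaskMsg`, `assignLaunchExecutor` and the `knownExecutors` parameter are hypothetical; the real master logic is more involved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct LaunchCall {
  std::string taskId;
  std::string executorId;
};

struct RunTaskMsg {
  std::string taskId;
  std::string executorId;
  bool launchExecutor;  // Plays the role of the `launch_executor` field.
};

// `knownExecutors`: executors the master already considers running (or
// already being launched) on the target agent.
std::vector<RunTaskMsg> assignLaunchExecutor(
    const std::vector<LaunchCall>& calls,
    std::set<std::string> knownExecutors) {
  std::vector<RunTaskMsg> messages;
  for (const LaunchCall& call : calls) {
    // insert(...).second is true only the first time this executor is seen,
    // so exactly one call per new executor is told to launch it.
    bool first = knownExecutors.insert(call.executorId).second;
    messages.push_back({call.taskId, call.executorId, first});
  }
  return messages;
}
```

Every later call for the same executor carries `launchExecutor == false`, which is what creates the ordering dependency on the agent.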

On the agent side, running a task/task group goes through a series of
continuations. One is a `collect()` on the futures that unschedule the
framework from being GC'ed:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
Another is a `collect()` on task authorization:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
Since each `collect()` call runs on its own actor, the futures returned by
the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may be satisfied
out of order, even if the futures that the two `collect()` calls wait for are
satisfied in order (which is true in both of these cases).
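The reordering can be illustrated with a toy model in which each `collect()` continuation sits in the mailbox of its own actor. This is not libprocess: `ToyActor` and `agentSees` are hypothetical, and the `schedule` argument stands in for whatever order the real scheduler happens to run actors in. Each mailbox is FIFO, but nothing orders one actor relative to another.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy actor: a FIFO mailbox of continuations.
struct ToyActor {
  std::deque<std::function<void()>> mailbox;

  // Runs the oldest queued continuation, if any.
  void runNext() {
    if (mailbox.empty()) return;
    std::function<void()> fn = std::move(mailbox.front());
    mailbox.pop_front();
    fn();
  }
};

// Call i's `collect()` continuation is queued on its own actor, in launch
// order (call 0 first). Returns the order in which the agent sees the calls
// when the actors are run in `schedule` order.
std::vector<std::string> agentSees(const std::vector<int>& schedule) {
  std::vector<ToyActor> collectActors(2);
  std::vector<std::string> agentInbox;

  for (int call = 0; call < 2; ++call) {
    collectActors[call].mailbox.push_back([&agentInbox, call] {
      agentInbox.push_back("call-" + std::to_string(call));
    });
  }

  for (int index : schedule) collectActors[index].runNext();
  return agentInbox;
}
```

`agentSees({0, 1})` yields call-0 then call-1, while `agentSees({1, 0})` yields call-1 then call-0, even though both continuations were queued in launch order.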

As a result, under some race conditions (probably under heavy load), tasks
that rely on a previous task to launch the executor may be processed before
the task that is supposed to launch it, and are then explicitly dropped by
the agent.

-Meng


Re: Anyone using a custom Sorter?

2018-03-01 Thread Jie Yu
If your intention is to kill the Sorter interface, I am +100.

On Wed, Feb 28, 2018 at 2:12 PM, Michael Park wrote:

> I'm not even sure if anyone's using a custom Allocator, but
> is anyone using a custom Sorter? It doesn't seem like there's
> even a module for it, so it wouldn't be dynamically loaded.
>
> Perhaps you have a fork with a custom Sorter?
>
> Please let me know,
>
> Thanks!
>
> MPark
>