Thanks. I'm trying to prevent the case where TASK_LOST is issued to the
framework while the task is still running on the slave. This happened
during a network partition in which the slave got deregistered. Until the
slave came back and killed the tasks, they were marked LOST and
rescheduled on a different slave. I'd like to prevent having two copies
running at the same time.
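For what it's worth, the workaround I'm considering on the framework side is a
fencing-token scheme (this is just a sketch of my own, not anything in the
Mesos API): every (re)launch of a task gets a monotonically increasing epoch,
and anything reported under an older epoch is treated as a stale duplicate to
be killed or ignored when the partitioned slave reconnects.

```python
class TaskLedger:
    """Hypothetical framework-side ledger: tracks the latest launch
    epoch per task id so stale duplicates can be detected."""

    def __init__(self):
        self.epochs = {}  # task_id -> latest epoch granted

    def relaunch(self, task_id):
        # Called when the framework reschedules a task it believes is LOST.
        # Returns the new fencing token for this launch.
        epoch = self.epochs.get(task_id, 0) + 1
        self.epochs[task_id] = epoch
        return epoch

    def is_current(self, task_id, epoch):
        # A partitioned slave reconnecting reports under its old epoch;
        # anything but the latest epoch is stale and should be killed.
        return self.epochs.get(task_id) == epoch
```

So if the task is launched (epoch 1), the partition hits, and the framework
relaunches it (epoch 2), the old copy's status updates carry epoch 1, fail the
`is_current` check, and the framework can issue a kill for that copy instead
of letting both run.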

On Tue, Jan 19, 2016 at 12:33 PM, Vinod Kone <[email protected]> wrote:

> Killing is done by the agent/slave. So network partition doesn't affect
> the killing. When the agent eventually connects with the master or times
> out, TASK_LOST is sent to the framework.
>
> @vinodkone
>
> > On Jan 19, 2016, at 6:46 AM, Mauricio Garavaglia <
> [email protected]> wrote:
> >
> > Hi,
> > In the case of the --recover=cleanup option, according to the docs it
> > will "Kill any old live executors and exit". In the case of a network
> > partition that prevents the slave from reaching the master, when does
> > the killing of the executors happen?
> >
> > Thanks
> >
>