Thanks, I'm trying to prevent the case where TASK_LOST is issued to the framework while the task is still running on the slave. This happened during a network partition in which the slave was deregistered: the tasks were marked as LOST and rescheduled on a different slave, but the original copies kept running until the partitioned slave came back and killed them. I'd like to prevent two copies of the same task running at the same time.
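
For what it's worth, here is a rough sketch of what I have in mind on the scheduler side (untested, using the mesos.interface Python bindings; names like pending_relaunch are just illustrative): instead of relaunching as soon as TASK_LOST arrives, queue the task, ask the master for explicit reconciliation, and kill the stale copy if it reports back as still running.

    # Rough sketch, not tested. Uses the mesos.interface Python bindings;
    # the class and the pending_relaunch field are made-up names.
    from mesos.interface import Scheduler, mesos_pb2

    class PartitionAwareScheduler(Scheduler):
        def __init__(self):
            # Task ids we saw TASK_LOST for but haven't relaunched yet.
            self.pending_relaunch = set()

        def statusUpdate(self, driver, update):
            task_id = update.task_id.value
            if update.state == mesos_pb2.TASK_LOST:
                # Don't relaunch right away: the task may still be running
                # on the partitioned slave. Queue it and ask the master to
                # reconcile so we learn its real state.
                self.pending_relaunch.add(task_id)
                status = mesos_pb2.TaskStatus()
                status.task_id.value = task_id
                status.state = mesos_pb2.TASK_LOST  # required proto field
                driver.reconcileTasks([status])
            elif (update.state == mesos_pb2.TASK_RUNNING
                  and task_id in self.pending_relaunch):
                # The "lost" task is actually alive on a re-registered
                # slave: kill the stale copy instead of running two.
                self.pending_relaunch.discard(task_id)
                driver.killTask(update.task_id)

Not sure yet how to handle the window between the LOST update and the reconciliation answer, though.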
On Tue, Jan 19, 2016 at 12:33 PM, Vinod Kone <[email protected]> wrote:

> Killing is done by the agent/slave, so a network partition doesn't affect
> the killing. When the agent eventually connects with the master or times
> out, TASK_LOST is sent to the framework.
>
> @vinodkone
>
> > On Jan 19, 2016, at 6:46 AM, Mauricio Garavaglia <
> > [email protected]> wrote:
> >
> > Hi,
> > In the case of the --recover=cleanup option, according to the docs it
> > "Kill any old live executors and exit". In the case of a network
> > partition that prevents the slave from reaching the master, when does
> > the killing of the executors happen?
> >
> > Thanks

