> > I'd like to prevent having two running at the same time.
I've experienced that as well, but it's hard to prevent in the real world. Sometimes you can't even tell a network partition apart from other failures, e.g. the slave OS hanging due to a kernel bug, in which case the task really is lost and it's correct for the framework to relaunch it. The best you can do is treat TASK_LOST as ambiguous and shrink the window; see the sketch at the bottom of this message.

On Wed, Jan 20, 2016 at 12:09 AM, Mauricio Garavaglia <[email protected]> wrote:

> Thanks. I'm trying to prevent the case where TASK_LOST is issued to the
> framework while the task is still running on the slave. This happened
> during a network partition where the slave got deregistered. Until the
> slave came back and killed the tasks, they were marked as LOST and
> rescheduled on a different slave. I'd like to prevent having two running
> at the same time.
>
> On Tue, Jan 19, 2016 at 12:33 PM, Vinod Kone <[email protected]> wrote:
>
>> Killing is done by the agent/slave, so a network partition doesn't
>> affect the killing. When the agent eventually reconnects with the master
>> or times out, TASK_LOST is sent to the framework.
>>
>> @vinodkone
>>
>> > On Jan 19, 2016, at 6:46 AM, Mauricio Garavaglia <[email protected]> wrote:
>> >
>> > Hi,
>> > According to the docs, the --recover=cleanup option will "Kill any old
>> > live executors and exit". In the case of a network partition that
>> > prevents the slave from reaching the master, when does the killing of
>> > the executors happen?
>> >
>> > Thanks
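
FWIW, here is a rough, untested sketch of how a framework might shrink that window, using the Java scheduler driver API (reconcileTasks and killTask are real driver calls; the grace-period helper at the end is hypothetical, something your framework would have to provide):

import java.util.Arrays;

import org.apache.mesos.Protos.TaskState;
import org.apache.mesos.Protos.TaskStatus;
import org.apache.mesos.SchedulerDriver;

public class LostTaskHandler {

  // Call this from your Scheduler.statusUpdate() callback.
  public static void handleUpdate(SchedulerDriver driver, TaskStatus status) {
    if (status.getState() != TaskState.TASK_LOST) {
      return;
    }

    // Ask the master what it currently knows about this task; the answer
    // comes back as another statusUpdate() callback (explicit reconciliation),
    // so don't relaunch synchronously here.
    driver.reconcileTasks(Arrays.asList(status));

    // Also issue a kill for the lost task ID, so that if the partitioned
    // slave later re-registers with the task still running, the master can
    // forward the kill rather than leave two copies running.
    driver.killTask(status.getTaskId());

    // Relaunch only after a grace period longer than the slave
    // re-registration timeout, not immediately on TASK_LOST.
    // scheduleRelaunchAfterGracePeriod(status.getTaskId()); // hypothetical
  }
}

None of this makes a brief double-run impossible; if you need hard mutual exclusion, the task itself should take a lock (e.g. in ZooKeeper) before doing real work.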

