Re: Understanding Slave Recovery Timeouts

Vinod Kone Fri, 19 Jun 2015 15:48:11 -0700

> *If* the 75 seconds is exceeded but we're within the recovery_timeout,
> the slave *should* register with a new slave ID. The slave daemon (with
> the new slave ID) reconnects to the old executors and updates them to use
> the new slave ID.

This is not true. 'recovery_timeout' was added to make sure that if a slave
is down for a long time (>10 mins), its executors commit suicide. It is
better for the executor/task to die than to keep running, because the
framework might have already launched another replica of that instance. It
was not tied to the hard-coded 75s timeout because a slave can still
successfully re-register with a master after 75s (e.g., when both master and
slave are down for 5 min).

Also, a slave cannot connect to old executors with a new slave ID.
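
As a rough illustration only (this is not Mesos code), the independence of
the two timeouts can be sketched as two separate checks. The function names
and the 10-minute value are assumptions made for the example; the message
above only says "a long time (>10 mins)", and the exact boundary behaviour
is likewise assumed.

```python
from datetime import timedelta

# Assumed value for illustration; the thread only says "a long time (>10 mins)".
RECOVERY_TIMEOUT = timedelta(minutes=10)
# The hard-coded timeout mentioned in the thread.
HARD_CODED_TIMEOUT = timedelta(seconds=75)


def executors_still_alive(slave_downtime, recovery_timeout=RECOVERY_TIMEOUT):
    """Hypothetical helper: executors wait for the slave until recovery_timeout expires."""
    return slave_downtime <= recovery_timeout


def past_hard_coded_timeout(slave_downtime):
    """Hypothetical helper: True once the slave has been down longer than 75s."""
    return slave_downtime > HARD_CODED_TIMEOUT


# The case from the thread: master and slave both down for 5 minutes.
downtime = timedelta(minutes=5)
print(past_hard_coded_timeout(downtime))  # past 75s ...
print(executors_still_alive(downtime))    # ... yet executors are still waiting
```

The two checks take no input from each other, which mirrors the point that
exceeding 75s says nothing about whether the executors have given up.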

HTH,
