Hi all,
I ran into a problem with the following steps:
1: Start master and slave successfully.
2: Stop the slave by pressing Ctrl+C.
3: Restart the slave within 75s of stopping it. It reports this error: Slave 
asked to shut down by master because 'health check timed out'.

After reading the code and searching the internet, I learned that once the 
master observes that a slave has disconnected, it keeps sending 
PingSlaveMessage up to MAX_SLAVE_PING_TIMEOUTS times, each time waiting up to 
SLAVE_PING_TIMEOUT for a pong message from the slave. The total waiting time is 
therefore MAX_SLAVE_PING_TIMEOUTS * SLAVE_PING_TIMEOUT = 75s. If the slave 
restarts within 75s, the master treats it as a slave failover and accepts the 
re-register request; otherwise it removes the slave.
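
To make the arithmetic concrete, here is a minimal sketch in Python of how I 
understand the check. It is only a simplified model, not the actual Mesos code: 
the values 15s and 5 are what I assume the defaults to be (they multiply to the 
75s above), the function names are hypothetical, and resetting the counter on 
every pong is my assumption about the behavior.

```python
# Simplified model of the master's health check; not the real Mesos code.
# Assumed defaults: 15 s per ping and 5 allowed timeouts, giving 75 s.
SLAVE_PING_TIMEOUT_SECS = 15
MAX_SLAVE_PING_TIMEOUTS = 5


def total_grace_period(ping_timeout_secs=SLAVE_PING_TIMEOUT_SECS,
                       max_timeouts=MAX_SLAVE_PING_TIMEOUTS):
    """Longest time the master waits before removing a silent slave."""
    return ping_timeout_secs * max_timeouts


def first_removal_interval(pongs, max_timeouts=MAX_SLAVE_PING_TIMEOUTS):
    """Return the index of the ping interval at which the slave is removed.

    `pongs` holds one bool per ping interval: True if a pong arrived in
    time, False if the ping timed out. Returns None if the slave is never
    removed. Assumption: a pong resets the count of consecutive timeouts.
    """
    consecutive_timeouts = 0
    for interval, got_pong in enumerate(pongs):
        if got_pong:
            consecutive_timeouts = 0
            continue
        consecutive_timeouts += 1
        if consecutive_timeouts >= max_timeouts:
            return interval
    return None


if __name__ == "__main__":
    print("grace period (s):", total_grace_period())
    # Slave stays silent for five intervals in a row: removed.
    print("removed at interval:", first_removal_interval([False] * 5))
    # Slave answers again after four missed pings: kept.
    print("removed at interval:",
          first_removal_interval([False] * 4 + [True]))
```

In this model a slave that answers a ping before the fifth consecutive timeout 
is kept, which is why I expected a restart within 75s to succeed every time.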

In my case, restarting the slave within 75s sometimes succeeds and sometimes 
fails with the message "Slave asked to shut down by master because 'health 
check timed out'". I found the bug report MESOS-2679, which also discusses 
this problem, but I don't think it explains what the master checks or which 
factors can cause the health check to time out. Could anybody kindly explain 
this in more detail, so that we can avoid 'health check timed out'?

Thanks very much and best regards!
