Hi all,
I ran into this problem with these steps:
1: Start master and slave successfully.
2: Stop the slave by pressing Ctrl+C.
3: After the slave has stopped, restart it within 75s. It fails with this 
error: Slave asked to shut down by master because 'health check timed out'.

After reading the code and searching the internet, I learned that once the 
master observes that a slave has disconnected, it keeps sending 
PingSlaveMessage up to MAX_SLAVE_PING_TIMEOUTS times, each time waiting up to 
SLAVE_PING_TIMEOUT for a pong message from the slave. The total waiting time 
is therefore MAX_SLAVE_PING_TIMEOUTS * SLAVE_PING_TIMEOUT = 75s. If the slave 
restarts within 75s, the master treats it as a slave failover and accepts the 
re-registration request; otherwise, it removes the slave.
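
To make the arithmetic concrete, here is a minimal sketch of my understanding 
as described above. It is not Mesos code. The individual values (15s per ping, 
5 missed pings) are assumptions chosen to match the 75s product, and both 
function names are hypothetical.

```python
# Simplified model of the master's health-check window, not Mesos code.
# Assumed values: only their 75s product is stated in this post.
SLAVE_PING_TIMEOUT_S = 15      # assumed: seconds the master waits for each pong
MAX_SLAVE_PING_TIMEOUTS = 5    # assumed: missed pongs tolerated before removal


def health_check_window_s(ping_timeout_s=SLAVE_PING_TIMEOUT_S,
                          max_timeouts=MAX_SLAVE_PING_TIMEOUTS):
    """Total time the master waits before removing an unresponsive slave."""
    return ping_timeout_s * max_timeouts


def master_accepts_reregistration(restart_delay_s, window_s=None):
    """Hypothetical helper: True if the slave comes back inside the window."""
    if window_s is None:
        window_s = health_check_window_s()
    return restart_delay_s < window_s


if __name__ == "__main__":
    print("window:", health_check_window_s(), "seconds")
    for delay in (10, 60, 80):
        print(delay, "s ->", master_accepts_reregistration(delay))
```

Under this model every restart inside the window should be accepted, so the 
model alone does not account for a failure within 75s.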

In my situation, restarting the slave within 75s occasionally fails with the 
message "Slave asked to shut down by master because 'health check timed out'". 
I also found the bug report MESOS-2679 discussing this problem, but I don't 
think it explains what the master checks or what can cause the health check to 
time out. Could anybody kindly give more explanation, so that I can restart 
the slave successfully every time?

Thanks very much and best regards!
