> Within 75s, the slave restart occasionally succeeds, and sometimes fails with the message "Slave asked to shut down by master because 'health check timed out'".
Could you provide the master and slave logs for this? I checked the current code, and it seems only slave_ping_timeout and max_slave_ping_timeouts affect 'health check timed out'.

On Fri, Aug 7, 2015 at 5:32 PM, sujz <[email protected]> wrote:

> Hi all,
> I ran into this problem with these steps:
> 1: Start master and slave successfully.
> 2: Stop slave by pressing Ctrl+C.
> 3: After stopping slave, restart slave within 75s; it prompts this error:
> Slave asked to shut down by master because 'health check timed out'.
>
> After reading the code and searching on the internet, I learned that after
> the slave is observed to be disconnected, the master continues to send
> PingSlaveMessage up to MAX_SLAVE_PING_TIMEOUTS times, each time
> waiting up to SLAVE_PING_TIMEOUT for a pong message from the slave, so the
> total waiting time is
> MAX_SLAVE_PING_TIMEOUTS * SLAVE_PING_TIMEOUT = 75s. If the slave restarts
> within 75s, the master believes the slave has failed over and accepts the
> re-register request; otherwise, it removes the slave.
>
> In my situation, within 75s, the slave restart occasionally succeeds, and
> sometimes fails with the message "Slave asked to shut down by master
> because 'health check timed out'". I found the bug report MESOS-2679 also
> discussing this problem, but I think it didn't explain what the master
> checks and what factors may cause the health check to time out. Could
> anybody be kind enough to give more explanation, so we can avoid
> 'health check timed out'?
>
> Thanks very much and best regards!

--
Best Regards,
Haosdent Huang
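P.S. For reference, here is a minimal sketch of the timeout arithmetic described in the quoted message. It is not the Mesos implementation: the `HealthCheck` class and its methods are hypothetical names, and the values 15s and 5 are assumptions chosen to match the 75s product mentioned above. The counter models a master that resets on every pong and gives up after enough consecutive missed pongs.

```python
from dataclasses import dataclass

# Assumed values; chosen so that the product matches the 75s in the thread.
SLAVE_PING_TIMEOUT_SECS = 15
MAX_SLAVE_PING_TIMEOUTS = 5


@dataclass
class HealthCheck:
    """Hypothetical model of a master-side ping/pong health check."""

    ping_timeout_secs: int = SLAVE_PING_TIMEOUT_SECS
    max_timeouts: int = MAX_SLAVE_PING_TIMEOUTS
    missed: int = 0  # consecutive pings that got no pong in time

    def removal_window_secs(self) -> int:
        # Total time without any pong before the slave would be removed.
        return self.ping_timeout_secs * self.max_timeouts

    def on_ping_result(self, pong_received: bool) -> bool:
        """Record one ping round; return True if the slave should be removed."""
        if pong_received:
            self.missed = 0  # any pong resets the counter
            return False
        self.missed += 1
        return self.missed >= self.max_timeouts


if __name__ == "__main__":
    check = HealthCheck()
    print("removal window (s):", check.removal_window_secs())
    for round_no in range(1, MAX_SLAVE_PING_TIMEOUTS + 1):
        removed = check.on_ping_result(pong_received=False)
        print(f"missed ping {round_no}: removed={removed}")
```

In this model a slave that answers even one ping inside the window is never removed, so comparing the timestamps of missed pings in the master log against the restart time should show which side of the window a given restart fell on.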

