Greg Mann created MESOS-9944: -------------------------------- Summary: Command health check timeout begins to early Key: MESOS-9944 URL: https://issues.apache.org/jira/browse/MESOS-9944 Project: Mesos Issue Type: Bug Components: agent Affects Versions: 1.9.0 Reporter: Greg Mann
The checker process begins the timer for the command health check timeout when the LAUNCH_NESTED_CONTAINER_SESSION request is first sent, which means any delay in the execution of the health check command is included in the health check timeout. This can be an issue when the agent is under heavy load, and it may take a few seconds for the health check command to be run. Once we have a streaming response for the ATTACH_CONTAINER_OUTPUT call which follows the nested container launch, we can initiate the health check timeout once the first byte of the response is received; this is a more accurate signal that the health check command has begun running. -- This message was sent by Atlassian Jira (v8.3.2#803003)