Andrei Budnik created MESOS-10158:
-------------------------------------

             Summary: Mesos Agent gets stuck in Draining due to pending unacknowledged status updates
                 Key: MESOS-10158
                 URL: https://issues.apache.org/jira/browse/MESOS-10158
             Project: Mesos
          Issue Type: Bug
          Components: master
            Reporter: Andrei Budnik

A Mesos agent can get stuck in the DRAINING state because of pending unacknowledged status updates.

When a framework becomes disconnected, the agent keeps sending task status updates for that framework's terminated tasks, and these updates are never acknowledged. The master transitions the agent from DRAINING to DRAINED only after all task status updates have been acknowledged, so the agent never leaves the DRAINING state.

This problem can be resolved by sending the ["Teardown" operation|https://github.com/apache/mesos/blob/8ce5d30808f3744eeded09d530f226079d569a94/include/mesos/v1/master/master.proto#L299-L303] for all lost frameworks. However, it would be much better if the master handled this situation automatically. At the very least, we should make it easier for an operator to find out what prevents the draining operation from completing.
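
For reference, the Teardown workaround could be scripted roughly as follows. This is only a sketch under assumptions: it posts a `TEARDOWN` call in JSON form to the master's v1 operator endpoint (`/api/v1`), the master URL and framework IDs are hypothetical placeholders, and authentication is omitted. The helper names (`build_teardown_call`, `build_teardown_request`, `teardown_frameworks`) are made up for illustration. Deciding which frameworks are actually lost is left to the operator, since tearing down a framework kills all of its tasks.

```python
import json
import urllib.request


def build_teardown_call(framework_id: str) -> dict:
    """Build the JSON body of a v1 master TEARDOWN call for one framework."""
    return {
        "type": "TEARDOWN",
        "teardown": {"framework_id": {"value": framework_id}},
    }


def build_teardown_request(master_url: str, framework_id: str) -> urllib.request.Request:
    """Wrap the call in a POST request to the master's /api/v1 endpoint."""
    body = json.dumps(build_teardown_call(framework_id)).encode("utf-8")
    return urllib.request.Request(
        url=master_url.rstrip("/") + "/api/v1",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def teardown_frameworks(master_url, framework_ids, send=urllib.request.urlopen):
    """Send one TEARDOWN call per framework ID and collect the HTTP statuses.

    `send` is injectable so the function can be exercised without a live master.
    """
    statuses = {}
    for framework_id in framework_ids:
        with send(build_teardown_request(master_url, framework_id)) as response:
            statuses[framework_id] = response.status
    return statuses


if __name__ == "__main__":
    # Hypothetical framework ID; print the payload instead of sending it.
    print(json.dumps(build_teardown_call("example-framework-id"), indent=2))
```

To run it against a real cluster, an operator would call `teardown_frameworks("http://<master-host>:<port>", [...])` with the IDs of the disconnected frameworks whose status updates are still pending.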