Andrei Budnik created MESOS-10158:
-------------------------------------

             Summary: Mesos Agent gets stuck in Draining due to pending 
unacknowledged status updates
                 Key: MESOS-10158
                 URL: https://issues.apache.org/jira/browse/MESOS-10158
             Project: Mesos
          Issue Type: Bug
          Components: master
            Reporter: Andrei Budnik


A Mesos agent can get stuck in the DRAINING state because of pending 
unacknowledged status updates. When a framework becomes disconnected, it can no 
longer acknowledge updates, so the agent keeps resending task status updates 
for that framework's terminated tasks. The master transitions the agent from 
DRAINING to DRAINED only after all task status updates have been acknowledged, 
so the agent never leaves the DRAINING state.
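
The condition described above can be illustrated with a toy model. This is only 
a sketch, not Mesos code: the AgentView class, its fields, and both helper 
functions are hypothetical names made up for the illustration. It assumes the 
master considers an agent drained once no tasks are running and no terminal 
status update is still waiting for an acknowledgement.

```python
from dataclasses import dataclass, field


@dataclass
class AgentView:
    """Hypothetical master-side view of one draining agent (not a Mesos type)."""
    running_tasks: set = field(default_factory=set)
    # framework ID -> IDs of terminal status updates not yet acknowledged
    unacked_updates: dict = field(default_factory=dict)


def drain_blockers(agent):
    """List human-readable reasons why the agent cannot become DRAINED yet."""
    blockers = [f"task {task} still running" for task in sorted(agent.running_tasks)]
    for framework_id, updates in sorted(agent.unacked_updates.items()):
        if updates:
            blockers.append(
                f"framework {framework_id}: {len(updates)} unacknowledged update(s)"
            )
    return blockers


def drain_state(agent):
    """Return DRAINED only when nothing blocks the transition."""
    return "DRAINING" if drain_blockers(agent) else "DRAINED"
```

With a disconnected framework nobody acknowledges its updates, so the blocker 
list never becomes empty and the agent stays in DRAINING. A list like the one 
returned by drain_blockers is also the kind of information an operator would 
need to see why draining does not complete.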

This problem can be resolved by sending the ["Teardown" 
operation|https://github.com/apache/mesos/blob/8ce5d30808f3744eeded09d530f226079d569a94/include/mesos/v1/master/master.proto#L299-L303]
 for all lost frameworks. However, it would be much better if the master 
handled this situation automatically. At the very least, we should make it 
easier for an operator to find out what prevents the draining operation from 
completing.
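
A minimal sketch of the manual workaround, assuming the master's v1 operator 
API is reachable at /api/v1 and accepts a JSON-encoded TEARDOWN call shaped 
like the Teardown message linked above. The master URL and framework ID are 
placeholders supplied by the operator, the function names are made up for this 
example, and authentication is omitted.

```python
import json
import urllib.request


def build_teardown_call(framework_id):
    """Build the JSON body of a TEARDOWN call for one framework."""
    return {
        "type": "TEARDOWN",
        "teardown": {"framework_id": {"value": framework_id}},
    }


def send_call(master_url, call, timeout=10.0):
    """POST a call to the master (endpoint path is an assumption) and return the HTTP status."""
    request = urllib.request.Request(
        master_url.rstrip("/") + "/api/v1",
        data=json.dumps(call).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

An operator would call send_call(master_url, build_teardown_call(framework_id)) 
once per lost framework. Teardown removes the framework and shuts down its 
tasks and executors, so it should only be sent for frameworks that are known 
not to come back.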


