Bikas Saha created YARN-1372:
--------------------------------

             Summary: Ensure all completed containers are reported to the AMs 
across RM restart
                 Key: YARN-1372
                 URL: https://issues.apache.org/jira/browse/YARN-1372
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Bikas Saha


Currently the NM informs the RM about completed containers and then removes 
those containers from the RM notification list. The RM holds that completed 
container information until the AM pulls it. If the RM dies before the AM 
pulls this data, the AM may never receive it. To fix this, the NM should 
maintain a separate list of completed container notifications that it has 
already sent to the RM. After the AM has pulled the containers from the RM, 
the RM will inform the NM, and the NM can then remove those containers from 
the new list. Upon re-registering with the RM (after an RM restart), the NM 
should send the entire list of completed containers to the RM, along with any 
other containers that completed while the RM was down. This ensures that the 
RM can inform the AMs about all completed containers. Some container 
completions may be reported more than once, since the AM may have pulled a 
container but the RM may have died before notifying the NM about the pull.
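
The NM-side bookkeeping proposed above can be sketched roughly as follows. This 
is only an illustrative model in Python, not YARN code: the class name 
NodeManagerSketch and the methods container_completed, heartbeat, ack_pulled 
and reregister are hypothetical names chosen for this sketch, and the real 
NM/RM protocol carries far more state than a list of container IDs.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerSketch:
    """Hypothetical model of the proposed NM bookkeeping (not actual YARN code)."""

    # Completions the NM has not yet reported to the RM.
    pending: list = field(default_factory=list)
    # The proposed new list: completions already sent to the RM, but not yet
    # confirmed by the RM as pulled by the AM.
    reported_unacked: list = field(default_factory=list)

    def container_completed(self, container_id):
        """Record a container that just finished on this node."""
        self.pending.append(container_id)

    def heartbeat(self):
        """Report new completions to the RM, retaining them until acknowledged."""
        report = list(self.pending)
        self.reported_unacked.extend(report)
        self.pending.clear()
        return report

    def ack_pulled(self, container_ids):
        """The RM says the AM pulled these completions, so they can be dropped."""
        acked = set(container_ids)
        self.reported_unacked = [
            c for c in self.reported_unacked if c not in acked
        ]

    def reregister(self):
        """After an RM restart, resend all unacknowledged completions plus
        any containers that completed while the RM was down."""
        report = self.reported_unacked + self.pending
        self.reported_unacked = list(report)
        self.pending.clear()
        return report
```

In this model, a completion stays in reported_unacked until ack_pulled removes 
it, so a completion that the AM already pulled is resent by reregister if the 
RM died before acknowledging it. That is the duplicate-report case described 
above, and it means the AM side would need to tolerate seeing the same 
completion twice.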



--
This message was sent by Atlassian JIRA
(v6.1#6144)
