Anubhav Dhoot updated YARN-556:

    Attachment: YARN-1372.prelim.patch

The NM does not remove completedContainers from its list until the RM sends a 
new field in the NodeHeartbeatResponse that tracks container completions acked 
by the AM. The RM AppAttempt tracks each completed container along with its 
node ID. This list is sent to the AM, and when the AM sends its next allocate 
call, that call is assumed to implicitly ack the previous response. RMNode then 
gets a new event to process this ack and sends it to the NM via the heartbeat 
response, completing the cycle.
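
The cycle can be sketched as a small self-contained simulation. This is only a toy model for illustration and is not code from the patch: the class names (AckCycleSketch, NodeManagerSim, ResourceManagerSim), the method names, and the container and node IDs are all hypothetical, and the real change works through the existing RM and NM classes and protocol records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the ack cycle; every class and method name here is hypothetical.
public class AckCycleSketch {

    /** Stand-in for the NM: keeps completed containers until the RM relays an ack. */
    public static class NodeManagerSim {
        public final Set<String> completed = new LinkedHashSet<>();

        public void containerFinished(String containerId) {
            completed.add(containerId);
        }

        /** Every heartbeat re-reports whatever has not been acked yet. */
        public List<String> heartbeatRequest() {
            return new ArrayList<>(completed);
        }

        /** The assumed new response field lists containers the AM has acked. */
        public void onHeartbeatResponse(List<String> ackedByAm) {
            completed.removeAll(ackedByAm);
        }
    }

    /** Stand-in for the RM side (app attempt plus per-node bookkeeping). */
    public static class ResourceManagerSim {
        // Completed container -> node ID, not yet handed to the AM.
        private final Map<String, String> pending = new LinkedHashMap<>();
        // Handed to the AM in the last allocate response, awaiting the implicit ack.
        private final Map<String, String> sentToAm = new LinkedHashMap<>();
        // Acks queued per node, delivered on that node's next heartbeat response.
        private final Map<String, List<String>> acksForNode = new HashMap<>();

        public List<String> nodeHeartbeat(String nodeId, List<String> completedContainers) {
            List<String> acks = acksForNode.remove(nodeId);
            if (acks == null) {
                acks = new ArrayList<>();
            }
            for (String c : completedContainers) {
                // Skip containers already tracked or already acked by the AM.
                if (!acks.contains(c) && !pending.containsKey(c) && !sentToAm.containsKey(c)) {
                    pending.put(c, nodeId);
                }
            }
            return acks;
        }

        /** An allocate call implicitly acks what the previous allocate response carried. */
        public List<String> allocate() {
            for (Map.Entry<String, String> e : sentToAm.entrySet()) {
                acksForNode.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
            sentToAm.clear();
            sentToAm.putAll(pending);
            pending.clear();
            return new ArrayList<>(sentToAm.keySet());
        }
    }

    public static void main(String[] args) {
        NodeManagerSim nm = new NodeManagerSim();
        ResourceManagerSim rm = new ResourceManagerSim();
        nm.containerFinished("container_1");

        nm.onHeartbeatResponse(rm.nodeHeartbeat("node_1", nm.heartbeatRequest()));
        System.out.println("AM receives: " + rm.allocate());
        nm.onHeartbeatResponse(rm.nodeHeartbeat("node_1", nm.heartbeatRequest()));
        System.out.println("NM still holds: " + nm.completed);
        rm.allocate(); // implicit ack of the previous allocate response
        nm.onHeartbeatResponse(rm.nodeHeartbeat("node_1", nm.heartbeatRequest()));
        System.out.println("NM holds after ack: " + nm.completed);
    }
}
```

In this sketch the NM keeps re-reporting a completed container on every heartbeat, so the RM side has to ignore repeats. The entry is dropped on the NM only after the second allocate call has produced the ack and the following heartbeat response has carried it back.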

> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.
