[ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462324#comment-13462324 ]
Thomas Graves commented on YARN-128:
------------------------------------
{quote}
RM sends commands back to clean up containers/applications. Can orphans be left
behind on nodes after RM restart? Will NM be able to auto-clean containers?
{quote}
Containers can currently be lost. See YARN-72 and YARN-73. Once it's changed so
the RM doesn't always reboot the NMs, that will get a bit better, but it's still
possible, so we will have to handle it somehow. Since the NM could crash, it almost
needs a way to check on startup what's running and at that point decide if it
should clean them up. It does have a .pid file for the containers, but you would
have to be sure that the process is the same one as when the NM went down.
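
To make the pid-reuse concern concrete, here is a rough sketch of one way such a startup check could look. It is not how the NM works today: it assumes the NM had also recorded each container's process start time next to the pid (the current .pid file holds only the pid), and the names linux_start_ticks and classify_container are made up for illustration. The start-time lookup is Linux-only and reads /proc.

```python
from typing import Callable, Optional


def linux_start_ticks(pid: int) -> Optional[int]:
    """Return the process start time in clock ticks since boot, or None if gone.

    Linux-only: reads field 22 (starttime) of /proc/<pid>/stat.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name (field 2) may contain spaces or ')',
    # so split after the last ')' before tokenizing.
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[19])  # fields[0] is field 3 (state), so field 22 is index 19


def classify_container(
    recorded_pid: int,
    recorded_start: int,  # ASSUMPTION: saved by the NM next to the pid
    lookup: Callable[[int], Optional[int]] = linux_start_ticks,
) -> str:
    """Decide what a restarted NM could do with one recorded container pid."""
    live_start = lookup(recorded_pid)
    if live_start is None:
        return "gone"        # nothing running under that pid; only clean up files
    if live_start != recorded_start:
        return "pid_reused"  # an unrelated process now owns the pid; do not kill it
    return "orphan"          # same process as before the crash; safe to clean up
```

A pid on its own cannot tell "orphan" apart from "pid_reused", which is why the sketch compares a second piece of identity before deciding to kill anything.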
> Resurrect RM Restart
> ---------------------
>
> Key: YARN-128
> URL: https://issues.apache.org/jira/browse/YARN-128
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.