[jira] [Commented] (YARN-128) Resurrect RM Restart

Thomas Graves (JIRA) Mon, 24 Sep 2012 18:04:10 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462324#comment-13462324
 ]


Thomas Graves commented on YARN-128:
------------------------------------

{quote}
RM sends commands back to clean up containers/applications. Can orphans be left 
behind on nodes after RM restart? Will NM be able to auto-clean containers?
{quote}

Containers can currently be lost. See YARN-72 and YARN-73. Once it's changed so 
the RM doesn't always reboot the NMs, that will get a bit better, but it's still 
possible, so we will have to handle it somehow. Since the NM could crash, it almost 
needs a way to check on startup what's running and at that point decide whether it 
should clean them up. It does have a .pid file for the containers, but you would 
have to be sure that the process is the same one as when the NM went down.
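
One way to make the "same process" check concrete is sketched below. This is an illustration, not how the NM works today. It assumes the NM also recorded the container's kernel start time next to the PID, which the current .pid file is not known to contain, and it assumes a Linux /proc filesystem. Every function name here is hypothetical. The idea is that PIDs get reused, so a live PID only counts as the old container if its start time still matches what was recorded.

```python
from typing import Callable, Optional


def parse_starttime(stat_line: str) -> int:
    """Extract field 22 (starttime, in clock ticks since boot) from the
    contents of /proc/<pid>/stat."""
    # Field 2 (the command name) is parenthesized and may itself contain
    # spaces or ')', so cut at the LAST ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_starttime(pid: int) -> Optional[int]:
    """Return the start time of a live process, or None if it no longer
    exists. Linux only (reads /proc)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def is_same_container_process(
    recorded_pid: int,
    recorded_starttime: int,  # hypothetical: saved by the NM at launch
    lookup: Callable[[int], Optional[int]] = read_starttime,
) -> bool:
    """True only if the PID is alive AND started at the recorded time.
    A live PID with a different start time is an unrelated process that
    reused the PID, and must not be cleaned up."""
    live_starttime = lookup(recorded_pid)
    return live_starttime is not None and live_starttime == recorded_starttime
```

On startup the NM could run this check for each leftover .pid file and only kill the processes for which it returns true. The `lookup` parameter exists so the decision logic can be exercised without real processes.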
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM 
> refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
