[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted

jian he (JIRA) Thu, 28 Mar 2013 20:15:21 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617028#comment-13617028
 ]


jian he commented on YARN-495:
------------------------------

When the RM sends a reboot command, the node manager currently cleans up all 
containers and does a complete reboot. We are thinking of changing this so 
that, instead of rebooting the whole NM, it only resyncs with the RM: 
essentially, restart the nodeStatusUpdater thread and re-register with the RM. 
The reason is that rebooting the whole NM may be unnecessary overhead; other 
services may not need to be restarted. As long as the RM is restarted and is 
ensured to have the same state as before its restart, what matters is to sync 
the NM and the RM so that they are on the same page as they were before the 
restart, which only requires killing containers and re-registering. 
In the future work on RM work-preserving restart, containers should not all be 
cleaned up. For example, the NM could keep track of the previously running 
containers and, when it receives a resync command, continue what it was 
running before.
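
To make the difference between the two behaviors concrete, here is a rough 
sketch in Java. It is only an illustration of the idea described above, not 
the actual NM code or the attached patch: the class ResyncSketch, the type 
SketchNodeManager, and every field and method in it are hypothetical names 
made up for this example. It assumes that a full reboot restarts the other 
NM services as well, while a resync only kills containers and re-registers.

```java
import java.util.ArrayList;
import java.util.List;

public class ResyncSketch {

    /** Hypothetical stand-in for the NM; not a class from the YARN code base. */
    static class SketchNodeManager {
        // Ids of the containers this node is currently running.
        final List<String> runningContainers = new ArrayList<>();
        // How many times services unrelated to the RM sync were restarted.
        int otherServiceRestarts = 0;
        // How many times this node has registered with the RM.
        int registrations = 0;

        SketchNodeManager() {
            register(); // initial registration at startup
        }

        void register() {
            registrations++;
        }

        void launch(String containerId) {
            runningContainers.add(containerId);
        }

        /** Current behavior: clean up all containers and restart everything. */
        void reboot() {
            runningContainers.clear();
            otherServiceRestarts++; // the whole NM goes down and comes back
            register();
        }

        /** Proposed behavior: kill containers and re-register, nothing else. */
        void resync() {
            runningContainers.clear();
            register(); // stands in for restarting the status updater thread
        }
    }

    public static void main(String[] args) {
        SketchNodeManager nm = new SketchNodeManager();
        nm.launch("container_1");
        nm.resync();
        System.out.println("containers after resync: " + nm.runningContainers.size());
        System.out.println("other services restarted: " + nm.otherServiceRestarts);
        System.out.println("registrations: " + nm.registrations);
    }
}
```

In this sketch both paths kill the containers; a work-preserving variant would 
instead keep runningContainers across resync() and report them to the RM on 
re-registration.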
                
> Containers are not terminated when the NM is rebooted
> -----------------------------------------------------
>
>                 Key: YARN-495
>                 URL: https://issues.apache.org/jira/browse/YARN-495
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: jian he
>            Assignee: jian he
>         Attachments: YARN-495.1.patch
>
>
> When a reboot command is sent from RM, the node manager doesn't clean up the 
> containers while it's stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
