[
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617028#comment-13617028
]
jian he commented on YARN-495:
------------------------------
When the RM sends a reboot command, the node manager currently cleans up all
containers and does a complete reboot. We are thinking of changing this so
that, instead of rebooting the whole NM, it only resyncs with the RM:
essentially, restart the nodeStatusUpdater thread and re-register with the RM.
The reason is that rebooting the whole NM may be unnecessary overhead; other
services may not need to be restarted. As long as the RM is restarted and is
ensured to have the same state as before the restart, what matters is to sync
the NM and the RM so that they are also on the same page as before the restart,
which only requires killing containers and re-registering.
And in the future work on RM work-preserving restart, containers should not
all be cleaned up. For example, the NM could keep track of the previously
running containers and, when it receives a resync command, continue running
what it was running before.
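
To make the difference between the two behaviors concrete, here is a minimal
sketch in Java. It is a toy model only: the class ResyncSketch, its methods,
and the preserveContainers flag are hypothetical names made up for
illustration and are not the real NodeManager or nodeStatusUpdater API. It
also assumes that a full reboot ends with the NM registering again.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model for illustration only; not the real Hadoop NodeManager API.
class ResyncSketch {
    private final Set<String> runningContainers = new LinkedHashSet<>();
    private int registrations = 0; // times this NM (re-)registered with the RM
    private int fullRestarts = 0;  // times every NM service was restarted

    void launch(String containerId) {
        runningContainers.add(containerId);
    }

    // Behavior described as current: clean up all containers and restart
    // everything. Assumption: the NM registers again once it is back up.
    void reboot() {
        runningContainers.clear();
        fullRestarts++;
        registrations++;
    }

    // Proposed behavior: leave the other services alone and only re-register.
    // preserveContainers == false kills containers first (the proposal);
    // preserveContainers == true models the future work-preserving case.
    void resync(boolean preserveContainers) {
        if (!preserveContainers) {
            runningContainers.clear();
        }
        registrations++;
    }

    Set<String> runningContainers() { return runningContainers; }
    int registrations() { return registrations; }
    int fullRestarts() { return fullRestarts; }

    public static void main(String[] args) {
        ResyncSketch nm = new ResyncSketch();
        nm.launch("container_1");
        nm.resync(false); // containers killed, NM re-registered, no full restart
        System.out.println(nm.runningContainers() + " registrations="
                + nm.registrations() + " fullRestarts=" + nm.fullRestarts());
    }
}
```

In this model, resync(false) corresponds to the change proposed above, and
resync(true) to the later work-preserving behavior, where previously running
containers survive the resync.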
> Containers are not terminated when the NM is rebooted
> -----------------------------------------------------
>
> Key: YARN-495
> URL: https://issues.apache.org/jira/browse/YARN-495
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: jian he
> Assignee: jian he
> Attachments: YARN-495.1.patch
>
>
> When a reboot command is sent from RM, the node manager doesn't clean up the
> containers while it's stopping.