[
https://issues.apache.org/jira/browse/YARN-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli resolved YARN-73.
-----------------------------------------
Resolution: Duplicate
With YARN-495 in, we changed NM reboot behaviour to be a simple resync: kill
all containers and re-register with the RM.
So in sum, YARN-72 cleans up containers on shutdown and YARN-495 does so on
resync.
There is still a case where an operator issues a shutdown but
NM_SLEEP_DELAY_BEFORE_SIGKILL_MS + NM_PROCESS_KILL_WAIT_MS +
SHUTDOWN_CLEANUP_SLOP_MS is not enough time to clean up all containers. We can
make the latter configurable, or mandate that operators kill containers
explicitly in that case.
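To make the timing argument concrete, here is a rough sketch of the shutdown
budget as the sum of the three values named above. It is a toy model, not NM
code: the class, the function, the container ids, and all millisecond values
are hypothetical, and it assumes containers are cleaned up in parallel so that
each one is compared against the whole budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownBudget:
    """Hypothetical stand-in for the three waits summed at NM shutdown."""
    sleep_delay_before_sigkill_ms: int  # models NM_SLEEP_DELAY_BEFORE_SIGKILL_MS
    process_kill_wait_ms: int           # models NM_PROCESS_KILL_WAIT_MS
    cleanup_slop_ms: int                # models SHUTDOWN_CLEANUP_SLOP_MS

    def total_ms(self) -> int:
        # Total time the shutdown path waits before giving up on cleanup.
        return (self.sleep_delay_before_sigkill_ms
                + self.process_kill_wait_ms
                + self.cleanup_slop_ms)


def leftover_containers(budget, cleanup_times_ms):
    """Return ids of containers whose cleanup outlasts the budget.

    Assumes parallel cleanup: each container is checked against the
    full budget, not against a share of it.
    """
    limit = budget.total_ms()
    return sorted(cid for cid, t in cleanup_times_ms.items() if t > limit)


# Illustrative values only; not asserted to be the Hadoop defaults.
budget = ShutdownBudget(250, 2000, 1000)
times = {"container_01": 300, "container_02": 3100, "container_03": 5000}

print(budget.total_ms())                   # 3250
print(leftover_containers(budget, times))  # ['container_03']
```

In this model, raising the slop term (the part proposed to become
configurable) to 3000 ms gives a 5250 ms budget and no leftover containers;
otherwise container_03 is the one an operator would have to kill explicitly.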
Closing this as a duplicate.
> nodemanager should cleanup running containers when it starts
> ------------------------------------------------------------
>
> Key: YARN-73
> URL: https://issues.apache.org/jira/browse/YARN-73
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 0.23.3
> Reporter: Thomas Graves
>
> Currently the nodemanager doesn't cleanup running containers when it gets
> restarted. This can cause containers to get lost and stick around forever.
> We've seen this happen multiple times when the RM is restarted. When the RM
> is brought back up, it doesn't know about what was running on the cluster, it
> tells the NMs to reboot and when the NM reboots it loses what it had running.
> If there are any containers that are behaving badly there is no one left that
> knows about them to kill them.
> We should kill any running containers when the nodemanager is being started.
> Note that when the NM is being brought up it needs to somehow figure out what
> containers were running and be sure it doesn't kill anything it shouldn't.
> Note, we should also try to kill any running containers when the node manager
> is shutting down (jira 4213 was filed for this).
> This might change a bit when RM restart is implemented if tasks can actually
> survive across RM/NM being rebooted, but that can be addressed at that point.