[
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500195#comment-13500195
]
Tom White commented on YARN-72:
-------------------------------
Sandy, this looks like a good start, hooking in the code for container cleanup.
I would focus on the cleanup-on-shutdown part in this patch, and tackle
cleanup on startup in YARN-73.
As Bikas mentioned, there needs to be a timeout on waiting for the containers to
shut down. The shutdown process waits for up to
yarn.nodemanager.process-kill-wait.ms for the PID to appear, then
yarn.nodemanager.sleep-delay-before-sigkill.ms before sending a SIGKILL signal
(after a SIGTERM) if the process hasn't died - see
ContainerLaunch#cleanupContainer. Waiting for a little longer than the sum of
these durations would be sufficient.
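
As a rough illustration of that timeout, here is a minimal sketch, not code from the patch. The two property names are the ones mentioned above; the ShutdownWait class, the SLACK_MS margin, the fallback arguments, and the allContainersDone check are all hypothetical, and a plain Map stands in for the NM configuration so the example is self-contained.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of the YARN-72 patch. */
public class ShutdownWait {
    // Property names as given in the comment above.
    static final String PROCESS_KILL_WAIT_MS = "yarn.nodemanager.process-kill-wait.ms";
    static final String SIGKILL_DELAY_MS = "yarn.nodemanager.sleep-delay-before-sigkill.ms";

    // Assumed extra margin so the wait is "a little longer" than the sum.
    static final long SLACK_MS = 1000L;

    /** Sum of the two configured waits plus the assumed slack. */
    static long timeoutMs(Map<String, Long> conf, long fallbackKillWaitMs,
                          long fallbackSigkillDelayMs) {
        long killWait = conf.getOrDefault(PROCESS_KILL_WAIT_MS, fallbackKillWaitMs);
        long sigkillDelay = conf.getOrDefault(SIGKILL_DELAY_MS, fallbackSigkillDelayMs);
        return killWait + sigkillDelay + SLACK_MS;
    }

    /**
     * Polls the (hypothetical) completion check until it reports true or the
     * timeout elapses. Returns true if the containers finished in time.
     */
    static boolean awaitCleanup(BooleanSupplier allContainersDone, long timeoutMs,
                                long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!allContainersDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; stop waiting and let shutdown proceed
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

The NM's stop path could then call something like awaitCleanup(check, timeoutMs(conf, ...), pollMs) after asking the containers to stop, and carry on with shutdown whether or not it returns true.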
Regarding testing, you could have a test like the one in
TestContainerLaunch#testDelayedKill to verify that containers are correctly
cleaned up after stopping an NM.
> NM should handle cleaning up containers when it shuts down ( and kill
> containers from an earlier instance when it comes back up after an unclean
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-72
> URL: https://issues.apache.org/jira/browse/YARN-72
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Sandy Ryza
> Attachments: YARN-72.patch
>
>
> Ideally, the NM should wait for a limited amount of time when it gets a
> shutdown signal for existing containers to complete and kill the containers (
> if we pick an aggressive approach ) after this time interval.
> For NMs which come up after an unclean shutdown, the NM should look through
> its directories for existing container.pids and try to kill any existing
> containers matching the pids found.