[
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473483#comment-13473483
]
Sandy Ryza commented on YARN-72:
--------------------------------
Code already exists to kill existing containers when the resource manager
requests it. Three events are dispatched to make this happen: a
COMPLETED_CONTAINERS event is handled by the ContainerManager, which dispatches
a KILL_CONTAINER event for each container to be killed; the ContainerImpls
handle these by dispatching CLEANUP_CONTAINER events, which are finally handled
by the ContainersLauncher, which tries to kill the containers.
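To make the three hops concrete, here is a minimal sketch of the chain as I
understand it. The class and method names are hypothetical and are not the
actual nodemanager types, and the hops are plain method calls here, whereas
the real code goes through the event dispatcher.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the kill chain; not the real NM classes.
public class KillChainSketch {
    // Ordered record of every event "dispatched", for inspection.
    static final List<String> trace = new ArrayList<>();

    // Hop 1: the container manager fans out one KILL_CONTAINER per container.
    static void onCompletedContainers(List<String> containerIds) {
        trace.add("COMPLETED_CONTAINERS");
        for (String id : containerIds) {
            onKillContainer(id);
        }
    }

    // Hop 2: each container reacts to the kill request by asking for cleanup.
    static void onKillContainer(String id) {
        trace.add("KILL_CONTAINER:" + id);
        onCleanupContainer(id);
    }

    // Hop 3: the launcher would attempt the actual kill at this point.
    static void onCleanupContainer(String id) {
        trace.add("CLEANUP_CONTAINER:" + id);
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        ids.add("c1");
        ids.add("c2");
        onCompletedContainers(ids);
        for (String event : trace) {
            System.out.println(event);
        }
    }
}
```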
Does it make more sense to use this chain of events or to call the kill code
directly? For the former, the issue is how the shutdown code would know when
the cleanup has completed. It looks like ContainerImpls have their state
changed when their containers are killed, so the shutdown code could monitor
them until they all reach the correct state, but a fair bit of plumbing would
be required for the shutdown code to reach them. For the latter, similar
plumbing would be required for the shutdown code to reach the ContainerImpls,
and the other issue would be circumventing the event system, which might have
consequences that I'm not able to foresee.
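One possible shape for the "monitor until they all reach the correct state"
option is a bounded wait on a latch. This is only a sketch under assumptions:
TrackedContainer and awaitCleanup are hypothetical names, and the real
plumbing from ContainerImpl state changes to the shutdown code is exactly the
part that does not exist yet.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: shutdown code waits, with a timeout, for cleanup.
public class ShutdownWaitSketch {
    // Stand-in for a container whose state changes once it has been killed.
    static class TrackedContainer {
        private final CountDownLatch done;

        TrackedContainer(CountDownLatch done) {
            this.done = done;
        }

        // Called once by whatever handles the cleanup when the kill finishes.
        void markDone() {
            done.countDown();
        }
    }

    // Returns true if every container reported done before the timeout.
    static boolean awaitCleanup(CountDownLatch done, long timeoutMs)
            throws InterruptedException {
        return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 3;
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            TrackedContainer c = new TrackedContainer(done);
            new Thread(c::markDone).start(); // simulate asynchronous cleanup
        }
        System.out.println("all cleaned up: " + awaitCleanup(done, 1000));
    }
}
```

The timeout matters because the issue description asks for a limited wait; if
it expires, the shutdown code would still have to decide what to do with the
containers that never reported back.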
This is my first foray into nodemanager code, so maybe someone who understands
it better can provide some perspective?
> NM should handle cleaning up containers when it shuts down ( and kill
> containers from an earlier instance when it comes back up after an unclean
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-72
> URL: https://issues.apache.org/jira/browse/YARN-72
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
>
> Ideally, when the NM gets a shutdown signal, it should wait a limited amount
> of time for existing containers to complete and then kill the containers (if
> we pick an aggressive approach) after this interval.
> For NMs which come up after an unclean shutdown, the NM should look through
> its directories for existing container.pids and try to kill any existing
> containers matching the pids found.