[ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473483#comment-13473483
 ] 

Sandy Ryza commented on YARN-72:
--------------------------------

Code already exists to kill existing containers when the resource manager 
requests it.  Three events are dispatched to make this happen: a 
COMPLETED_CONTAINERS event is handled by the ContainerManager, which dispatches 
a KILL_CONTAINER event for each container to be killed, which the 
ContainerImpls handle by dispatching CLEANUP_CONTAINER events, which are 
finally handled by the ContainersLauncher, which tries to kill the containers.
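
To make the chain concrete, here is a toy model of the three hops. It is only a sketch: KillChainSketch, Event, and run are hypothetical names, the single queue stands in for the real dispatcher, and none of this is the actual NodeManager code. Only the three event names come from the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the kill chain; hypothetical classes, not the real YARN ones.
class KillChainSketch {
    enum EventType { COMPLETED_CONTAINERS, KILL_CONTAINER, CLEANUP_CONTAINER }

    static final class Event {
        final EventType type;
        final String containerId; // null for the batch-level event

        Event(EventType type, String containerId) {
            this.type = type;
            this.containerId = containerId;
        }
    }

    private final Queue<Event> queue = new ArrayDeque<>();
    private final List<String> killed = new ArrayList<>();

    // Feeds one COMPLETED_CONTAINERS event through the chain and returns
    // the containers that reached the final "kill" step, in order.
    List<String> run(List<String> containersToKill) {
        queue.add(new Event(EventType.COMPLETED_CONTAINERS, null));
        while (!queue.isEmpty()) {
            Event e = queue.remove();
            switch (e.type) {
                case COMPLETED_CONTAINERS:
                    // Role of the ContainerManager: fan out one kill per container.
                    for (String id : containersToKill) {
                        queue.add(new Event(EventType.KILL_CONTAINER, id));
                    }
                    break;
                case KILL_CONTAINER:
                    // Role of a ContainerImpl: ask for cleanup of itself.
                    queue.add(new Event(EventType.CLEANUP_CONTAINER, e.containerId));
                    break;
                case CLEANUP_CONTAINER:
                    // Role of the ContainersLauncher: actually kill the process.
                    killed.add("killed " + e.containerId);
                    break;
            }
        }
        return killed;
    }
}
```

Calling new KillChainSketch().run(Arrays.asList("c1", "c2")) walks both containers through all three hops. Note that nothing in the chain reports back to the caller when the last hop is done.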

Does it make more sense to use this chain of events or to try to call the kill 
code directly?  For the former, the issue would be knowing when the cleanup 
has completed.  It looks like ContainerImpls have their state 
changed when their containers are killed, so the shutdown code could monitor 
them until they all reach the correct state, but a fair bit of plumbing would 
be required for the shutdown code to be able to get to them.  For the latter, 
similar plumbing would be required for the shutdown code to reach the 
ContainerImpls, and the other issue would be circumventing the event system, 
which might have consequences that I'm not able to foresee.
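
For the first option, the monitoring step could look roughly like the sketch below: poll the container states until all of them are done or a deadline passes. Everything here is an assumption made for illustration: ShutdownWaitSketch, the State enum, and the idea that the shutdown code can see a map of container states at all (that access is exactly the plumbing in question).

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "monitor until all containers reach the correct
// state"; the State values and the map are stand-ins, not YARN types.
class ShutdownWaitSketch {
    enum State { RUNNING, KILLING, DONE }

    // Returns true if every container reached DONE before the timeout,
    // false if the deadline passed or the waiting thread was interrupted.
    static boolean awaitAllDone(Map<String, State> states,
                                long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            boolean allDone = true;
            for (State s : states.values()) {
                if (s != State.DONE) {
                    allDone = false;
                    break;
                }
            }
            if (allDone) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; the caller decides what to do next
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
    }
}
```

In a real implementation the map would have to be safe to read while event handlers update it (a ConcurrentHashMap, for example), and a latch or callback would avoid the polling, at the cost of more plumbing.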

This is my first foray into nodemanager code, so maybe someone who understands 
it better can provide some perspective?
                
> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>
> Ideally, when the NM gets a shutdown signal, it should wait for a limited 
> amount of time for existing containers to complete and then kill the 
> containers ( if we pick an aggressive approach ) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through 
> its directories for existing container.pids and try to kill any existing 
> containers matching the pids found. 
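
The recovery step in the description could be sketched as follows. This is an illustration under stated assumptions, not NodeManager code: StalePidScanSketch is a hypothetical class, and the layout (files whose names end in ".pid", each holding one numeric pid, somewhere under a root directory) is assumed. ProcessHandle requires Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: find leftover pid files and kill the matching processes.
class StalePidScanSketch {

    // Collects the pids stored in every "*.pid" file under root (assumed layout).
    static List<Long> findPids(Path root) {
        List<Path> pidFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            pidFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".pid"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Long> pids = new ArrayList<>();
        for (Path file : pidFiles) {
            try {
                String text = new String(Files.readAllBytes(file),
                        StandardCharsets.UTF_8).trim();
                pids.add(Long.parseLong(text));
            } catch (IOException | NumberFormatException e) {
                // Unreadable or malformed pid file: skip it in this sketch.
            }
        }
        return pids;
    }

    // Forcibly terminates each process that still exists.
    static void killAll(List<Long> pids) {
        for (long pid : pids) {
            ProcessHandle.of(pid).ifPresent(ProcessHandle::destroyForcibly);
        }
    }
}
```

One caveat with any approach like this: after an unclean shutdown the operating system may have reused a pid for an unrelated process, so a real implementation would want to verify that the process is actually a container before killing it.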

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
