[ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486349#comment-13486349 ]

Sandy Ryza commented on YARN-72:
--------------------------------

You're right: we should have some sort of timeout, after which the NM just 
moves on and exits.
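Here is a minimal Python sketch of the timeout idea. It assumes each container can be modeled as a plain subprocess.Popen child of the NM. The function name stop_containers and the single shared deadline are hypothetical choices for illustration, not NodeManager code. Every container has until one common deadline to exit, and anything still running after that is sent SIGKILL.

```python
import signal
import subprocess
import time

def stop_containers(procs, timeout_s):
    """Hypothetical helper: give all processes until one shared deadline to
    exit, then SIGKILL any survivors. Returns the PIDs that had to be killed."""
    deadline = time.monotonic() + timeout_s
    killed = []
    for proc in procs:
        # Once the deadline has passed, remaining is 0 and wait() only polls.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            proc.send_signal(signal.SIGKILL)  # cannot be caught or ignored
            proc.wait()                       # reap the killed process
            killed.append(proc.pid)
    return killed
```

The sketch uses one deadline for all containers rather than a fresh timeout for each one. That bounds the total shutdown time no matter how many containers are running.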

My worries about the process group approach would be:
* A process killed via its process group is sent SIGHUP, which it can catch 
and ignore.  The current NodeManager mechanism that my patch uses ultimately 
sends SIGKILL, which cannot be caught or ignored.
* Processes are allowed to change their own process group.
* The proposed solution to YARN-3 also relies on a possibly conflicting use of 
process groups (I believe a single one for each container?).
* From some cursory Googling, there doesn't seem to be a clean way to work 
with them from Java.
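A small POSIX-only Python sketch can reproduce the first two concerns. It assumes containers are ordinary child processes. The helper names and the inline child programs are hypothetical.

* survives_sighup() shows a child that ignores SIGHUP but still dies on SIGKILL.
* escapes_group_kill() shows a grandchild that calls setsid() (through start_new_session=True). Because of that call, it survives even a SIGKILL sent to its parent's process group with os.killpg.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical child program: ignore SIGHUP, report readiness, then idle.
IGNORES_HUP = (
    "import signal, time;"
    "signal.signal(signal.SIGHUP, signal.SIG_IGN);"
    "print('ready', flush=True);"
    "time.sleep(30)"
)

# Hypothetical child program: start a grandchild in a new session (and hence
# a new process group), report the grandchild's PID, then idle.
SPAWNS_ESCAPEE = (
    "import subprocess, sys, time;"
    "g = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'],"
    " start_new_session=True);"
    "print(g.pid, flush=True);"
    "time.sleep(30)"
)

def survives_sighup():
    """Return (alive_after_sighup, returncode_after_sigkill)."""
    child = subprocess.Popen([sys.executable, "-c", IGNORES_HUP],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()            # wait until SIGHUP is being ignored
    child.send_signal(signal.SIGHUP)
    time.sleep(0.2)                    # give the signal time to be delivered
    alive = child.poll() is None       # True: the child ignored SIGHUP
    child.send_signal(signal.SIGKILL)  # SIGKILL cannot be caught or ignored
    child.wait()
    child.stdout.close()
    return alive, child.returncode

def escapes_group_kill():
    """Return True if the grandchild survives SIGKILL sent to the child's group."""
    child = subprocess.Popen([sys.executable, "-c", SPAWNS_ESCAPEE],
                             stdout=subprocess.PIPE, text=True,
                             start_new_session=True)  # child leads its own group
    grandchild = int(child.stdout.readline())
    os.killpg(child.pid, signal.SIGKILL)  # signal every process in that group
    child.wait()
    try:
        os.kill(grandchild, 0)            # signal 0 only checks that the PID exists
        escaped = True
    except ProcessLookupError:
        escaped = False
    if escaped:
        os.kill(grandchild, signal.SIGKILL)  # clean up the escaped process
    child.stdout.close()
    return escaped
```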

That said, I'd also defer to someone with a better understanding of the NM.
                
> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Sandy Ryza
>         Attachments: YARN-72.patch
>
>
> Ideally, when the NM gets a shutdown signal, it should wait a limited amount 
> of time for existing containers to complete and then, if we pick an 
> aggressive approach, kill any containers still running after that interval. 
> For NMs that come up after an unclean shutdown, the NM should look through 
> its directories for existing container pid files and try to kill any 
> existing containers matching the pids found. 
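The recovery pass in the description can be sketched in Python. The sketch assumes, hypothetically, that each container's PID was written as a decimal number to a *.pid file somewhere under one directory. The directory layout and the function name kill_leftover_containers are illustrations, not the NM's actual on-disk format.

```python
import os
import signal
from pathlib import Path

def kill_leftover_containers(pid_dir):
    """Hypothetical recovery pass: for every *.pid file under pid_dir, SIGKILL
    the recorded PID if that process still exists, then delete the pid file.
    Returns the PIDs that were signalled."""
    killed = []
    for pid_file in sorted(Path(pid_dir).rglob("*.pid")):
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            pid = None                     # garbage content: nothing to kill
        # Guard against 0 or negative values, which kill() would interpret
        # as whole process groups rather than a single process.
        if pid is not None and pid > 0:
            try:
                os.kill(pid, signal.SIGKILL)
                killed.append(pid)
            except ProcessLookupError:
                pass                       # the process is already gone
        pid_file.unlink()
    return killed
```

The sketch does not handle PID reuse. After an unclean shutdown, a recorded PID may already belong to an unrelated process, so a real implementation would need to confirm the process's identity before killing it.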

--
This message is automatically generated by JIRA.
