[ 
https://issues.apache.org/jira/browse/YARN-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754975#comment-15754975
 ] 

Nathan Roberts commented on YARN-2904:
--------------------------------------

A simple streaming job that runs the following script illustrates tasks escaping. 
(/usr/bin/timeout calls setpgrp(), which puts it in its own process group, so a 
signal aimed at the container's original process group never reaches it.) 
{noformat}
#!/bin/bash
/usr/bin/timeout 1d /bin/sleep 1000
{noformat}
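
To see which kernel IDs the timeout process ends up with, one option is to read them from /proc. The helper below is a minimal sketch, assuming a Linux /proc filesystem; proc_ids is a hypothetical name used only for illustration and is not part of YARN or the LCE.

```shell
#!/bin/bash
# proc_ids is a hypothetical helper for illustration only.
# Prints "PID PGID SID" for the given PID, read from /proc/<pid>/stat.
proc_ids() {
    local pid=$1 stat rest
    stat=$(cat "/proc/$pid/stat") || return 1
    # Strip the leading "pid (comm) "; comm may itself contain spaces or ')',
    # so cut at the last ") " in the line.
    rest=${stat##*") "}
    # Remaining fields are: state ppid pgrp session ...
    set -- $rest
    echo "$pid $3 $4"
}
```

In a non-interactive script, starting `/usr/bin/timeout 1d /bin/sleep 1000 &`, waiting a moment, and then calling `proc_ids $$` and `proc_ids $!` prints one line for the script and one for timeout; comparing the two lines shows which of the IDs differ.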

Mesos has apparently addressed this in a couple of different ways, including: 
1) freeze_container->kill_all_processes_in_container->unfreeze_container; or 
2) use a private PID namespace within the container and then kill PID 1 within 
the container. 
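
The first approach can be sketched roughly as follows. This is an illustration only, not the Mesos or YARN implementation. It assumes the container has its own cgroup v1 freezer directory (exposing freezer.state and cgroup.procs); freeze_kill_thaw and the path in the comment are hypothetical names.

```shell
#!/bin/bash
# Sketch only: freeze -> kill everything -> thaw for one cgroup v1 freezer
# directory. cgroup_dir is an assumed path, for example
# /sys/fs/cgroup/freezer/hadoop-yarn/<container id>.
freeze_kill_thaw() {
    local cgroup_dir=$1 state="" pid i

    # 1) Freeze the cgroup so no task can fork while the list is walked.
    echo FROZEN > "$cgroup_dir/freezer.state" || return 1

    # Freezing is asynchronous; poll until the state reads back as FROZEN.
    for i in $(seq 1 50); do
        read -r state < "$cgroup_dir/freezer.state"
        [ "$state" = "FROZEN" ] && break
        sleep 0.1
    done
    if [ "$state" != "FROZEN" ]; then
        # Give up, but do not leave the container half frozen.
        echo THAWED > "$cgroup_dir/freezer.state"
        return 1
    fi

    # 2) Queue SIGKILL for every process still listed in the cgroup.
    while read -r pid; do
        kill -KILL "$pid" 2>/dev/null
    done < "$cgroup_dir/cgroup.procs"

    # 3) Thaw the cgroup; the queued SIGKILLs take effect once tasks run again.
    echo THAWED > "$cgroup_dir/freezer.state"
}
```

Because membership is read from the cgroup rather than from a process group or session, a task that changed its process group (as timeout does above) is still found and killed.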

> Use linux cgroups to enhance container tear down
> ------------------------------------------------
>
>                 Key: YARN-2904
>                 URL: https://issues.apache.org/jira/browse/YARN-2904
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>
> If we are launching YARN containers within cgroups, Linux provides some 
> guarantees that can help completely tear down a container.  Specifically, 
> Linux guarantees that tasks can't escape a cgroup. We can use this fact to 
> tear down a YARN container without leaking tasks.
> Today, a SIGTERM is sent to the session (normally led by bash). When the 
> session leader exits, the LCE sees this and assumes all resources have been 
> given back to the system. This is not guaranteed. Example: YARN-2809 
> implements a workaround that is only necessary because tasks are still 
> lingering within the cgroup when the nodemanager attempts to delete it.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

