[ https://issues.apache.org/jira/browse/YARN-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754975#comment-15754975 ]
Nathan Roberts commented on YARN-2904:
--------------------------------------

The following simple streaming job illustrates tasks escaping. (/usr/bin/timeout does a setpgrp(), which puts it in its own process group, so a signal aimed at the original group never reaches it.)

{noformat}
#!/bin/bash
/usr/bin/timeout 1d /bin/sleep 1000
{noformat}

Mesos has apparently addressed this in a couple of different ways, including:
1) freeze_container -> kill_all_processes_in_container -> unfreeze_container; or
2) use a private PID namespace within the container and then kill PID 1 within the container.

> Use linux cgroups to enhance container tear down
> ------------------------------------------------
>
> Key: YARN-2904
> URL: https://issues.apache.org/jira/browse/YARN-2904
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.6.0
> Reporter: Nathan Roberts
>
> If we are launching yarn containers within cgroups, linux provides some
> guarantees that can help completely tear down a container. Specifically,
> linux guarantees that tasks can't escape a cgroup. We can use this fact to
> tear down a yarn container without leaking tasks.
> Today, a SIGTERM is sent to the session (normally led by bash). When the
> session leader exits, the LCE sees this and assumes all resources have been
> given back to the system. This is not guaranteed. Example: YARN-2809
> implements a workaround that is only necessary because tasks are still
> lingering within the cgroup when the nodemanager attempts to delete it.
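
For illustration only, the first approach mentioned in the comment (freeze, kill everything, unfreeze) might look roughly like the sketch below. It is not YARN or Mesos code. It assumes the cgroup v1 freezer controller, whose freezer.state and cgroup.procs files are standard kernel interfaces; the function name teardown_cgroup and the directory passed to it are hypothetical. Under v1, a SIGKILL sent to a frozen task stays pending until the cgroup is thawed, which is why the thaw comes last.

```shell
#!/bin/sh
# Sketch: tear down every task in a cgroup v1 freezer cgroup.
# $1 is the cgroup directory (hypothetical; the caller must be able to write to it).
teardown_cgroup() {
    cg="$1"

    # 1) Freeze, so no task can fork or change groups while we walk the list.
    echo FROZEN > "$cg/freezer.state"

    # The kernel may report FREEZING for a while; wait (bounded) for FROZEN.
    tries=0
    while [ "$(cat "$cg/freezer.state")" != "FROZEN" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            # Give up and thaw rather than leave the cgroup half frozen.
            echo THAWED > "$cg/freezer.state"
            return 1
        fi
        sleep 1
    done

    # 2) Queue SIGKILL for every process still listed in the cgroup.
    while read -r task_pid; do
        kill -KILL "$task_pid" 2>/dev/null || true
    done < "$cg/cgroup.procs"

    # 3) Thaw, so the pending signals are delivered and the tasks exit.
    echo THAWED > "$cg/freezer.state"
}
```

Because membership is tracked by the cgroup rather than by session or process group, a child such as the /usr/bin/timeout example above would still appear in cgroup.procs and be killed.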