[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580025#comment-16580025 ]

Jason Lowe commented on YARN-8648:
----------------------------------

+1 for the proposal to fix the cgroup leak by having docker place its cgroup at 
the same level at which YARN creates its container cgroups.  That preserves the 
semantic hierarchy of the hadoop-yarn cgroup in case administrators are 
explicitly setting controls on the entire YARN container hierarchy, so it plays 
nicely with other systems on the node.  It also preserves the cgroup sharing 
semantics on nodes performing a mix of container launches where some are docker 
and others are native.
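
As a rough illustration of that placement (the class and method names below are 
hypothetical, not the actual DockerLinuxContainerRuntime code), the difference 
amounts to what gets passed as docker's {{--cgroup-parent}}: the per-container 
cgroup the NM created versus the configured hadoop-yarn hierarchy itself, so 
that docker's cgroup becomes a sibling of the NM-created container cgroups and 
docker removes it on exit.

{code:java}
// Hypothetical sketch only: shows the two candidate values for --cgroup-parent.
public final class CgroupParentSketch {

  /** Current behavior: docker nests its cgroup under the NM's per-container cgroup. */
  static String perContainerParent(String yarnHierarchy, String containerId) {
    return yarnHierarchy + "/" + containerId;   // e.g. /hadoop-yarn/container_id
  }

  /** Proposed behavior: docker's cgroup sits directly under the YARN hierarchy. */
  static String topLevelParent(String yarnHierarchy) {
    return yarnHierarchy;                       // e.g. /hadoop-yarn
  }

  public static void main(String[] args) {
    String yarnHierarchy = "/hadoop-yarn";      // the cgroups.hierarchy setting
    String containerId = "container_e01_1234567890123_0001_01_000002";
    System.out.println("current:  --cgroup-parent="
        + perContainerParent(yarnHierarchy, containerId));
    System.out.println("proposed: --cgroup-parent=" + topLevelParent(yarnHierarchy));
  }
}
{code}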

Now the hard part is figuring out the cleanest way to have the NM create 
cgroups in the native case and avoid creating the cgroup in the docker case.  
This implies the container runtime needs some say in how cgroups are handled, 
if cgroup handling isn't delegated to the runtime completely.  One potential 
issue with a useResourceHandlers() approach arises if the NM wants to manipulate 
cgroup settings on a live container: a runtime that says it doesn't use resource 
handlers implies that can't be done for that runtime, even though the docker 
runtime could support it.
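
To make that design question concrete, here is a hypothetical sketch (not an 
existing Hadoop interface) of the two separate questions a per-runtime cgroup 
policy would have to answer; a single boolean like useResourceHandlers() 
collapses them into one.

{code:java}
// Hypothetical design sketch, not part of the Hadoop codebase.
public interface ContainerRuntimeCgroupPolicy {

  /**
   * True if the NM should create and delete the per-container cgroup
   * directories itself (the native case); false if that is delegated to the
   * runtime (the docker case, via --cgroup-parent).
   */
  boolean createsContainerCgroups();

  /**
   * True if cgroup settings can still be adjusted on a live container.
   * A docker-backed runtime could return true even while delegating cgroup
   * creation to docker, which is why a single useResourceHandlers() flag
   * may be too coarse.
   */
  boolean supportsLiveResourceUpdates();
}
{code}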

We should consider breaking this up into two JIRAs if it proves difficult to 
hash through the design.  It's a relatively small change to move the docker 
containers under the top-level YARN cgroup hierarchy and fix the cgroup leaks, 
with the side-effect that the NM continues to create and clean up unused 
cgroups per docker container launched.  We could follow up that change with another
JIRA to resolve the new design for the cgroup / container runtime interaction 
so those empty cgroups are avoided in the docker case.  If we can hash it out 
quickly in one JIRA that's great, but I want to make sure the leak problem 
doesn't linger while we work through the architecture of cgroups and container 
runtimes.

> Container cgroups are leaked when using docker
> ----------------------------------------------
>
>                 Key: YARN-8648
>                 URL: https://issues.apache.org/jira/browse/YARN-8648
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>              Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/cpu/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.    So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.
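
As a rough way to see the scale of the leak described above (paths and naming 
are assumptions taken from the description, not from any Hadoop tooling), a 
small diagnostic like the following could count the leftover per-container 
cgroups under each controller:

{code:java}
import java.io.File;

// Illustrative diagnostic only: count leftover container_* cgroup directories
// under every controller's hadoop-yarn hierarchy, e.g.
// /sys/fs/cgroup/cpuset/hadoop-yarn/container_id.
public class LeakedCgroupCounter {
  public static void main(String[] args) {
    File cgroupRoot = new File("/sys/fs/cgroup");
    File[] controllers = cgroupRoot.listFiles(File::isDirectory);
    if (controllers == null) {
      return;
    }
    for (File controller : controllers) {
      File yarnHierarchy = new File(controller, "hadoop-yarn");
      File[] leftovers = yarnHierarchy.listFiles(
          f -> f.isDirectory() && f.getName().startsWith("container_"));
      int count = leftovers == null ? 0 : leftovers.length;
      System.out.println(controller.getName() + ": " + count
          + " container cgroup(s) remaining");
    }
  }
}
{code}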


