[
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580067#comment-16580067
]
Jim Brennan commented on YARN-8648:
-----------------------------------
[~jlowe] thanks for the comment.
{quote}We should consider breaking this up into two JIRAs if it proves
difficult to hash through the design. It's a relatively small change to move
the docker containers under the top-level YARN cgroup hierarchy to fix the
cgroup leaks, with the side effect that the NM continues to create and clean up
unused cgroups per docker container launched. We could follow up that change
with another JIRA to resolve the new design for the cgroup / container runtime
interaction so those empty cgroups are avoided in the docker case. If we can
hash it out quickly in one JIRA that's great, but I want to make sure the leak
problem doesn't linger while we work through the architecture of cgroups and
container runtimes.
{quote}
The main issue with doing this quick fix for the cgroups leak is that any
cgroup parameters written by the various resource handlers will be ignored in
the docker case because they will be written to the unused container cgroup.
Internally, we added a cpu-shares option to docker to handle the cpu resource
because that is the only one we're using, but for the community I think we need to
address them all.
Is it worth temporarily breaking cgroup parameters for docker to fix the leak?
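To make the concern concrete, here is a minimal sketch of why the parameters would be ignored. It assumes cgroup v1 semantics, where a setting only constrains a process if it is written to that process's own cgroup or to one of its ancestors. The helper names ({{docker_cgroup}}, {{nm_cgroup}}, {{limits_apply}}) are hypothetical and are not taken from the YARN code base; the paths are the placeholder ones used in the issue description.

```python
from pathlib import PurePosixPath

# Assumed cgroup v1 mount point for the cpu controller.
CPU_ROOT = PurePosixPath("/sys/fs/cgroup/cpu")


def docker_cgroup(cgroup_parent, docker_id):
    """Cgroup docker creates for a container: directly under --cgroup-parent."""
    return CPU_ROOT / cgroup_parent.lstrip("/") / docker_id


def nm_cgroup(hierarchy, container_id):
    """Cgroup the nodemanager creates and writes resource parameters to."""
    return CPU_ROOT / hierarchy.lstrip("/") / container_id


def limits_apply(nm_path, docker_path):
    """True if nm_path is the docker cgroup itself or one of its ancestors."""
    return nm_path == docker_path or nm_path in docker_path.parents


nm_path = nm_cgroup("/hadoop-yarn", "container_id")

# Current layout: --cgroup-parent is the per-container cgroup.
current = docker_cgroup("/hadoop-yarn/container_id", "docker_container_id")

# Quick-fix layout: --cgroup-parent is the top-level YARN hierarchy.
quick_fix = docker_cgroup("/hadoop-yarn", "docker_container_id")

print(limits_apply(nm_path, current))    # nodemanager cgroup is an ancestor
print(limits_apply(nm_path, quick_fix))  # nodemanager cgroup is only a sibling
```

In the quick-fix layout the nodemanager's per-container cgroup becomes a sibling of the docker cgroup rather than its parent, so anything the resource handlers write there has no effect on the container's processes.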
> Container cgroups are leaked when using docker
> ----------------------------------------------
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Major
> Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups
> for all resources on the system, not just for cpu. For instance, if the
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}},
> the nodemanager will create a cgroup for each container under
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path
> via the {{--cgroup-parent}} command line argument. Docker then creates a
> cgroup for the docker container under that, for instance:
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}}
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is
> good under {{/sys/fs/cgroup/cpu/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every
> resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio,
> perf_event, and systemd. So for instance, docker creates
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but
> it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up
> the {{container_id}} cgroups for these other resources. On one of our busy
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to
> repro with current hadoop.
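
For anyone who wants to check a node for this leak, here is a minimal sketch of a scan. It rests on several assumptions: a cgroup v1 layout under a single root, the {{hadoop-yarn}} hierarchy name from the example above, container cgroup directories whose names start with {{container_}}, and a {{cgroup.procs}} file listing member processes. The function names are hypothetical. A cgroup is reported if it has no child cgroups and no processes, so a container that is just starting could be reported by mistake.

```python
import tempfile
from pathlib import Path


def find_leaked_container_cgroups(cgroup_root, hierarchy="hadoop-yarn"):
    """Return container-level cgroup dirs with no child cgroups and no processes."""
    leaked = []
    for controller in sorted(Path(cgroup_root).iterdir()):
        # Skip symlinks (e.g. cpu -> cpu,cpuacct) so nothing is counted twice.
        if controller.is_symlink() or not controller.is_dir():
            continue
        parent = controller / hierarchy
        if not parent.is_dir():
            continue
        for cg in sorted(parent.iterdir()):
            if not cg.is_dir() or not cg.name.startswith("container_"):
                continue
            # Child cgroups show up as subdirectories.
            has_children = any(child.is_dir() for child in cg.iterdir())
            procs = cg / "cgroup.procs"
            has_procs = procs.is_file() and procs.read_text().strip() != ""
            if not has_children and not has_procs:
                leaked.append(cg)
    return leaked


def build_demo_tree(root):
    """Create a fake cgroup tree: one live container and one leaked container."""
    root = Path(root)
    # container_1 is still running: docker's leaf cgroup exists under it.
    for controller in ("cpu", "cpuset", "memory"):
        (root / controller / "hadoop-yarn" / "container_1" / "docker_abc").mkdir(parents=True)
    # container_0 has exited: docker removed its leaf cgroup everywhere, and
    # the nodemanager removed container_0 only under cpu.
    for controller in ("cpuset", "memory"):
        (root / controller / "hadoop-yarn" / "container_0").mkdir(parents=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(tmp)
        for path in find_leaked_container_cgroups(tmp):
            print(path.relative_to(tmp))
```

On a real node the scan would be pointed at {{/sys/fs/cgroup}} instead of the temporary demo tree.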