[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580234#comment-16580234
 ] 

Jim Brennan commented on YARN-8648:
-----------------------------------

{quote}I am wondering if this approach would break the docker launching docker 
use case. Currently if you're launching a new docker container from an existing 
docker container, you can have the new container use the same cgroup as the 
first container (e.g. /hadoop-yarn/${CONTAINER_ID}), but if there weren't a 
unique cgroup parent for the container you wouldn't be able to do that. Unless 
there's a way to find out the docker container id from inside the container?
{quote}
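On the quoted question of finding the docker container id from inside a container, one possible approach is to read {{/proc/self/cgroup}}. This minimal sketch assumes cgroup v1, that the container shares the host's cgroup namespace (with a private cgroup namespace the path can show up as just "/"), and docker's cgroupfs cgroup driver, where the leaf component of the path is the docker container id. The helper names {{cgroup_path_for}} and {{split_parent_and_leaf}} are hypothetical.

```python
from typing import Optional, Tuple


def cgroup_path_for(controller: str, proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup path for `controller` from /proc/<pid>/cgroup text.

    Each cgroup v1 line has the form "hierarchy-id:controller-list:path",
    e.g. "4:cpu,cpuacct:/hadoop-yarn/container_x/<docker id>".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        if controller in controllers.split(","):
            return path
    return None


def split_parent_and_leaf(path: str) -> Tuple[str, str]:
    """Split a cgroup path into (parent, leaf); the leaf is the last component."""
    parent, _, leaf = path.rstrip("/").rpartition("/")
    return (parent or "/"), leaf
```

Inside the container, {{cgroup_path_for("cpu", open("/proc/self/cgroup").read())}} would return something like {{/hadoop-yarn/container_id/docker_container_id}} under the assumptions above; the parent is the per-container cgroup and the leaf is the docker id.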
 

Thanks [~billie.rinaldi]. Yes, I think this use case would break as you suggest.
{quote}One potential issue with a useResourceHandlers() approach is if the NM 
wants to manipulate cgroup settings on a live container. Having a runtime that 
says it doesn't use resource handlers implies that can't be done by that 
runtime, but it can be supported by the docker runtime
{quote}
Agreed, [~jlowe]. I no longer think useResourceHandlers() is a good approach.
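To illustrate the live-update case [~jlowe] describes, here is a minimal sketch of the NM adjusting a running container's CPU weight by writing {{cpu.shares}} in its per-container cgroup. It assumes the cgroup v1 layout {{<cgroup_root>/cpu/<hierarchy>/<container_id>}} used in this issue; the function names are hypothetical, not NM APIs. Because docker's cgroup sits below the per-container cgroup, a weight set here also constrains the docker container's processes.

```python
import os


def cpu_shares_path(cgroup_root: str, hierarchy: str, container_id: str) -> str:
    """Path of cpu.shares in a container's per-container cgroup (cgroup v1 layout)."""
    return os.path.join(cgroup_root, "cpu", hierarchy.strip("/"), container_id, "cpu.shares")


def update_cpu_shares(cgroup_root: str, hierarchy: str, container_id: str, shares: int) -> str:
    """Write a new cpu.shares value for a running container and return the file path."""
    if shares < 2:
        # The kernel silently clamps values below 2; reject them explicitly here.
        raise ValueError("cpu.shares must be at least 2")
    path = cpu_shares_path(cgroup_root, hierarchy, container_id)
    with open(path, "w") as f:
        f.write(str(shares))
    return path
```

For example, {{update_cpu_shares("/sys/fs/cgroup", "/hadoop-yarn", container_id, 512)}} would rewrite the weight of a live container; this only works while a per-container cgroup exists for the NM to write to.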

I don't have a full solution in mind yet, but one question is whether we should 
continue using the per-container cgroup as the cgroup parent for docker. The 
main advantage of keeping it is that a lot of existing code already depends on 
it: all existing resource handlers just work with this setup. The disadvantage 
is that it makes fixing the leak harder, because docker creates hierarchies 
under the resource types the NM does not use (cpuset, hugetlb, etc.), and it 
creates them as root, which makes it harder for the NM to remove them.

If we use the top-level cgroup (hadoop-yarn) as the cgroup parent, then docker 
cleans everything up pretty nicely (although it still leaks the top-level 
hadoop-yarn cgroup for the resources the NM does not use). But it breaks the 
case [~billie.rinaldi] mentioned above, and it requires that we convert all 
existing resource handlers to use docker command options in the docker case.
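The trade-off between the two cgroup-parent choices can be made concrete with a small model (not NM code; {{leftover_cgroups}} and its parameters are hypothetical) of which directories remain once every container has exited. It assumes, as described above, that docker removes only its own leaf cgroup and that the NM cleans up only under the controllers it manages.

```python
from typing import Iterable, Set


def leftover_cgroups(
    parent_mode: str,
    all_controllers: Iterable[str],
    nm_controllers: Iterable[str],
    hierarchy: str,
    container_ids: Iterable[str],
) -> Set[str]:
    """Model which cgroup directories remain after every container has exited.

    parent_mode "per-container": --cgroup-parent=/<hierarchy>/<container_id>
    parent_mode "top-level":     --cgroup-parent=/<hierarchy>
    """
    if parent_mode not in ("per-container", "top-level"):
        raise ValueError(f"unknown parent_mode: {parent_mode}")
    managed = set(nm_controllers)
    ids = list(container_ids)  # iterated once per controller
    leftovers: Set[str] = set()
    for ctrl in all_controllers:
        if ctrl in managed:
            continue  # the NM owns these cgroups; not counted as leaks
        # In both modes docker creates /<ctrl>/<hierarchy> as an intermediate directory.
        leftovers.add(f"{ctrl}/{hierarchy}")
        if parent_mode == "per-container":
            # ...plus one intermediate /<hierarchy>/<container_id> per container.
            leftovers.update(f"{ctrl}/{hierarchy}/{cid}" for cid in ids)
    return leftovers
```

In this model, with U controllers the NM does not use and N containers, the per-container parent leaves U * (N + 1) directories and grows with every container, while the top-level parent leaves a constant U.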

One thought I had is adding a dockerCleanupResourceHandler that we tack on to 
the end of the resourceHandlerChain; its only job would be to clean up the 
extra container cgroups that docker creates.
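A rough sketch of the cleanup step such a handler might perform once a container completes. In Hadoop this would be Java in the resourceHandlerChain; this Python version only models the per-container cleanup and is hypothetical throughout. It assumes the cgroup v1 layout {{<cgroup_root>/<controller>/<hierarchy>/<container_id>}}, a known list of controllers, and that the caller has permission to remove the root-owned directories docker created (in practice that may require a privileged helper such as container-executor).

```python
import os
from typing import Iterable, List, Tuple


def cleanup_container_cgroups(
    cgroup_root: str, controllers: Iterable[str], hierarchy: str, container_id: str
) -> Tuple[List[str], List[str]]:
    """Remove the intermediate /<hierarchy>/<container_id> cgroup docker left
    under each controller. Returns (removed, failed) paths.

    cgroupfs directories are removed with rmdir, which only succeeds once the
    cgroup has no tasks and no child cgroups, so this should run after docker
    has removed its own leaf cgroup.
    """
    removed: List[str] = []
    failed: List[str] = []
    for ctrl in controllers:
        path = os.path.join(cgroup_root, ctrl, hierarchy.strip("/"), container_id)
        try:
            os.rmdir(path)
        except FileNotFoundError:
            continue  # docker did not create it here, or it is already gone
        except OSError:
            # e.g. EBUSY/ENOTEMPTY (still in use) or EACCES/EPERM (root-owned directory)
            failed.append(path)
            continue
        removed.append(path)
    return removed, failed
```

Returning the failures rather than raising lets the handler log them and retry later without blocking the rest of the chain.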

> Container cgroups are leaked when using docker
> ----------------------------------------------
>
>                 Key: YARN-8648
>                 URL: https://issues.apache.org/jira/browse/YARN-8648
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>              Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu. For instance, if 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass the 
> container's cgroup (e.g. {{/hadoop-yarn/container_id}}) via the 
> {{--cgroup-parent}} command-line argument. Docker then creates a cgroup for 
> the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is 
> good under {{/sys/fs/cgroup/cpu/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}. On the RHEL 7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd. So, for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up 
> the {{container_id}} cgroups for these other resources. On one of our busy 
> clusters, we found more than 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of Hadoop, and I have also been able 
> to reproduce it with current Hadoop.
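One way to measure the leak described in the issue is to scan each controller's hierarchy for {{container_*}} cgroups whose container is no longer running. This is a minimal sketch: {{find_leaked_container_cgroups}} is a hypothetical helper, it assumes the cgroup v1 layout shown above, and how the set of live container ids is obtained (e.g. from the NM) is left out.

```python
import os
from typing import Dict, Iterable, List


def find_leaked_container_cgroups(
    cgroup_root: str,
    controllers: Iterable[str],
    hierarchy: str,
    live_container_ids: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each controller to the container_* cgroups under
    <cgroup_root>/<controller>/<hierarchy> whose container is no longer live."""
    live = set(live_container_ids)
    leaked: Dict[str, List[str]] = {}
    for ctrl in controllers:
        base = os.path.join(cgroup_root, ctrl, hierarchy.strip("/"))
        try:
            entries = os.listdir(base)
        except FileNotFoundError:
            continue  # nothing was ever created under this controller
        stale = sorted(
            name
            for name in entries
            if name.startswith("container_")  # YARN container ids use this prefix
            and name not in live
            and os.path.isdir(os.path.join(base, name))
        )
        if stale:
            leaked[ctrl] = stale
    return leaked
```

On the RHEL 7 host above this would be called with {{cgroup_root="/sys/fs/cgroup"}}, the controller list from the issue, and {{hierarchy="/hadoop-yarn"}}; summing the list lengths gives the total leaked count.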



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
