[jira] [Updated] (YARN-11572) hadoop-yarn cgroup directory is deleted after each "systemctl daemon-reload" command

Jean-Baptiste Guet (Jira) Mon, 18 Sep 2023 02:18:05 -0700


     [ https://issues.apache.org/jira/browse/YARN-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jean-Baptiste Guet updated YARN-11572:
--------------------------------------
    Description: 
I have a Hadoop cluster and need to enable cgroups in order to use GPUs in a 
Docker environment. I followed the documentation for the setup.

 

{*}To summarize{*}: I manage the cgroup creation myself (cpu, cpuacct and 
devices), which results, as expected, in the creation of 3 directories under 
{_}{{/sys/fs/cgroup/}}{_}. However, upon each {_}systemctl daemon-reload{_}, 
the _/sys/fs/cgroup/devices/hadoop-yarn_ directory is systematically deleted, 
which prevents YARN's NodeManager from working.

 

{*}In detail{*}:

As described in the documentation, I left the parameter 
_yarn.nodemanager.linux-container-executor.cgroups.mount_ set to _false_ in 
order to manage the cgroups myself (for security reasons).

As I'm on CentOS 8, I use cgroup v1. I set the following parameters:
 * _yarn.nodemanager.linux-container-executor.cgroups.hierarchy_ to 
_/hadoop-yarn_
 * _yarn.nodemanager.linux-container-executor.cgroups.mount-path_ to 
_/sys/fs/cgroup_

YARN needs 3 cgroups: cpu, cpuacct and devices.
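
To make the resulting layout explicit, here is a minimal Python sketch of how the two settings above combine with the three controllers into the directories the NodeManager is expected to find. The helper name `expected_cgroup_dirs` is hypothetical, and the `<mount-path>/<controller>/<hierarchy>` layout is an assumption inferred from the paths listed in this report, not taken from the YARN source.

```python
from pathlib import PurePosixPath

# Controllers named in this report.
CONTROLLERS = ("cpu", "cpuacct", "devices")


def expected_cgroup_dirs(mount_path, hierarchy, controllers=CONTROLLERS):
    """Build one directory path per controller (hypothetical helper).

    Assumes the cgroup v1 layout <mount_path>/<controller>/<hierarchy>.
    """
    rel = hierarchy.strip("/")  # "/hadoop-yarn" -> "hadoop-yarn"
    return [str(PurePosixPath(mount_path) / c / rel) for c in controllers]


if __name__ == "__main__":
    for path in expected_cgroup_dirs("/sys/fs/cgroup", "/hadoop-yarn"):
        print(path)
```

With the values from this report, the three paths match the directories that the cgconfig service creates.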

To make the /hadoop-yarn cgroup persistent, I installed the libcgroup RPM and 
then updated /etc/cgconfig.conf with:
{code:java}
group hadoop-yarn {
     perm {
         admin {
             uid = yarn;
             gid = hadoop;
         }
         task {
             uid = yarn;
             gid = hadoop;
         }
     }
     cpu {}
     cpuacct {}
     devices {}
 }
{code}
and I started the cgconfig service. The 3 directories are created:
{code:java}
$ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/
{code}
 

At this point, I can restart the Yarn NodeManager.

However, each time someone executes {{systemctl daemon-reload}}, the 
devices directory is deleted:
{code:java}
$ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
{code}
 

I see nothing in the logs and have no idea why this directory is deleted. The 
YARN NodeManager needs this directory, so it stops working and has to be 
restarted (once the directory has been re-created).
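
Since nothing shows up in the logs, a small check can at least report which of the three directories is gone at a given moment. The following Python sketch is only an illustration; the function name `missing_cgroup_dirs` is hypothetical, and the paths are the ones listed in this report.

```python
import os


def missing_cgroup_dirs(paths):
    """Return the paths that are not existing directories (hypothetical helper)."""
    return [p for p in paths if not os.path.isdir(p)]


if __name__ == "__main__":
    # Paths taken from this report.
    paths = [
        f"/sys/fs/cgroup/{controller}/hadoop-yarn"
        for controller in ("cpu", "cpuacct", "devices")
    ]
    for path in missing_cgroup_dirs(paths):
        print(f"missing: {path}")
```

Running it before and after the reload shows which directory disappeared without having to read the `ls` output by eye.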

As an alternative to the cgconfig service, I tried creating my own systemd 
service that creates these directories.
{code:java}
vim /etc/systemd/system/hadoop-yarn-cgroup.service

[Unit]
Description=Custom cgroup for Hadoop YARN

[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
ExecStart=/bin/true
Slice=hadoop-yarn.slice
MemoryAccounting=yes
MemoryLimit=1G

[Install]
WantedBy=multi-user.target
{code}
 

The behaviour is the same:
 * the directories are created
 * {{systemctl daemon-reload}} is run
 * the devices/hadoop-yarn directory is deleted
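
The three steps above can be sketched as a before/after comparison around an arbitrary action. This is a minimal Python illustration, not a diagnostic tool: the names `snapshot` and `removed_by` are hypothetical, and the action is passed in as a callable so that the sketch itself does not need root privileges.

```python
import os


def snapshot(paths):
    """Map each path to True if it currently exists as a directory."""
    return {p: os.path.isdir(p) for p in paths}


def removed_by(action, paths):
    """Run action() and return the paths that existed before but not after."""
    before = snapshot(paths)
    action()
    after = snapshot(paths)
    return [p for p in paths if before[p] and not after[p]]


# Possible usage on an affected node (needs root, shown as a comment only):
#   import subprocess
#   paths = [f"/sys/fs/cgroup/{c}/hadoop-yarn" for c in ("cpu", "cpuacct", "devices")]
#   print(removed_by(
#       lambda: subprocess.run(["systemctl", "daemon-reload"], check=True),
#       paths,
#   ))
```

On a node showing the behaviour described here, the returned list would be expected to contain only the devices path.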


> hadoop-yarn cgroup directory is deleted after each "systemctl daemon-reload" 
> command
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-11572
>                 URL: https://issues.apache.org/jira/browse/YARN-11572
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.3.4
>         Environment:  
>  
>            Reporter: Jean-Baptiste Guet
>            Priority: Major
>


