[jira] [Commented] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

Miklos Szegedi (JIRA) Fri, 16 Mar 2018 09:41:25 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402162#comment-16402162
 ]


Miklos Szegedi commented on YARN-8031:
--------------------------------------

[~jayceAu], thank you for raising this. If CGroups are already mounted, you 
should set yarn.nodemanager.linux-container-executor.cgroups.mount to false, 
as described here:

[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html]
{code:java}
Discover CGroups mounted already: This should be used on newer systems 
like RHEL7 or Ubuntu16 or if the administrator mounts CGroups before YARN 
starts. Set yarn.nodemanager.linux-container-executor.cgroups.mount to false 
and leave other settings set to their defaults. YARN will locate the mount 
points in /proc/mounts. Common locations include /sys/fs/cgroup and /cgroup. 
The default location can vary depending on the Linux distribution in use.{code}
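
To make the "locate the mount points in /proc/mounts" step concrete, here is a minimal Python sketch of how cgroup v1 controller mounts can be read from /proc/mounts-style text. It is an illustration only, not YARN's actual implementation; the function name, the controller set, and the sample data are all assumptions made for this example.

```python
from typing import Dict

# Controller names to look for (cgroup v1). This set is an assumption chosen
# for illustration, not a list taken from YARN.
KNOWN_CONTROLLERS = {
    "cpu", "cpuacct", "cpuset", "memory",
    "blkio", "devices", "freezer", "net_cls",
}


def find_cgroup_mounts(mounts_text: str) -> Dict[str, str]:
    """Map each cgroup v1 controller to its mount point.

    mounts_text is expected to be in /proc/mounts format:
    device, mount point, fs type, options, dump, pass.
    """
    mounts: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cgroup":
            continue  # not a cgroup v1 mount entry
        mount_point = fields[1]
        for option in fields[3].split(","):
            if option in KNOWN_CONTROLLERS:
                # A later entry overrides an earlier one for the same controller.
                mounts[option] = mount_point
    return mounts


# Hypothetical sample resembling a systemd-based host.
SAMPLE = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
"""

if __name__ == "__main__":
    for controller, path in sorted(find_cgroup_mounts(SAMPLE).items()):
        print(controller, path)
```

On a real host, passing the contents of /proc/mounts instead of SAMPLE shows whether the cpu controller is already mounted, which is the case where the mount option should be false.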

> NodeManager will fail to start if cpu subsystem is already mounted
> ------------------------------------------------------------------
>
>                 Key: YARN-8031
>                 URL: https://issues.apache.org/jira/browse/YARN-8031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: JayceAu
>            Priority: Major
>         Attachments: image-2018-03-15-14-47-30-583.png
>
>
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, NodeManager mounts the cpu 
> subsystem and, if the mount step succeeds, creates the control group whose 
> default name is *hadoop-yarn*. This procedure works well if the cpu subsystem 
> is not yet mounted. However, in some situations the cpu subsystem is already 
> mounted before NodeManager starts, and NodeManager fails to start because it 
> has no write permission to the *hadoop-yarn* path. For example:
>  # An OS that uses systemd, such as CentOS 7, has the cpu subsystem mounted 
> by default on machine startup.
>  # A daemon that starts before NodeManager may also rely on the cpu subsystem 
> being mounted. In our production environment, we limit the cpu usage of the 
> monitoring and control agent, which starts on reboot.
> To solve this problem, container-executor must be able to create the control 
> group *hadoop-yarn* if mounting the controller succeeds or if the controller 
> is already mounted. In addition, if the cpu subsystem is used in combination 
> with another subsystem and is already mounted, container-executor should use 
> the latest mount point of the cpu subsystem instead of the one provided by 
> NodeManager.
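
The behavior proposed in the description can be sketched as follows. This is a hedged Python illustration of the decision logic only, not container-executor code: the function names are hypothetical, and reading "latest mount point" as the last matching entry in /proc/mounts is an assumption.

```python
import posixpath
from typing import Optional


def existing_mount(controller: str, mounts_text: str) -> Optional[str]:
    """Return the last-listed cgroup v1 mount point for controller, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if (len(fields) >= 4 and fields[2] == "cgroup"
                and controller in fields[3].split(",")):
            found = fields[1]  # keep the most recent match
    return found


def hierarchy_path(controller: str, requested_mount: str,
                   mounts_text: str, hierarchy: str = "hadoop-yarn") -> str:
    """Pick the directory in which the control group would be created.

    If the controller is already mounted, prefer that mount point over the
    one requested by the caller; otherwise fall back to the requested one.
    """
    mounted_at = existing_mount(controller, mounts_text)
    base = mounted_at if mounted_at is not None else requested_mount
    return posixpath.join(base, hierarchy)


# Hypothetical /proc/mounts excerpt where cpu is co-mounted with cpuacct.
SAMPLE = (
    "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup "
    "rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0\n"
)

if __name__ == "__main__":
    print(hierarchy_path("cpu", "/cgroup/cpu", SAMPLE))
    print(hierarchy_path("cpu", "/cgroup/cpu", ""))
```

The sketch only resolves the path; creating the directory and handling the write-permission problem described above are left out.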



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
