JayceAu created YARN-8031:
-----------------------------
Summary: NodeManager will fail to start if cpu subsystem is
already mounted
Key: YARN-8031
URL: https://issues.apache.org/jira/browse/YARN-8031
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.0
Reporter: JayceAu
Attachments: image-2018-03-15-14-47-30-583.png
If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true,
NodeManager mounts the cpu subsystem and, if the mount step succeeds, creates
the control group whose default name is *hadoop-yarn*. This procedure works
well when the cpu subsystem is not yet mounted. However, in some situations the
cpu subsystem is already mounted before NodeManager starts, and NodeManager
then fails to start because it has no write permission to the *hadoop-yarn*
path. For example:
# An OS that uses systemd, such as CentOS 7, has the cpu subsystem mounted by
default at machine startup.
# Some daemon that starts earlier than NodeManager may also rely on the cpu
subsystem being mounted. In our production environment, we limit the CPU usage
of the monitoring and control agent, which starts on reboot.
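Whether the cpu subsystem is already mounted can be checked from the kernel mount table. The following is only a minimal sketch, assuming cgroup v1 and the usual /proc/mounts layout (device, mount point, filesystem type, options). The function name find_cgroup_mount and the SAMPLE text are hypothetical and are not part of NodeManager or container-executor.

```python
def find_cgroup_mount(mounts_text, controller):
    """Return the mount point of the last cgroup (v1) entry whose options
    include `controller`, or None if the controller is not mounted.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not a cgroup v1 mount.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        # Controllers appear as plain entries in the comma-separated options.
        if controller in fields[3].split(","):
            found = fields[1]  # keep the most recent matching entry
    return found


# Illustrative mount table, similar to what a systemd-based OS may show.
SAMPLE = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
"""

if __name__ == "__main__":
    print(find_cgroup_mount(SAMPLE, "cpu"))
```

On a real host the same function could be fed the contents of /proc/mounts. Matching whole option entries keeps "cpu" from being confused with "cpuacct".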
To solve this problem, container-executor must be able to create the control
group *hadoop-yarn* both when mounting the controller succeeds and when the
controller is already mounted. In addition, if the cpu subsystem is used in
combination with other subsystems and is already mounted, container-executor
should use the latest mount point of the cpu subsystem instead of the one
provided by NodeManager.
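The proposed behaviour can be sketched roughly as follows. This is an illustration in Python, not the actual container-executor code (which is written in C). The names prepare_cpu_cgroup, existing_mount, requested_mount and mount_fn are hypothetical, and ownership or permission changes on the created directory are deliberately left out.

```python
import os
import tempfile


def prepare_cpu_cgroup(existing_mount, requested_mount, mount_fn,
                       group="hadoop-yarn"):
    """Create the control group directory and return its path.

    existing_mount:  path where the cpu controller is already mounted,
                     or None if it is not mounted yet (assumed input).
    requested_mount: mount path supplied by NodeManager.
    mount_fn:        callable that mounts the controller at a path and
                     returns True on success (stand-in for the real mount).
    """
    if existing_mount is not None:
        # Already mounted: reuse that mount point instead of the requested one.
        base = existing_mount
    elif mount_fn(requested_mount):
        # Not mounted yet and the mount step succeeded.
        base = requested_mount
    else:
        raise RuntimeError("could not mount cpu controller at %s"
                           % requested_mount)
    path = os.path.join(base, group)
    # Create the group in both cases; an existing directory is not an error.
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Temporary directories stand in for real cgroup mount points.
    with tempfile.TemporaryDirectory() as already, \
            tempfile.TemporaryDirectory() as requested:
        print(prepare_cpu_cgroup(already, requested, lambda p: True))
```

The point of the sketch is that the directory is created in both branches, so an already mounted controller no longer skips the creation of *hadoop-yarn*.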