[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9518: ------------------------------ Description: When I had set configuration variables for cgroup with yarn, nodemanager could be start without any matter. But when I ran a job, the job failed with these exceptional nodemanager logs in the end. In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " After I analysed, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" subsystem are as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in centos7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are symbol links. {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_000001 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_000001 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_000001 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main : command provided 1 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is test_hadoop 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is datadev 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to cgroup task files... 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory 2019-04-19 20:17:20,131 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 27 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1554210318404_0042_01_000001 {panel} was: When I had set configuration variables for cgroup with yarn, nodemanager could be start without any matter. But when I ran a job, the job failed with these exceptional nodemanager logs in the end. In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " After I analysed, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" subsystem are as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in centos7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have merge as "cpu,cpuacct" {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_000001 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_000001 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_000001 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main : command provided 1 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is test_hadoop 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is datadev 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to cgroup task files... 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory 2019-04-19 20:17:20,131 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 27 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1554210318404_0042_01_000001 {panel} > can not use CGroups with YARN in centos7 > ----------------------------------------- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 > Reporter: Shurong Mai > Priority: Major > Labels: cgroup, patch > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > > > {panel:title=exceptional nodemanager logs} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_000001 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_000001 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_000001 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1554210318404_0042_01_000001 > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: > ExitCodeException exitCode=27: > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.util.Shell.run(Shell.java:482) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at > java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: > main : command provided 1 > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is > test_hadoop > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested > yarn user is datadev > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to > cgroup task files... > 2019-04-19 20:17:20,109 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open file > /sys/fs/cgroup/cpu as node manager - Is a directory > 2019-04-19 20:17:20,131 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Container exited with a non-zero exit code 27 > 2019-04-19 20:17:20,133 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_000001 transitioned from RUNNING > to EXITED_WITH_FAILURE > 2019-04-19 20:17:20,133 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Cleaning up container container_1554210318404_0042_01_000001 > > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org