[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Affects Version/s: (was: 3.1.2) (was: 2.8.5) (was: 2.9.2) (was: 3.2.0)

> can't use CGroups with YARN in centos7
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.7
> Reporter: Shurong Mai
> Priority: Major
> Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
> The OS version is CentOS 7:
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the NodeManager started without any problem, but when I ran a job, it failed with the exceptional NodeManager logs shown at the end. The key message is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> After some analysis I found the reason. On CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted separately:
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But on CentOS 7 they are merged into a single "cpu,cpuacct" hierarchy, and "cpu" and "cpuacct" are symbolic links to it:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct
> {code}
> Looking at the source code, the NodeManager discovers the cgroup subsystems by reading /proc/mounts, so it resolves both the cpu and cpuacct subsystem paths to "/sys/fs/cgroup/cpu,cpuacct". The resource description argument passed to container-executor then looks like this:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> The cgroup path contains a comma, but container-executor treats the comma as the separator between multiple resources. It therefore truncates the path to "/sys/fs/cgroup/cpu" instead of the correct "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> Hence I modified the source code and submitted a patch. The idea of the patch is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument becomes:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is still a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem was resolved and the job ran successfully.
> The patch is compatible with the cgroup layout of earlier OS versions such as CentOS 6 as well as CentOS 7, and applies equally to other merged cgroup subsystems, such as the network subsystems:
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio
> {code}
>
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING
> 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27
> 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27
> ExitCodeException exitCode=27:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
> at org.apache.hadoop.util.Shell.run(Shell.java:482)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {panel}
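The fix described above can be sketched as follows. This is illustrative Java only, not the actual YARN-9518 patch; the class and method names are hypothetical. It shows the core idea: when a cgroup v1 mount point's last path segment contains a comma (merged controllers such as "cpu,cpuacct" on CentOS 7), prefer the per-controller symlink that sits next to it, which keeps the comma out of the path later handed to container-executor.

```java
// Sketch (hypothetical names, not the real NodeManager code): derive the
// per-subsystem path to use for a cgroup v1 mount point.
public class CgroupMountSketch {

    // mountPoint comes from a /proc/mounts entry, e.g.
    //   cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,cpu,cpuacct 0 0
    static String subsystemPath(String mountPoint, String subsystem) {
        int slash = mountPoint.lastIndexOf('/');
        String lastSegment = mountPoint.substring(slash + 1);
        if (lastSegment.contains(",")) {
            // Merged hierarchy (CentOS 7 style): use the per-controller
            // symlink, e.g. /sys/fs/cgroup/cpu -> cpu,cpuacct.
            return mountPoint.substring(0, slash + 1) + subsystem;
        }
        // Single-controller mount (CentOS 6 style): already comma-free.
        return mountPoint;
    }

    public static void main(String[] args) {
        // CentOS 7: merged cpu,cpuacct mount resolves to the symlink.
        System.out.println(subsystemPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
        // CentOS 6: separate mounts are returned unchanged.
        System.out.println(subsystemPath("/sys/fs/cgroup/cpu", "cpu"));
    }
}
```

Both calls print "/sys/fs/cgroup/cpu", which matches the claim in the description that the patch works on both CentOS 6 and CentOS 7 layouts.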
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839054#comment-16839054 ] Shurong Mai edited comment on YARN-9518 at 5/14/19 3:54 AM:
---

[~Jim_Brennan], thank you very much. You are right: "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma" after release 2.8, so the problem of a cgroup path containing a comma does not exist after release 2.8. I have removed 2.8.5, 2.9.2, 3.1.2 and 3.2.0 from the affected versions and kept 2.7.7; we are running release 2.7.7. As I said to [~jhung], YARN-2194 looks like the same problem as this issue, but it supplies a different solution. My patch therefore supplies a solution for version 2.7.7 and older versions. Thank you again.

was (Author: shurong.mai): [~Jim_Brennan], thank you very much. "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma"
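The comment above explains why releases after 2.8 are unaffected: container-executor splits its resource list on LINUX_PATH_SEPARATOR ({{%}}) instead of a comma, and {{%}} never appears in a cgroup mount path. A small sketch of that difference (illustrative Java, the real container-executor is C, and the path below is made up):

```java
// Sketch: why splitting on '%' instead of ',' sidesteps the CentOS 7
// merged-controller problem. Not the real container-executor code.
public class SeparatorSketch {

    // Return the first resource in a separator-delimited resource list.
    static String firstResource(String resourceList, char separator) {
        int i = resourceList.indexOf(separator);
        return i < 0 ? resourceList : resourceList.substring(0, i);
    }

    public static void main(String[] args) {
        String path = "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_01/tasks";
        // Pre-2.8 behaviour: the ',' inside "cpu,cpuacct" truncates the path.
        System.out.println(firstResource(path, ','));
        // Post-YARN-2194 behaviour: '%' does not occur in the path,
        // so the full path survives intact.
        System.out.println(firstResource(path, '%'));
    }
}
```

The first line printed is the truncated "/sys/fs/cgroup/cpu" from the bug report; the second is the full path.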
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839054#comment-16839054 ] Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], thank you very much. "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma"
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832991#comment-16832991 ] Shurong Mai edited comment on YARN-9517 at 5/5/19 8:14 AM:
---

Hi [~wangda], I mistakenly thought the problem was resolved by the patch, so I closed this JIRA issue. It is not fixed in these branches. I have reopened this issue.

was (Author: shurong.mai): Hi [~wangda], I just thought the problem was resolved by the patch, so I closed this JIRA issue. I haven't committed the patch to these branches.

> When aggregation is not enabled, can't see the container log
> ---
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 2.8.5, 2.7.7
> Reporter: Shurong Mai
> Priority: Major
> Labels: patch
> Attachments: YARN-9517-branch-2.8.5.001.patch, YARN-9517.patch
>
> yarn-site.xml:
> {code:java}
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>false</value>
> </property>
> {code}
> When aggregation is not enabled, we click the "container log link" (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> It jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have submitted a patch which is simple and applies to these Hadoop versions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai reopened YARN-9517:
---
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833248#comment-16833248 ] Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], I have submitted the patch for branch-2.7.7 (the same as 2.7.x and 2.8.x) and the patch for trunk (the same as 2.9.x, 3.1.x and 3.2.x).
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-trunk.001.patch
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833199#comment-16833199 ] Shurong Mai edited comment on YARN-9518 at 5/5/19 7:00 AM:
---

[~Jim_Brennan], thank you for your attention and guidance. I have looked at the source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x and 3.2.x; they all have the same problem. But the patch only applies to 2.7.x and 2.8.x, because 2.9.x, 3.1.x and 3.2.x (the same as trunk) differ slightly in the source code context of the patch. So I need to make another patch for 2.9.x, 3.1.x and 3.2.x.

was (Author: shurong.mai): [~Jim_Brennan], thank you for your attention and guidance. I have looked at the source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x and 3.2.x; they all have the same problem. But the patch only applies to 2.7.x and 2.8.x, because 2.9.x, 3.1.x and 3.2.x differ slightly in the source code context of the patch. So I need to make another patch for 2.9.x, 3.1.x and 3.2.x.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: YARN-9518-branch-2.7.7.001.patch
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: (was: YARN-9518-branch-2.7.7.patch)
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: YARN-9518-branch-2.7.7.patch
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833208#comment-16833208 ] Shurong Mai commented on YARN-9518: --- [~jhung], YARN-2194 looks like the same problem as this issue, but it supplies a different solution.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description:
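The core idea of the patch is to prefer the comma-free per-controller symlink (e.g. /sys/fs/cgroup/cpu) over the combined mount point read from /proc/mounts. A minimal sketch of that idea in Java follows; this is a hypothetical illustration, not the actual NodeManager patch, and the helper name {{controllerPath}} is invented:

```java
import java.io.File;

public class CGroupPathDemo {
    /**
     * Given a mount point read from /proc/mounts (on CentOS 7 this is the
     * combined "/sys/fs/cgroup/cpu,cpuacct") and a controller name, prefer
     * the comma-free per-controller path if it exists. On CentOS 7 that
     * path is a symlink to the combined hierarchy; on CentOS 6 the mount
     * point itself has no comma and is returned unchanged.
     */
    static String controllerPath(String mountPoint, String controller) {
        if (mountPoint.contains(",")) {
            File perController =
                new File(new File(mountPoint).getParentFile(), controller);
            if (perController.exists()) {
                return perController.getPath(); // e.g. "/sys/fs/cgroup/cpu"
            }
        }
        return mountPoint;
    }

    public static void main(String[] args) {
        // On a CentOS 7 host with cgroup v1 this would print
        // "/sys/fs/cgroup/cpu"; elsewhere it falls back to the input.
        System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
    }
}
```

Because the returned path is a symlink to the same "cpu,cpuacct" hierarchy, the kernel resolves it to the identical cgroup, so the behaviour is unchanged apart from the comma disappearing from the argument string.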
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833199#comment-16833199 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], thank you for your attention and guidance. I have looked at the source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the same problem. However, the patch only applies to 2.7.x and 2.8.x, because 2.9.x, 3.1.x, and 3.2.x differ slightly in the source code context of the patch. So I need to make another patch for 2.9.x, 3.1.x, and 3.2.x.
[jira] [Commented] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832991#comment-16832991 ] Shurong Mai commented on YARN-9517: --- Hi, [~wangda], I thought the problem was resolved by the patch, so I closed this Jira issue. I haven't committed the patch to these branches. > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: patch > Attachments: YARN-9517.patch > > > yarn-site.xml > {code:java} > <property> > <name>yarn.log-aggregation-enable</name> > <value>false</value> > </property> > {code} > > When aggregation is not enabled, we click the "container log link" (in the web > page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job finishes successfully. > It jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I submitted > a patch which is simple and applies to these hadoop versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description:
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830140#comment-16830140 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have read YARN-5301 and its patch, and I don't think it is the same problem. YARN-5301 is about -mount-cgroups failing when cgroup auto-mount is enabled, while this issue is about the resource description argument of container-executor being truncated because of the comma in the path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks". Therefore, this issue is a different problem from YARN-5301.
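The idea behind the patch can be sketched as follows. This is an illustrative Python re-implementation under the assumption of a CentOS 7-style cgroup v1 layout, not the actual NodeManager code: given the merged mount point read from /proc/mounts, prefer the per-controller symlink when it resolves to the same hierarchy, so the path handed to container-executor contains no comma.

```python
import os

# Illustrative sketch (NOT the actual NodeManager code): pick a comma-free
# per-controller path when it is a symlink to the merged mount point.
def comma_free_controller_path(mount_point: str, controller: str) -> str:
    parent = os.path.dirname(mount_point)
    candidate = os.path.join(parent, controller)  # e.g. /sys/fs/cgroup/cpu
    # Only use the candidate if it really points at the same hierarchy.
    if os.path.islink(candidate) and \
            os.path.realpath(candidate) == os.path.realpath(mount_point):
        return candidate
    # Fall back to whatever /proc/mounts reported (e.g. the CentOS 6 layout).
    return mount_point
```

On CentOS 6, where "/sys/fs/cgroup/cpu" is a plain mount point with no comma, the fallback branch returns it unchanged, which is why this approach stays compatible with older layouts.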
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description:
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830103#comment-16830103 ] Shurong Mai edited comment on YARN-9518 at 4/30/19 9:44 AM: [~Jim_Brennan], I have read the source code in versions 2.7.7, 2.8.5, 2.9.2, and 3.2.0. They all have the same problem with the cgroup CPU subsystem path containing a comma, "/sys/fs/cgroup/cpu,cpuacct".
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830103#comment-16830103 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have read the source code in versions 2.7.7, 2.8.5, 2.9.2, and 3.2.0. They all have the same problem with the cgroup CPU subsystem path containing a comma, "/sys/fs/cgroup/cpu,cpuacct".
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: YARN-9518.patch
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: (was: YARN-9518.patch)
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830057#comment-16830057 ] Shurong Mai commented on YARN-9518: --- Hi [~Jim_Brennan], does "latest code (trunk)" mean the latest version, for example hadoop-2.9.2 or hadoop-3.2.0?
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: can't use CGroups with YARN in centos7 (was: can not use CGroups with YARN in centos7)
[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829233#comment-16829233 ] Shurong Mai commented on YARN-9518: --- hi, [~adam.antal], I have completed the description of this issue and submitted a patch, please review.
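The patch's idea, preferring the comma-free per-controller symlink when a mounted hierarchy serves several controllers, can be sketched as follows. The class and method names here are illustrative, not the actual NodeManager code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch of the patch's idea (names are hypothetical, not
// the real NodeManager methods): when /proc/mounts reports a merged
// hierarchy such as /sys/fs/cgroup/cpu,cpuacct, return the per-controller
// symlink /sys/fs/cgroup/cpu so no comma ever reaches container-executor.
public class CgroupPathFix {
    static String controllerPath(String mountPoint, String controller) {
        Path mount = Paths.get(mountPoint);
        String leaf = mount.getFileName().toString();
        if (leaf.contains(",")) {
            for (String c : leaf.split(",")) {
                if (c.equals(controller)) {
                    // On CentOS 7, /sys/fs/cgroup/cpu is a symlink to
                    // /sys/fs/cgroup/cpu,cpuacct, so this path stays valid.
                    return mount.resolveSibling(controller).toString();
                }
            }
        }
        return mountPoint;   // CentOS 6 style paths are returned unchanged
    }

    public static void main(String[] args) {
        System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
        System.out.println(controllerPath("/sys/fs/cgroup/cpu", "cpu"));
    }
}
```

The same rewrite covers other merged hierarchies such as net_cls,net_prio, which is why the patch is not cpu-specific.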
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: edited
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: edited
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: edited
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: edited
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: edited
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: (edited)
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: (edited)
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: can not use CGroups with YARN in centos7 (was: cgroup subsystem in centos7 ) > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829153#comment-16829153 ] Shurong Mai commented on YARN-9518: --- hi [~adam.antal], thank you for your attention. I am editing this issue; please wait a moment.
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Affects Version/s: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
[jira] [Updated] (YARN-9518) cgroup subsystem in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: cgroup subsystem in centos7 (was: cgroup in centos7)
[jira] [Updated] (YARN-9518) cgroup in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Priority: Major (was: Critical)
[jira] [Created] (YARN-9518) cgroup in centos7
Shurong Mai created YARN-9518: -- Summary: cgroup in centos7 Key: YARN-9518 URL: https://issues.apache.org/jira/browse/YARN-9518 Project: Hadoop YARN Issue Type: Bug Reporter: Shurong Mai
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: yarn-site.xml
{code:java}
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
{code}
When aggregation is not enabled, we click the "container log link" (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully. It jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the url is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop". I found this problem in all hadoop versions 2.x.y and 3.x.y, and I have submitted a simple patch which can be applied to these hadoop versions.
> When aggregation is not enabled, can't see the container log
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
> Labels: patch
> Attachments: YARN-9517.patch
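The idea behind such a fix can be sketched as follows (a hedged illustration with hypothetical function and parameter names, not the actual Hadoop web-UI code or the attached patch): when yarn.log-aggregation-enable is false, the history server should link to the nodemanager's local container-log page instead of its own aggregated-log URL.

```python
def container_log_url(aggregation_enabled, jhs_address, nm_address,
                      container_id, attempt_id, user):
    # When aggregation is on, the history server can serve the logs it
    # fetched from HDFS under its own /jobhistory/logs/... path.
    if aggregation_enabled:
        return ("http://%s/jobhistory/logs/%s/%s/%s/%s"
                % (jhs_address, nm_address, container_id, attempt_id, user))
    # When aggregation is off, only the nodemanager still has the logs on
    # local disk, so link to its web UI directly rather than showing
    # "Aggregation is not enabled. Try the nodemanager at ...".
    return ("http://%s/node/containerlogs/%s/%s"
            % (nm_address, container_id, user))
```

The exact URL shapes here are assumptions for illustration; the point is only that the link target must be chosen from the aggregation flag at render time.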
[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai reopened YARN-9517: ---
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Labels: patch (was: )
[jira] [Resolved] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai resolved YARN-9517. --- Resolution: Fixed
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982 ] Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM: We had applied the patch to our hadoop and tested it OK.
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982 ] Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM: We had applied the patch to our hadoop and tested it OK.
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: (edited)
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Affects Version/s: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: (edited)
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: (edited)
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: When aggregation is not enabled, we click the "container log link"(in web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) after a job is finished successfully. It will jump to the webpage displaying "Aggregation is not enabled. Try the nodemanager at yy:48038" after we click, and the url is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; I also fund this problem in all hadoop version 2.x and 3.x. was: When aggregation is not enabled, we click the "container log link"(in web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) after a job is finished successfully. It will jump to the webpage displaying "Aggregation is not enabled. Try the nodemanager at yy:48038" after we click, and the url is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major > > When aggregation is not enabled, we click the "container log link"(in web > page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job is finished successfully. > It will jump to the webpage displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038" after we click, and the url is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also fund this problem in all hadoop version 2.x and 3.x. 
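For reference, the behaviour reported in YARN-9517 hinges on the standard `yarn.log-aggregation-enable` property in yarn-site.xml; when it is off, the JobHistory server can only redirect to the originating NodeManager. A minimal sketch of the relevant properties (the snippet file path and the retention value are illustrative assumptions, not something the report specifies):

```shell
# Sketch: the yarn-site.xml properties that govern log aggregation.
# yarn-site-snippet.xml is an illustrative local path, not a real cluster file.
cat > yarn-site-snippet.xml <<'EOF'
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Keep aggregated logs for 7 days (illustrative value) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
EOF
grep -c '<name>' yarn-site-snippet.xml  # prints 2
```

With aggregation enabled, container logs are collected to the distributed file system after the application finishes and the JobHistory server serves them itself, instead of redirecting the browser to a NodeManager whose local logs may already be gone.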
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Summary: When aggregation is not enabled, can't see the container log (was: When Aggregation is not enabled, can't see the container log) > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major > > Aggregation is not enabled, when we click
[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: Aggregation is not enabled, when we click > When Aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major > > Aggregation is not enabled, when we click
[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Summary: When Aggregation is not enabled, can't see the container log (was: when Aggregation is not enabled, can't see the container log) > When Aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major >
[jira] [Created] (YARN-9517) when Aggregation is not enabled, can't see the container log
Shurong Mai created YARN-9517: - Summary: when Aggregation is not enabled, can't see the container log Key: YARN-9517 URL: https://issues.apache.org/jira/browse/YARN-9517 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.7 Reporter: Shurong Mai
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893 ] Shurong Mai edited comment on YARN-5449 at 4/29/19 3:37 AM: [~rohithsharma] , thank you for your attention and advice. Before I created this issue, we had been analyzing it for a long time: the JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. But we could not determine the reason for sure. Based on our analysis, we guessed the most probable cause of the nodemanager process hang was the disk hanging during reads/writes, but we have not proved that yet. was (Author: shurong.mai): [~rohithsharma] , thank you for your attention and advice. Before I created this issue, we had been analyzing it for a long time: the JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. But we could not determine the reason for sure. Based on our analysis, the most probable reason is that the nodemanager process is hung. > nodemanager process is hung, and lost from resourcemanager > -- > > Key: YARN-5449 > URL: https://issues.apache.org/jira/browse/YARN-5449 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux > The java version is jdk1.7.0_45 > The hadoop version is hadoop-2.2.0 >Reporter: Shurong Mai >Priority: Major > > The nodemanager process is hung (not dead), and lost from the resourcemanager. > The nodemanager's log has stopped printing. > The CPU usage of the nodemanager process is very low (nearly 0%).
> GC of the nodemanager JVM process has stopped, and the output of jstat (jstat -gccause pid 1000 100) is as follows:
> S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause
> This problem also occurs in the nodemanager JVM process whether the CMS or G1 garbage collector is used.
> The parameters of the CMS garbage collector are as follows:
> -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70
> The parameters of the G1 garbage collector are as follows:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 -XX:+PrintAdaptiveSizePolicy
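For context on reading the `jstat -gccause` rows above: in each sample, total GC time (GCT) should equal young-GC time (YGCT) plus full-GC time (FGCT), and a row that never changes across samples, as in this report, means GC has stopped entirely. A small sketch of that consistency check (the numeric values below are illustrative, not the partially garbled values from the report):

```shell
# Columns of jstat -gccause: S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC.
# Illustrative sample row (not the exact values from this issue):
row="0.00 100.00 95.06 24.08 30.46 3274 623.437 7 5.899 629.336"
# GCT (field 10) should equal YGCT (field 7) + FGCT (field 9)
echo "$row" | awk '{printf "%.3f == %.3f\n", $7 + $9, $10}'
# prints: 629.336 == 629.336
```

When the counters freeze like this, a thread dump (e.g. via jstack) is the usual next step, to see whether threads are blocked in native code such as disk I/O, which matches the reporter's suspicion.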
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893 ] Shurong Mai edited comment on YARN-5449 at 4/29/19 2:59 AM: [~rohithsharma] , thank you for your attention and advice. Before I created this issue, we had been analyzing it for a long time: the JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. But we could not determine the reason for sure. Based on our analysis, the most probable reason is that the nodemanager process is hung. was (Author: shurong.mai): [~rohithsharma] , thank you for your attention and advice. Before I created this issue, we had been analyzing it for a long time: the JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. But we could not determine the reason for sure. Based on our analysis, the most probable reason is that the nodemanager process is hung.
[jira] [Commented] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893 ] Shurong Mai commented on YARN-5449: --- [~rohithsharma] , thank you for your attention and advice. Before I created this issue, we had been analyzing it for a long time: the JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. But we could not determine the reason for sure. Based on our analysis, the most probable reason is that the nodemanager process is hung.
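The CMS and G1 flag sets quoted above would typically be applied to the NodeManager through the `YARN_NODEMANAGER_OPTS` variable (the variable name comes from the standard Hadoop shell scripts); a minimal sketch, assuming yarn-env.sh is the configuration entry point:

```shell
# Sketch: appending a subset of the G1 flags quoted in this issue to the
# NodeManager JVM options, as would be done in yarn-env.sh. Only the variable
# name is standard; the flag values are the ones reported in the issue.
export YARN_NODEMANAGER_OPTS="${YARN_NODEMANAGER_OPTS:-} -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30"
echo "$YARN_NODEMANAGER_OPTS"
```

Appending to the existing value (rather than overwriting it) preserves any options already set elsewhere in the startup scripts.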