[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-13 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Affects Version/s: (was: 3.1.2)
   (was: 2.8.5)
   (was: 2.9.2)
   (was: 3.2.0)

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The OS version is CentOS 7.
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the configuration variables for cgroups with YARN, the nodemanager 
> could be started without any problem. But when I ran a job, it failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description argument of container-executor is then as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is also the separator 
> between multiple resources. Therefore, container-executor truncates the cgroup 
> path to "/sys/fs/cgroup/cpu" rather than the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory".
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is still a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> CentOS 6 as well as CentOS 7, and applies generally to other cgroup subsystem 
> paths, such as the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  

[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-13 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839054#comment-16839054
 ] 

Shurong Mai edited comment on YARN-9518 at 5/14/19 3:54 AM:


[~Jim_Brennan], thank you very much. 

You are right. As you said, "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is 
now used as a separator instead of comma" after release 2.8, so the comma in the 
cgroup path is no longer a problem after release 2.8. I have removed 2.8.5, 2.9.2, 
3.1.2, and 3.2.0 from the affected versions and kept 2.7.7. 

We are running the 2.7.7 release. As I said to [~jhung], "YARN-2194 looks like the 
same problem as this issue, but it supplies another, different solution." 
Therefore, my patch also supplies a solution for version 2.7.7 and older versions.

Thank you a lot again.


was (Author: shurong.mai):
[~Jim_Brennan], thank you very much. " The variable LINUX_PATH_SEPARATOR (which 
is {{%}}) is now used as a separator instead of comma"

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-13 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839054#comment-16839054
 ] 

Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], thank you very much. " The variable LINUX_PATH_SEPARATOR (which 
is {{%}}) is now used as a separator instead of comma"

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>

[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-05-05 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832991#comment-16832991
 ] 

Shurong Mai edited comment on YARN-9517 at 5/5/19 8:14 AM:
---

Hi [~wangda], I just thought the problem was resolved by the patch, so I 
closed this Jira issue. It is not fixed in these branches. 

I have reopened this issue.


was (Author: shurong.mai):
Hi [~wangda], I just thought the problem was resolved by the patch, so I 
closed this Jira issue. I haven't committed the patch to these branches.

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 2.8.5, 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517-branch-2.8.5.001.patch, YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> 
> yarn.log-aggregation-enable
> false
> 
> {code}
>  
> When aggregation is not enabled, we click the "container log link" (in the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
> after a job has finished successfully.
> It will jump to a web page displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch which is simple and applies to these Hadoop versions.






[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-05-05 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai reopened YARN-9517:
---

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>






[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833248#comment-16833248
 ] 

Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], I have submitted the patch for branch-2.7.7 (the same as 2.7.x 
and 2.8.x) and the patch for trunk (the same as 2.9.x, 3.1.x, and 3.2.x).

 

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-trunk.001.patch

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>

[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833199#comment-16833199
 ] 

Shurong Mai edited comment on YARN-9518 at 5/5/19 7:00 AM:
---

[~Jim_Brennan], thank you for your attention and guidance. I have looked at the 
source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the 
same problem. But the patch can only be applied to 2.7.x and 2.8.x, because 
2.9.x, 3.1.x, and 3.2.x (the same as trunk) have a small difference in the source 
code context of the patch. So I need to make another patch for 2.9.x, 3.1.x, and 
3.2.x.


was (Author: shurong.mai):
[~Jim_Brennan], thank you for your attention and guidance. I have looked at the 
source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the 
same problem. But the patch can only be applied to 2.7.x and 2.8.x, because 
2.9.x, 3.1.x, and 3.2.x have a small difference in the source code context of the 
patch. So I need to make another patch for 2.9.x, 3.1.x, and 3.2.x.

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518.patch
>
>

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-branch-2.7.7.001.patch

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518.patch
>
>

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: (was: YARN-9518-branch-2.7.7.patch)

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518.patch
>
>

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-05 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-branch-2.7.7.patch

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.patch, YARN-9518.patch
>
>

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-04 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833208#comment-16833208
 ] 

Shurong Mai commented on YARN-9518:
---

[~jhung], YARN-2194 looks like the same problem as this issue, but it supplies 
another, different solution.

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-04 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is CentOS 7.
{code:java}
cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
{code}
After I set the configuration variables for cgroups with YARN, the nodemanager 
could be started without any problem. But when I ran a job, it failed with the 
exceptional nodemanager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7, they are as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description argument of container-executor is then as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is also the separator between 
multiple resources. Therefore, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory".
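
For illustration, a small Java sketch of the effect (container-executor itself is 
written in C; this only demonstrates what splitting the resources string on commas 
does to the path, it is not the actual code):
{code:java}
public class CommaSplitDemo {
  public static void main(String[] args) {
    String resourcesOption =
        "cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks";
    String value = resourcesOption.substring("cgroups=".length());
    // Treating the comma as the separator between multiple resources cuts the
    // path off right after "cpu", which is a directory, hence the error above.
    String[] resources = value.split(",");
    System.out.println(resources[0]);   // prints /sys/fs/cgroup/cpu
  }
}
{code}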

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is still a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs successfully.

The patch is compatible with the cgroup paths of earlier OS versions such as 
CentOS 6 as well as CentOS 7, and applies generally to other cgroup subsystem 
paths, such as the network subsystems:  
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
 

 

##
{panel:title=exceptional nodemanager logs:}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-04 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833199#comment-16833199
 ] 

Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], thank you for your attention and guidance. I have looked at the 
source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the 
same problem. But the patch can only be applied to 2.7.x and 2.8.x, because 
2.9.x, 3.1.x, and 3.2.x have a small difference in the source code context of the 
patch. So I need to make another patch for 2.9.x, 3.1.x, and 3.2.x.

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> 

[jira] [Commented] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-05-03 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832991#comment-16832991
 ] 

Shurong Mai commented on YARN-9517:
---

Hi [~wangda], I just thought the problem was resolved by the patch, so I closed 
this JIRA issue. I haven't committed the patch to these branches.

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> 
> yarn.log-aggregation-enable
> false
> 
> {code}
>  
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job has finished successfully.
> After clicking, it jumps to a web page displaying "Aggregation is not enabled. 
> Try the nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have 
> submitted a patch which is simple and applies to these Hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is CentOS 7: 
{code:java}
cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
{code}
After I set the cgroup-related configuration for YARN, the NodeManager started 
without any problem. But when I ran a job, the job failed with the exceptional 
NodeManager logs shown at the end.

The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted separately: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description argument passed to container-executor is therefore: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. As a result, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" instead of the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.
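The truncation itself can be shown with a tiny example (written in Java here for 
readability; the actual parsing is done by container-executor): splitting the 
resources option on commas cuts the path off right after "cpu".
{code:java}
public class CommaTruncationDemo {
  public static void main(String[] args) {
    String resourcesOption =
        "cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks";
    // The comma is interpreted as the separator between multiple resources,
    // so the first token ends at "/sys/fs/cgroup/cpu", which is a directory,
    // not a tasks file - hence "Is a directory".
    String[] resources = resourcesOption.split(",");
    System.out.println(resources[0]); // cgroups=/sys/fs/cgroup/cpu
    System.out.println(resources[1]); // cpuacct/hadoop-yarn/.../tasks
  }
}
{code}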

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in this path, and it is a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs successfully.

The fix applies universally to cgroup subsystem paths, for example to the cgroup 
network subsystems: 
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
 

 

##
{panel:title=exceptional nodemanager logs:}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830140#comment-16830140
 ] 

Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], I have read YARN-5301 and its patch, and I don't think it is the 
same problem.

YARN-5301 is about mount-cgroups failing when automatic cgroup mounting is 
enabled, while this issue is about the resource description argument of 
container-executor: the cgroup path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks" 
gets truncated because of the comma in the path.

Therefore, this issue is a different problem from YARN-5301.

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> 

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is CentOS 7.

 

After I set the cgroup-related configuration for YARN, the NodeManager started 
without any problem. But when I ran a job, the job failed with the exceptional 
NodeManager logs shown at the end.

The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted separately: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. As a result, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" instead of the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in this path, and it is a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs successfully.

The fix applies universally to cgroup subsystem paths, for example to the cgroup 
network subsystems: 
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
 

 

##
{panel:title=exceptional nodemanager logs:}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 

[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830103#comment-16830103
 ] 

Shurong Mai edited comment on YARN-9518 at 4/30/19 9:44 AM:


[~Jim_Brennan], I have read the relevant source code in versions 2.7.7, 2.8.5, 
2.9.2, and 3.2.0. They all have the same problem: the cgroup CPU subsystem path 
contains a comma, as in "/sys/fs/cgroup/cpu,cpuacct".


was (Author: shurong.mai):
[~Jim_Brennan], I have red the source code about these in version 2.7.7, 2.8.5, 
2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma 
as "/sys/fs/cgroup/cpu,cpuacct".

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830103#comment-16830103
 ] 

Shurong Mai commented on YARN-9518:
---

[~Jim_Brennan], I have read the relevant source code in versions 2.7.7, 2.8.5, 
2.9.2, and 3.2.0. They all have the same problem: the cgroup CPU subsystem path 
contains a comma, as in "/sys/fs/cgroup/cpu,cpuacct".

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> 

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518.patch

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  2019-04-19 20:17:20,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> 

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Attachment: (was: YARN-9518.patch)

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  2019-04-19 20:17:20,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
>  2019-04-19 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-30 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830057#comment-16830057
 ] 

Shurong Mai commented on YARN-9518:
---

Hi [~Jim_Brennan], does "latest code (trunk)" mean the latest released versions, 
for example hadoop-2.9.2 and hadoop-3.2.0?

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: can't use CGroups with YARN in centos7   (was: can not use CGroups 
with YARN in centos7 )

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  2019-04-19 20:17:20,108 INFO 
> 

[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829233#comment-16829233
 ] 

Shurong Mai commented on YARN-9518:
---

Hi [~adam.antal], I have completed the description of this issue and submitted a 
patch; please review.

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The os version is centos7.
>  
> After I set the cgroup-related configuration for YARN, the NodeManager started 
> without any problem. But when I ran a job, the job failed with the exceptional 
> NodeManager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted separately: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. As a result, container-executor truncates the cgroup path 
> to "/sys/fs/cgroup/cpu" instead of the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is 
> a directory" in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The fix applies universally to cgroup subsystem paths, for example to the 
> cgroup network subsystems: 
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is CentOS 7.

 

After I set the cgroup-related configuration for YARN, the NodeManager started 
without any problem. But when I ran a job, the job failed with the exceptional 
NodeManager logs shown at the end.

The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted separately: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. As a result, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" instead of the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in this path, and it is a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs successfully.

The fix applies universally to cgroup subsystem paths, for example to the cgroup 
network subsystems: 
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
 

 

##
{panel:title=exceptional nodemanager logs:}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The os version is centos7.

 

After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is still a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".

After applying the patch, the problem is resolved and the job runs successfully.

The patch applies generally to merged cgroup subsystem paths, such as the cgroup 
network subsystems: 
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The os version is centos7.

 

After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is still a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".

After applying the patch, the problem is resolved and the job runs successfully.

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows:

 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" 
rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks".

 

 

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
 2019-04-19 20:17:20,109 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links. 

As I look at source code, nodemamager get the cgroup subsystem info by reading 
/proc/mounts. So It get the cpu and cpuacct subsystem path are also 
"/sys/fs/cgroup/cpu,cpuacct". 

The resource description arguments of container-executor is such as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, container-executor truncates the cgroup path to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" in the log.

Hence I modified the source code and submitted a patch. The idea of the patch is 
that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than 
"/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of 
container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is still a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".

After applying the patch, the problem is resolved and the job runs successfully.

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
symbol links.  

 

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
After I had set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed, with the 
exceptional nodemanager logs shown at the end.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager 
- Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" 
subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have merge as "cpu,cpuacct"

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
4_0042_01_01 and exit code: 27
ExitCodeException exitCode=27:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main 
: command provided 1
2019-04-19 20:17:20,109 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: can not use CGroups with YARN in centos7   (was: cgroup subsystem 
in centos7 )

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829153#comment-16829153
 ] 

Shurong Mai commented on YARN-9518:
---

Hi [~adam.antal], thank you for your attention. I am editing this issue; please 
wait a moment.

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Affects Version/s: 3.2.0
   2.9.2
   2.8.5
   2.7.7
   3.1.2

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9518) cgroup subsystem in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: cgroup subsystem in centos7   (was: cgroup in centos7)

> cgroup subsystem in centos7 
> 
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9518) cgroup in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Priority: Major  (was: Critical)

> cgroup in centos7
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9518) cgroup in centos7

2019-04-29 Thread Shurong Mai (JIRA)
Shurong Mai created YARN-9518:
-

 Summary: cgroup in centos7
 Key: YARN-9518
 URL: https://issues.apache.org/jira/browse/YARN-9518
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shurong Mai






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
yarn-site.xml
{code:java}
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
{code}
 

When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit a 
patch which is  simple and can apply to this hadoop version.

  was:
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit a 
patch which is  simple and can apply to this hadoop version.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> 
> yarn.log-aggregation-enable
> false
> 
> {code}
>  
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which is  simple and can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai reopened YARN-9517:
---

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which is  simple and can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Labels: patch  (was: )

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which is  simple and can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai resolved YARN-9517.
---
Resolution: Fixed

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which is  simple and can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982
 ] 

Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM:


We had applied the patch to our hadoop and test ok 


was (Author: shurong.mai):
We have applied the patch to our hadoop and test ok 

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982
 ] 

Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM:


We had applied the patch to our hadoop and test ok.


was (Author: shurong.mai):
We had applied the patch to our hadoop and test ok 

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit a 
patch which is  simple and can apply to this hadoop version.

  was:
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit a 
patch which can apply to this hadoop version.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which is  simple and can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Affects Version/s: 2.2.0
   2.3.0
   2.4.1
   2.5.2
   2.6.5
   3.2.0
   2.9.2
   2.8.5

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit a 
patch which can apply to this hadoop version.

  was:
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x and 3.x.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x.y and 3.x.y and I submit 
> a patch which can apply to this hadoop version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

  was:Aggregation is not enabled, when we click 


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x and 3.x.

  was:
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link"(in web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job is finished successfully.
> It will jump to the webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038" after we click, and the url is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also fund this problem in all hadoop version  2.x and 3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Summary: When aggregation is not enabled, can't see the container log  
(was: When Aggregation is not enabled, can't see the container log)

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> Aggregation is not enabled, when we click 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: Aggregation is not enabled, when we click 

> When Aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> Aggregation is not enabled, when we click 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log

2019-04-28 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Summary: When Aggregation is not enabled, can't see the container log  
(was: when Aggregation is not enabled, can't see the container log)

> When Aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9517) when Aggregation is not enabled, can't see the container log

2019-04-28 Thread Shurong Mai (JIRA)
Shurong Mai created YARN-9517:
-

 Summary: when Aggregation is not enabled, can't see the container 
log
 Key: YARN-9517
 URL: https://issues.apache.org/jira/browse/YARN-9517
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.7
Reporter: Shurong Mai






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2019-04-28 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893
 ] 

Shurong Mai edited comment on YARN-5449 at 4/29/19 3:37 AM:


[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analysing it for a long time: the jvm process thread 
stack, the jvm process heap memory, different java versions, os logs, different 
os versions, different os file systems and so on. But we could not get the 
reason for sure. From our analysis, we guessed the most probable reason for the 
nodemanager process hanging was that the disk hung while reading/writing, but 
we have not proved that yet.


was (Author: shurong.mai):
[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analysing it for a long time: the jvm process thread 
stack, the jvm process heap memory, different java versions, os logs, different 
os versions, different os file systems and so on. But we could not get the 
reason for sure. From our analysis, the most probable reason is that the 
nodemanager process is hung.
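
A rough sketch of the diagnostics referred to above, assuming the standard JDK 
tools and the usual NodeManager main class (pid lookup is illustrative only):

{code:java}
# Capture the NodeManager JVM state when it appears hung.
pid=$(pgrep -f org.apache.hadoop.yarn.server.nodemanager.NodeManager | head -1)
jstack -l "$pid"   > nm-threads.txt   # thread stacks (look for threads blocked on disk I/O)
jmap -histo "$pid" > nm-heap.txt      # heap histogram
jstat -gccause "$pid" 1000 100        # GC activity; frozen counters match the output below
# jstack -F "$pid" forces a dump if the normal attach does not respond.
{code}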

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: Shurong Mai
>Priority: Major
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of the nodemanager jvm process has stopped, and the result of jstat (jstat 
> -gccause pid 1000 100) is as follows:
>   S0     S1      E      O      P    YGC     YGCT   FGC FGCT     GCT  LGCC   GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
> The nodemanager jvm process also hits this problem when using the CMS garbage 
> collector or the G1 garbage collector.
> The parameters of the CMS garbage collector are as follows: 
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of the G1 garbage collector are as follows: 
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2019-04-28 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893
 ] 

Shurong Mai edited comment on YARN-5449 at 4/29/19 2:59 AM:


[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analysing it for a long time: the jvm process thread 
stack, the jvm process heap memory, different java versions, os logs, different 
os versions, different os file systems and so on. But we could not get the 
reason for sure. From our analysis, the most probable reason is that the 
nodemanager process is hung.


was (Author: shurong.mai):
[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analysing it for a long time: the jvm process thread 
stack, the jvm process heap memory, different java versions, os logs, different 
os versions, different os file systems and so on. But we could not get the 
reason for sure. From our analysis, the most probable reason is that the 
nodemanager process is hung.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: Shurong Mai
>Priority: Major
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of the nodemanager jvm process has stopped, and the result of jstat (jstat 
> -gccause pid 1000 100) is as follows:
>   S0     S1      E      O      P    YGC     YGCT   FGC FGCT     GCT  LGCC   GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
> The nodemanager jvm process also hits this problem when using the CMS garbage 
> collector or the G1 garbage collector.
> The parameters of the CMS garbage collector are as follows: 
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of the G1 garbage collector are as follows: 
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2019-04-28 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893
 ] 

Shurong Mai commented on YARN-5449:
---

[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analysing it for a long time: the jvm process thread 
stack, the jvm process heap memory, different java versions, os logs, different 
os versions, different os file systems and so on. But we could not get the 
reason for sure. From our analysis, the most probable reason is that the 
nodemanager process is hung.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: Shurong Mai
>Priority: Major
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of the nodemanager jvm process has stopped, and the result of jstat (jstat 
> -gccause pid 1000 100) is as follows:
>   S0     S1      E      O      P    YGC     YGCT   FGC FGCT     GCT  LGCC   GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335  No GC  G1 Evacuation Pause
> The nodemanager jvm process also hits this problem when using the CMS garbage 
> collector or the G1 garbage collector.
> The parameters of the CMS garbage collector are as follows: 
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of the G1 garbage collector are as follows: 
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org