[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839709#comment-16839709 ]

Hadoop QA commented on YARN-9518:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} branch-2.7 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 37s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 52s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 24s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 23s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 23s{color} | {color:green} branch-2.7 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  1m 21s{color} | {color:orange} root: The patch generated 3 new + 336 unchanged - 0 fixed = 339 total (was 336) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 65 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 55s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 34s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-yarn-server-nodemanager:1 |
| Failed junit tests | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
|   | hadoop.yarn.server.nodemanager.webapp.TestNMWebServer |
| Timed out junit tests | org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:06eafee |
| JIRA Issue | YARN-9518 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968693/YARN-9518-branch-2.7.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux a7fe3c4bd0a5 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7 / cec0041 |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_201 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24090/artifact/out/diff-checkstyle-root.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/24090/artifact/out/whitespace-eol.txt |
| Unreaped Processes Log | https://builds.apache.org/job/PreCommit-YARN-Build/24090/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-reaper.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/24090/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24090/testReport/ |
| Max. process+thread count | 239 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: . |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24090/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> can't use CGroups with YARN in centos7 
> ---------------------------------------
>
>                 Key: YARN-9518
>                 URL: https://issues.apache.org/jira/browse/YARN-9518
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.7
>            Reporter: Shurong Mai
>            Priority: Major
>              Labels: cgroup, patch
>         Attachments: YARN-9518-branch-2.7.002.patch, 
> YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The OS version is CentOS 7.
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the nodemanager
> started without any problem. But when I ran a job, the job failed with the
> exceptional nodemanager logs shown at the end.
> The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as
> node manager - Is a directory".
> After analysing this, I found the reason. In CentOS 6, the cgroup "cpu" and
> "cpuacct" subsystems are as follows:
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have been merged as "cpu,cpuacct"; "cpu" and "cpuacct"
> are now symbolic links.
> Looking at the source code, the nodemanager gets the cgroup subsystem info
> by reading /proc/mounts, so the path it gets for both the cpu and cpuacct
> subsystems is "/sys/fs/cgroup/cpu,cpuacct".
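
The lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the nodemanager's actual Java code: the function name, the controller list and the sample mount lines are all assumptions made for the example.

```python
# Controller names to look for; a hypothetical, non-exhaustive list.
KNOWN_CONTROLLERS = {"cpu", "cpuacct", "memory", "net_cls", "net_prio"}

# Illustrative /proc/mounts excerpt (device, mount point, type, options, ...).
SAMPLE_MOUNTS = """\
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
"""


def controller_mounts(mounts_text):
    """Map each known cgroup controller to the mount point that hosts it."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not a cgroup mount.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        mount_point = fields[1]
        # Controller names appear among the comma-separated mount options.
        for option in fields[3].split(","):
            if option in KNOWN_CONTROLLERS:
                result[option] = mount_point
    return result


mounts = controller_mounts(SAMPLE_MOUNTS)
print(mounts["cpu"])      # the merged mount point, comma included
print(mounts["cpuacct"])  # same mount point as "cpu"
```

With a merged hierarchy, both controllers resolve to the same comma-containing directory, which is the path that later ends up in the container-executor argument.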
> The resource description argument of container-executor then looks like this:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_000001/tasks
> {code}
> There is a comma in the cgroup path, but the comma is also the separator
> between multiple resources. Therefore, container-executor truncates the
> cgroup path to "/sys/fs/cgroup/cpu" rather than using the correct cgroup path
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_000001/tasks"
> and reports the error in the log: "Can't open file /sys/fs/cgroup/cpu as
> node manager - Is a directory".
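
To make the truncation concrete, here is a simplified Python model of it. It assumes the argument is split on every comma and that the first piece is then read as key=value; the real container-executor is native code whose parsing is not shown in this report, so the function name and logic are illustrative only.

```python
def first_resource_value(resources_arg):
    """Return the value of the first comma-separated key=value entry."""
    # Splitting on "," treats a comma inside the path as an entry boundary.
    first_entry = resources_arg.split(",")[0]
    _key, _sep, value = first_entry.partition("=")
    return value


merged = ("cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/"
          "container_1554210318404_0057_02_000001/tasks")
symlinked = ("cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/"
             "container_1554210318404_0057_02_000001/tasks")

print(first_resource_value(merged))     # cut at the comma: a directory
print(first_resource_value(symlinked))  # full path to the tasks file
```

Under this model the merged path yields only the directory "/sys/fs/cgroup/cpu", which matches the "Is a directory" error, while a comma-free path survives intact.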
> Hence I modified the source code and submitted a patch. The idea of the
> patch is that the nodemanager gets the cgroup cpu path as
> "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result,
> the resource description argument of container-executor is as follows:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_000001/tasks
> {code}
> Note that there is no comma in the path, and it is a valid path because
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".
> After applying the patch, the problem is resolved and the job runs
> successfully.
> The patch is compatible with the cgroup paths of both older and current OS
> versions, such as CentOS 6 and CentOS 7, and applies equally to other cgroup
> subsystem paths, such as the cgroup network subsystem:
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
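
The idea of the patch, as described above, might be sketched like this in Python. It is not the patch itself: the helper name is hypothetical, and it assumes the distribution provides a per-controller symlink next to the merged directory, as in the listings above.

```python
import posixpath


def comma_free_path(mount_point, controller):
    """Prefer the per-controller symlink over a merged, comma-named mount."""
    parent, leaf = posixpath.split(mount_point)
    # Only rewrite when the last component is a merged name that actually
    # contains this controller, e.g. "cpu,cpuacct" for "cpu".
    if "," in leaf and controller in leaf.split(","):
        return posixpath.join(parent, controller)
    return mount_point


print(comma_free_path("/sys/fs/cgroup/cpu,cpuacct", "cpu"))
print(comma_free_path("/sys/fs/cgroup/net_cls,net_prio", "net_prio"))
print(comma_free_path("/sys/fs/cgroup/cpu", "cpu"))  # CentOS 6 style, unchanged
```

A path that is already comma-free passes through untouched, which is why the same logic would work on both the older and the merged layouts.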
>  
>  
> ##################################################################################################################################
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from LOCALIZED to RUNNING
> 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_000001 is : 27
> 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_000001 and exit code: 27
> ExitCodeException exitCode=27:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>         at org.apache.hadoop.util.Shell.run(Shell.java:482)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_000001
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27:
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
> 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.util.Shell.run(Shell.java:482)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at java.lang.Thread.run(Thread.java:745)
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main : command provided 1
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is test_hadoop
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is datadev
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to cgroup task files...
> 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory
> 2019-04-19 20:17:20,131 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 27
> 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1554210318404_0042_01_000001
>   
> {panel}


