[jira] [Commented] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000626#comment-17000626
 ] 

Sen Zhao commented on YARN-10048:
-

Hi, [~tangzhankun]. Right, if the cpu controller is mounted at multiple paths, 
only the first path from the parsed mtab will be returned.
For example:
{code:java}
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup 
(rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
none on /opt/cgroup/cpu type cgroup (rw,relatime,cpuacct,cpu)
{code}
Depending on the configuration, it may return */sys/fs/cgroup/cpu* instead of 
*/opt/cgroup/cpu*.
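
For illustration, a minimal sketch (hypothetical names, not the actual 
CGroupsHandlerImpl code) of how first-match parsing of mtab-style entries picks 
the system mount and ignores the manually mounted path:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MtabFirstMatchDemo {
  // Simplified model of mtab parsing: map each cgroup controller to the
  // list of paths it is mounted at, in mtab order.
  static Map<String, List<String>> parseMtab(List<String> mtabLines) {
    Map<String, List<String>> controllerPaths = new HashMap<>();
    for (String line : mtabLines) {
      // mtab format: device mountPoint fsType options dump pass
      String[] fields = line.split("\\s+");
      if (fields.length < 4 || !"cgroup".equals(fields[2])) {
        continue;
      }
      for (String opt : fields[3].split(",")) {
        if ("cpu".equals(opt) || "cpuacct".equals(opt)) {
          controllerPaths.computeIfAbsent(opt, k -> new ArrayList<>())
              .add(fields[1]);
        }
      }
    }
    return controllerPaths;
  }

  public static void main(String[] args) {
    List<String> mtab = Arrays.asList(
        "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,cpuacct,cpu 0 0",
        "none /opt/cgroup/cpu cgroup rw,relatime,cpuacct,cpu 0 0");
    // Taking only the first entry ignores the manually mounted /opt/cgroup/cpu:
    // prints /sys/fs/cgroup/cpu,cpuacct instead of /opt/cgroup/cpu.
    System.out.println(parseMtab(mtab).get("cpu").get(0));
  }
}
{code}
Presumably the fix needs to prefer the configured mount path over the first 
mtab entry.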

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch, YARN-10048.002.patch
>
>
> After manually mounting the Cgroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.
> {code:java}
> 2019-12-19 14:46:08,200 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mounting controller cpu at /opt/cgroup/cpu
> 2019-12-19 14:46:08,290 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 32. Privileged Execution Op
> eration Stderr:
> Feature disabled: mount cgroup
> Stdout:
> Full command array for failed execution:
> [/opt/hadoop-yarn/bin/container-executor, --mount-cgroups, 
> yarn-NodeManager/hadoop-yarn, cpu,cpuacct=/opt/cgroup/cpu]
> 2019-12-19 14:46:08,290 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Failed to mount controller: cpu
> 2019-12-19 14:46:08,291 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to mount controller: cpu
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.mountCGroupController(CGroupsHandlerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:365)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:325)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:403)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-10048:

Description: 
After manually mounting the Cgroup, the NodeManager fails to start.

If the cpu controller has multiple mount paths, only the first mount path will 
be returned. This can cause the returned value to not be the actual cpu 
controller mount path.

{code:java}
2019-12-19 14:46:08,200 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
 Mounting controller cpu at /opt/cgroup/cpu
2019-12-19 14:46:08,290 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 32. Privileged Execution Op
eration Stderr:
Feature disabled: mount cgroup

Stdout:
Full command array for failed execution:
[/opt/hadoop-yarn/bin/container-executor, --mount-cgroups, 
yarn-NodeManager/hadoop-yarn, cpu,cpuacct=/opt/cgroup/cpu]
2019-12-19 14:46:08,290 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
 Failed to mount controller: cpu
2019-12-19 14:46:08,291 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
 Failed to mount controller: cpu
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.mountCGroupController(CGroupsHandlerImpl.java:318)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:365)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:325)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:403)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)

{code}


  was:
After manually mounting the Cgroup, the NodeManager fails to start.

If the cpu controller has multiple mount paths, only the first mount path will 
be returned. This can cause the returned value to not be the actual cpu 
controller mount path.


> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch, YARN-10048.002.patch
>
>
> After manually mounting the Cgroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.
> {code:java}
> 2019-12-19 14:46:08,200 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mounting controller cpu at /opt/cgroup/cpu
> 2019-12-19 14:46:08,290 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 32. Privileged Execution Op
> eration Stderr:
> Feature disabled: mount cgroup
> Stdout:
> Full command array for failed execution:
> [/opt/hadoop-yarn/bin/container-executor, --mount-cgroups, 
> yarn-NodeManager/hadoop-yarn, cpu,cpuacct=/opt/cgroup/cpu]
> 2019-12-19 14:46:08,290 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Failed to mount controller: cpu
> 2019-12-19 14:46:08,291 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to mount controller: cpu
> at 
> 

[jira] [Commented] (YARN-10041) Should not use AbstractPath to create unix domain socket

2019-12-19 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000616#comment-17000616
 ] 

Vinayakumar B commented on YARN-10041:
--

[~tangzhankun], the patch has been provided as a PR; please check 
[https://github.com/apache/hadoop/pull/1771]. The Jenkins result is already present.

> Should not use AbstractPath to create unix domain socket
> 
>
> Key: YARN-10041
> URL: https://issues.apache.org/jira/browse/YARN-10041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
> Environment: X86/ARM
> OS: ubuntu 1804
> java: java8
>Reporter: zhao bo
>Priority: Major
>
> This issue was hit in a very coincidental scenario. It happened when we 
> tested on ARM.
> The test case is:
> org.apache.hadoop.yarn.csi.client.TestCsiClient.testIdentityService
>  
> The steps:
> If the hadoop source code dir is a very deep path, this case passes the 
> first time it is run, but always fails on the following tries.
> The official Jenkins doesn't cover this, because it runs in a Docker 
> container and runs the test only once, so it always looks like it passes.
>  
> The key point is that the UNIX domain socket path exceeds the limit of 
> UNIX_PATH_MAX (108). Please see [1]
>  
> This issue is very difficult to locate, as binding always fails when we 
> execute the test.
>  
> Also, the code in the trunk branch uses the absolute path to create the 
> UNIX domain socket file; the source code is [2]. So this issue cannot be 
> avoided there. It would be good to provide a second way to set the socket 
> path to '/tmp' or another short location when executing this test.
> [1] 
> [https://serverfault.com/questions/641347/check-if-a-path-exceeds-maximum-for-unix-domain-socket]
> [2] 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/src/test/java/org/apache/hadoop/yarn/csi/client/TestCsiClient.java#L48]
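
As a minimal sketch of the limit described above (hypothetical helper, not the 
Hadoop test code), a check that a candidate socket path fits within 
UNIX_PATH_MAX before binding:
{code:java}
import java.nio.charset.StandardCharsets;

public class SocketPathCheck {
  // Typical Linux limit for sun_path, including the trailing NUL byte.
  private static final int UNIX_PATH_MAX = 108;

  static void checkSocketPath(String path) {
    // Reserve one byte for the NUL terminator.
    if (path.getBytes(StandardCharsets.UTF_8).length > UNIX_PATH_MAX - 1) {
      throw new IllegalArgumentException(
          "Socket path exceeds UNIX_PATH_MAX(108): " + path);
    }
  }

  public static void main(String[] args) {
    checkSocketPath("/tmp/csi.sock"); // short path: fine
    StringBuilder deep = new StringBuilder();
    for (int i = 0; i < 20; i++) {
      deep.append("/very-deep-hadoop-build-dir");
    }
    checkSocketPath(deep.append("/csi.sock").toString()); // throws
  }
}
{code}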



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10041) Should not use AbstractPath to create unix domain socket

2019-12-19 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000614#comment-17000614
 ] 

Vinayakumar B commented on YARN-10041:
--

Thanks [~bzhaoopenstack] for the patch.

I have checked the PR and left one comment. Please check.

> Should not use AbstractPath to create unix domain socket
> 
>
> Key: YARN-10041
> URL: https://issues.apache.org/jira/browse/YARN-10041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
> Environment: X86/ARM
> OS: ubuntu 1804
> java: java8
>Reporter: zhao bo
>Priority: Major
>
> This issue was hit in a very coincidental scenario. It happened when we 
> tested on ARM.
> The test case is:
> org.apache.hadoop.yarn.csi.client.TestCsiClient.testIdentityService
>  
> The steps:
> If the hadoop source code dir is a very deep path, this case passes the 
> first time it is run, but always fails on the following tries.
> The official Jenkins doesn't cover this, because it runs in a Docker 
> container and runs the test only once, so it always looks like it passes.
>  
> The key point is that the UNIX domain socket path exceeds the limit of 
> UNIX_PATH_MAX (108). Please see [1]
>  
> This issue is very difficult to locate, as binding always fails when we 
> execute the test.
>  
> Also, the code in the trunk branch uses the absolute path to create the 
> UNIX domain socket file; the source code is [2]. So this issue cannot be 
> avoided there. It would be good to provide a second way to set the socket 
> path to '/tmp' or another short location when executing this test.
> [1] 
> [https://serverfault.com/questions/641347/check-if-a-path-exceeds-maximum-for-unix-domain-socket]
> [2] 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/src/test/java/org/apache/hadoop/yarn/csi/client/TestCsiClient.java#L48]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10051) Throwing NoSuchElementException when the event dispatcher handles NODE_UPDATE

2019-12-19 Thread Yong Xing (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Xing updated YARN-10051:
-
Description: 
 After restarting a NM, I found the active RM crashed. The exception stack is as 
follows.
{code:java}
2019-12-16 18:12:20,286 FATAL org.apache.hadoop.yarn.event.EventDispatcher: 
Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
at 
java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at 
java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1374)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:345)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:958)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
{code}
 

This issue looks similar to 
[YARN-9552|https://issues.apache.org/jira/browse/YARN-9552] and 
[YARN-7382|https://issues.apache.org/jira/browse/YARN-7382], but the root cause 
is different.
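
For reference, the underlying JDK behavior can be reproduced in isolation; a 
minimal sketch, assuming another thread empties the set between the caller's 
emptiness check and first():
{code:java}
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptySetFirstDemo {
  public static void main(String[] args) {
    ConcurrentSkipListSet<String> schedulerKeys = new ConcurrentSkipListSet<>();
    schedulerKeys.add("priority-0");
    // Simulate a concurrent update (e.g. the app's last pending ask being
    // removed) happening before getNextPendingAsk-style code calls first().
    schedulerKeys.remove("priority-0");
    // first() on an empty set throws java.util.NoSuchElementException rather
    // than returning null, which kills the event dispatcher if unhandled.
    schedulerKeys.first();
  }
}
{code}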

  was:
 After restarting a NM, I found the active RM crashed. The exception stack is as 
follows.
{code:java}
2019-12-16 18:12:20,286 FATAL org.apache.hadoop.yarn.event.EventDispatcher: 
Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
at 
java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at 
java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1374)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:345)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:958)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
{code}
 


> Throwing NoSuchElementException when the event dispatcher handles NODE_UPDATE
> -
>
> Key: YARN-10051
> URL: https://issues.apache.org/jira/browse/YARN-10051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Yong Xing
>Priority: Critical
> Fix For: 3.0.0
>
>
> After restarting a NM, I found the active RM crashed. The exception stack is 
> as follows.
> {code:java}
> 2019-12-16 18:12:20,286 FATAL org.apache.hadoop.yarn.event.EventDispatcher: 
> Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> 

[jira] [Created] (YARN-10051) Throwing NoSuchElementException when the event dispatcher handles NODE_UPDATE

2019-12-19 Thread Yong Xing (Jira)
Yong Xing created YARN-10051:


 Summary: Throwing NoSuchElementException when the event 
dispatcher handles NODE_UPDATE
 Key: YARN-10051
 URL: https://issues.apache.org/jira/browse/YARN-10051
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Yong Xing
 Fix For: 3.0.0


 After restarting a NM, I found the active RM crashed. The exception stack is as 
follows.
{code:java}
2019-12-16 18:12:20,286 FATAL org.apache.hadoop.yarn.event.EventDispatcher: 
Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
at 
java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at 
java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1374)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:345)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:958)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000594#comment-17000594
 ] 

Zhankun Tang commented on YARN-10048:
-

[~Sen Zhao], thanks for catching this. Let me make sure I understand: there's a 
mismatch between the discovered controller path and the configured value when 
there are multiple paths under the cpu subsystem?

And could you please also show the error message when the NM fails to start? Thanks!

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch, YARN-10048.002.patch
>
>
> After manually mounting the Cgroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000587#comment-17000587
 ] 

Hudson commented on YARN-10042:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17781 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17781/])
YARN-10042. Upgrade grpc-xxx dependencies to 1.26.0. Contributed by Sheng (ztang: 
rev 12722ab0c78b6a978f9355f9bdb2345b45d0d3be)
* (edit) LICENSE-binary
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/pom.xml


> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> Currently, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The grpc-java 
> repo now supports the aarch64 platform and has released 1.26.0 to Maven 
> Central.
> see:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It would be better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: the build logs on aarch64 and x86_64, and the yarn-csi test logs 
> on aarch64 and x86_64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000583#comment-17000583
 ] 

Hadoop QA commented on YARN-10048:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
27s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | YARN-10048 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989225/YARN-10048.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1f6c67edfe6f 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ef59ffd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25307/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25307/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> NodeManager fails to start after mounting CGroup
> 

[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000578#comment-17000578
 ] 

Zhankun Tang commented on YARN-10042:
-

[~cheersyang], thanks for the review. Committed to trunk. Thanks [~seanlau] for 
the contribution!

> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> Currently, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The grpc-java 
> repo now supports the aarch64 platform and has released 1.26.0 to Maven 
> Central.
> see:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It would be better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: the build logs on aarch64 and x86_64, and the yarn-csi test logs 
> on aarch64 and x86_64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-10042:

Fix Version/s: 3.3.0

> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> Currently, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The grpc-java 
> repo now supports the aarch64 platform and has released 1.26.0 to Maven 
> Central.
> see:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It would be better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: the build logs on aarch64 and x86_64, and the yarn-csi test logs 
> on aarch64 and x86_64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000571#comment-17000571
 ] 

Hadoop QA commented on YARN-10050:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | YARN-10050 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989224/YARN-10050.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  xml  |
| uname | Linux 508495cf2d6d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ef59ffd |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 346 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site 
U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25306/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10041) Should not use AbstractPath to create unix domain socket

2019-12-19 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000569#comment-17000569
 ] 

Zhankun Tang commented on YARN-10041:
-

[~bzhaoopenstack], [~liusheng], could you please upload a patch file as in 
YARN-10042 and click the "Submit Patch" button to trigger the CI?

> Should not use AbstractPath to create unix domain socket
> 
>
> Key: YARN-10041
> URL: https://issues.apache.org/jira/browse/YARN-10041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
> Environment: X86/ARM
> OS: ubuntu 1804
> java: java8
>Reporter: zhao bo
>Priority: Major
>
> This issue was hit in a very coincidental scenario. It happened when we 
> tested on ARM.
> The test case is:
> org.apache.hadoop.yarn.csi.client.TestCsiClient.testIdentityService
>  
> The steps:
> If the hadoop source code dir is a very deep path, this case passes the 
> first time it is run, but always fails on the following tries.
> The official Jenkins doesn't cover this, because it runs in a Docker 
> container and runs the test only once, so it always looks like it passes.
>  
> The key point is that the UNIX domain socket path exceeds the limit of 
> UNIX_PATH_MAX (108). Please see [1]
>  
> This issue is very difficult to locate, as binding always fails when we 
> execute the test.
>  
> Also, the code in the trunk branch uses the absolute path to create the 
> UNIX domain socket file; the source code is [2]. So this issue cannot be 
> avoided there. It would be good to provide a second way to set the socket 
> path to '/tmp' or another short location when executing this test.
> [1] 
> [https://serverfault.com/questions/641347/check-if-a-path-exceeds-maximum-for-unix-domain-socket]
> [2] 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/src/test/java/org/apache/hadoop/yarn/csi/client/TestCsiClient.java#L48]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10050:
-
Target Version/s: 3.3.0, 3.2.2  (was: 3.3.0)

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000552#comment-17000552
 ] 

Masatake Iwasaki edited comment on YARN-10050 at 12/20/19 2:00 AM:
---

The doc was added by YARN-4599, but no link to it was added. I attached 001, 
adding a link to the site index and to the existing cgroups document.


was (Author: iwasakims):
The doc was added byYARN-4599 but it did not add link. I atttached 001 adding a 
link to site index and existing cgroups document.

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-10048:

Issue Type: Bug  (was: Improvement)

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch, YARN-10048.002.patch
>
>
> After manually mounting the Cgroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-10048:

Attachment: YARN-10048.002.patch

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch, YARN-10048.002.patch
>
>
> After manually mounting the Cgroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10050:

Priority: Minor  (was: Major)

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000552#comment-17000552
 ] 

Masatake Iwasaki commented on YARN-10050:
-

The doc was added by YARN-4599, but no link to it was added. I attached 001, 
adding a link to the site index and to the existing cgroups document.

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10050:

Attachment: YARN-10050.001.patch

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000540#comment-17000540
 ] 

Weiwei Yang commented on YARN-10042:


+1, it looks good. [~tangzhankun] could you please help to commit this change?

Thanks

> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> Currently, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The grpc-java 
> repo now supports the aarch64 platform and has released 1.26.0 to Maven 
> Central.
> see:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It would be better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: the build logs on aarch64 and x86_64, and the yarn-csi test logs 
> on aarch64 and x86_64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10050:

Component/s: documentation

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Major
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned YARN-10050:
---

Assignee: Masatake Iwasaki

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Major
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000521#comment-17000521
 ] 

liusheng commented on YARN-10042:
-

Thanks a lot for your review, [~tangzhankun]. [~cheersyang], would you please 
take a look? Thank you.

> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> Currently, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The grpc-java 
> repo now supports the aarch64 platform and has released 1.26.0 to Maven 
> Central.
> see:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It would be better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: the build logs on aarch64 and x86_64, and the yarn-csi test logs 
> on aarch64 and x86_64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000337#comment-17000337
 ] 

Hudson commented on YARN-10038:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17780 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17780/])
YARN-10038. [UI] Finish Time is not correctly parsed in the RM Apps (gifuma: 
rev ef59ffd362b9a91be08cbdbaa15aafdf08f00bdc)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/ColumnHeader.java


> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time is shown as Unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 
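
For illustration, a minimal sketch (hypothetical, not the actual RMAppsBlock 
fix) of rendering the stored epoch millis as a date instead of the raw long:
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;

public class FinishTimeDemo {
  public static void main(String[] args) {
    long finishTimeMillis = 1576608482026L; // sample finish time in epoch millis
    System.out.println(finishTimeMillis); // what the UI showed: 1576608482026
    // Format as a human-readable date in the JVM's default time zone.
    System.out.println(new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy")
        .format(new Date(finishTimeMillis)));
  }
}
{code}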



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Giovanni Matteo Fumarola (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-10038:

Component/s: webapp

> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time is shown as Unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Giovanni Matteo Fumarola (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-10038:

Affects Version/s: 3.3.0

> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time is shown as Unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 






[jira] [Updated] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Giovanni Matteo Fumarola (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-10038:

Priority: Minor  (was: Major)

> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time shows as the unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 






[jira] [Updated] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Giovanni Matteo Fumarola (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-10038:

Fix Version/s: 3.3.0

> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time shows as the unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 






[jira] [Commented] (YARN-10038) [UI] Finish Time is not correctly parsed in the RM Apps page

2019-12-19 Thread Giovanni Matteo Fumarola (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000324#comment-17000324
 ] 

Giovanni Matteo Fumarola commented on YARN-10038:
-

Committed to trunk [^YARN-10038.003.patch].

Thanks [~elgoiri].

> [UI] Finish Time is not correctly parsed in the RM Apps page
> 
>
> Key: YARN-10038
> URL: https://issues.apache.org/jira/browse/YARN-10038
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: YARN-10038.000.patch, YARN-10038.001.patch, 
> YARN-10038.002.patch, YARN-10038.003.patch, image-2019-12-17-11-08-22-026.png
>
>
> The Finish Time shows as the unix time (millis since 1970) instead of as a 
> date:
>  !image-2019-12-17-11-08-22-026.png! 






[jira] [Created] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Miklos Szegedi (Jira)
Miklos Szegedi created YARN-10050:
-

 Summary: NodeManagerCGroupsMemory.md does not show up in the 
official documentation
 Key: YARN-10050
 URL: https://issues.apache.org/jira/browse/YARN-10050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I looked at this doc:

[https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]

It does not show up here:

[https://hadoop.apache.org/docs/stable/]






[jira] [Commented] (YARN-9892) Capacity scheduler: support DRF ordering policy on queue level

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000303#comment-17000303
 ] 

Hadoop QA commented on YARN-9892:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 57 unchanged - 0 fixed = 59 total (was 57) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
16s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | YARN-9892 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12988806/YARN-9892.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 39ac1b35e3e8 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7868da8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25305/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25305/testReport/ |
| Max. process+thread count | 808 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-12-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000257#comment-17000257
 ] 

Eric Payne commented on YARN-6492:
--

Thanks [~maniraj...@gmail.com] for all the work on this.

bq. ,partition=default"
I think we need to just use an empty string for the default partition to be 
consistent with the other CL interfaces. For example, if you call {{curl 
http://RM:PORT/ws/v1/cluster/scheduler}} to get the capacity scheduler metrics, 
it will display {{"partitionName": "",}} fields for the default partition.
Plus, someone could create a partition named "default", and with your current 
design, you couldn't tell the difference.
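A minimal sketch of that suggestion (hypothetical helper and key format, not code from the patch):

{code:java}
// Hypothetical sketch, not the actual patch code.
class PartitionMetricsNaming {
  // Represent the default partition as the empty string, matching the
  // "partitionName": "" that the scheduler REST response reports. Renaming it
  // to the literal "default" would collide with a user-created partition that
  // happens to be named "default".
  static String metricsSourceName(String queuePath, String partition) {
    String p = (partition == null) ? "" : partition;
    return "QueueMetrics,q0=" + queuePath + ",partition=" + p;
  }
}
{code}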

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object, which captures metrics either in the 
> default partition or across all partitions (after YARN-6467 it will be the 
> default partition).
> But having per-partition metrics would be very useful.






[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues

2019-12-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000205#comment-17000205
 ] 

Eric Payne commented on YARN-9767:
--

[~maniraj...@gmail.com], Thanks for raising this issue and providing a patch.

I am very sorry for my very late reply. Unfortunately, due to my long delay, 
this patch no longer applies.

I am still trying to get my head around the use case. However, looking at the 
code, I'm very concerned that changes related to queue metrics would change 
behavior of {{LeafQueue#getheadroom}}, which is integral to the way the 
capacity scheduler assigns resources. I will need to spend a lot of time 
understanding the use case and the problems this JIRA is solving.


> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of the Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately for ease of tracking.
> Observations:
> Please refer this 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info is extracted from both the request and the node, 
> there is a problem. For example:
>  
> Node N has been mapped to Label X (non-exclusive). Queue A has been 
> configured with the ANY node label. App A requested resources from Queue A and 
> its containers ran on Node N for some reason. During the 
> AbstractCSQueue#allocateResource call, the node partition (via SchedulerNode) 
> would get used for the calculation. Let's say an allocate call has been fired 
> for 3 containers of 1 GB each; then
> a. PartitionDefault * queue A -> pending MB is 3 GB
> b. PartitionX * queue A -> pending MB is -3 GB
>  
> is the outcome. Because the app request was fired without any label 
> specification, metrics #a was derived. After allocation is over, pending 
> resources usually get decreased. When this happens, the node partition info is 
> used, hence metrics #b was derived.
>  
> Given this kind of situation, we will need to put some thought into getting 
> the metrics right.
>  
> 2. Though the intent of this Jira is to add partition queue metrics, we would 
> like to retain the existing queue metrics for backward compatibility (as you 
> can see from the Jira's discussion).
> With this patch and the YARN-9596 patch, a queue's QueueMetrics would be 
> overridden either with specific partition values or with default partition 
> values; it could be vice versa as well. For example, after a queue (say 
> queue A) has been initialised with some min and max capacity and also with a 
> node label's min and max capacity, QueueMetrics (availableMB) for queue A 
> returns values based on the node label's capacity config.
> I've been working on these observations to provide a fix and attached 
> .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and 
> availableVcores are correct (please refer to observation #2 above). Added more 
> asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix 
> for #2 is working properly.
> Also, one more thing to note: user metrics for availableMB and availableVcores 
> at the root queue were not there even before; the same behaviour is retained. 
> User metrics for availableMB and availableVcores are available only at the 
> child queue level, and also with partitions.
>  
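A toy illustration of the bookkeeping mismatch in observation #1 (hypothetical names, not YARN code): pending resources are incremented under one partition key but decremented under another, leaving one counter at +3 GB and the other at -3 GB.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class PendingMismatchDemo {
  static final Map<String, Long> pendingMB = new HashMap<>();

  static void addPending(String partition, long mb) {
    pendingMB.merge(partition, mb, Long::sum);
  }

  public static void main(String[] args) {
    // The request carries no label, so it is booked against the default
    // partition (represented here as "").
    addPending("", 3 * 1024L);
    // The decrement after allocation uses the node's partition (label X).
    addPending("X", -3 * 1024L);
    System.out.println(pendingMB);   // e.g. {=3072, X=-3072}
  }
}
{code}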






[jira] [Commented] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined

2019-12-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000184#comment-17000184
 ] 

Eric Payne commented on YARN-10009:
---

This is kind of a blocker for us. Would it be okay if [~ebadger] had a look?

> In Capacity Scheduler, DRC can treat minimum user limit percent as a max when 
> custom resource is defined
> 
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.10.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: YARN-10009.001.patch, YARN-10009.002.patch, 
> YARN-10009.003.patch, YARN-10009.UT.patch, YARN-10009.branch-2.10.003.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the use case above:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} 
> each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.
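Spelling out the arithmetic (assuming each of the 8 requested containers asks for 10 {{res_1}}, i.e. 80 in total): the dominant resource is {{res_1}}, so a minimum-user-limit-percent of 25 caps User1 at 0.25 * 80 = 20 {{res_1}}, which is only 2 containers, even though the queue has room for all 8. The configured minimum effectively acts as a maximum.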






[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2019-12-19 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10049:

Description: 
FIFOPolicy of FS does the following comparisons in addition to app priority 
comparison:

1. Using Start time
2. Using Name

The scope of this Jira is to achieve the same comparisons in FIFOOrderingPolicy 
of CS.
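A rough sketch of the proposed ordering (a stand-in class and plain Java comparators; an assumption about the shape of the change, not the eventual patch):

{code:java}
import java.util.Comparator;

// Hedged sketch: priority first (as FIFOOrderingPolicy does today),
// then start time, then name. "App" stands in for SchedulableEntity.
class App {
  int priority;     // higher priority first
  long startTime;   // earlier start first
  String name;      // final tie-breaker
}

class FifoOrderSketch {
  static final Comparator<App> FIFO_ORDER =
      Comparator.<App>comparingInt(a -> -a.priority)
          .thenComparingLong(a -> a.startTime)
          .thenComparing(a -> a.name);
}
{code}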

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> The scope of this Jira is to achieve the same comparisons in 
> FIFOOrderingPolicy of CS.






[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2019-12-19 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10049:

Parent: YARN-9698
Issue Type: Sub-task  (was: Bug)

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> The scope of this Jira is to achieve the same comparisons in 
> FIFOOrderingPolicy of CS.






[jira] [Created] (YARN-10049) FIFOOrderingPolicy Improvements

2019-12-19 Thread Manikandan R (Jira)
Manikandan R created YARN-10049:
---

 Summary: FIFOOrderingPolicy Improvements
 Key: YARN-10049
 URL: https://issues.apache.org/jira/browse/YARN-10049
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R









[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler

2019-12-19 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000178#comment-17000178
 ] 

Manikandan R commented on YARN-9698:


Reg "Size Based Fairness deep dived" feature's comment "TODO: Is there a 
similar config in CS?" in attached pdf,

yarn.scheduler.fair.sizebasedweight usage in FairSharePolicy of FS is 
equivalent to fair.enable-size-based-weight in FairOrderingPolicy of CS but 
implementation slightly varies. For ex, FairSharePolicy of FS consider app 
priority for weight calculation.

 Also, YARN-10043 has been created to address comment "Is Fair policy 
comparable in FS to CS?" of Queue Ordering Policies feature in pdf.

> [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
> 
>
> Key: YARN-9698
> URL: https://issues.apache.org/jira/browse/YARN-9698
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Weiwei Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: FS-CS Migration.pdf
>
>
> We see some users want to migrate from Fair Scheduler to Capacity Scheduler, 
> this Jira is created as an umbrella to track all related efforts for the 
> migration, the scope contains
>  * Bug fixes
>  * Add missing features
>  * Migration tools that help to generate CS configs based on FS, validate 
> configs etc
>  * Documents
> this is part of CS component, the purpose is to make the migration process 
> smooth.






[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2019-12-19 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000157#comment-17000157
 ] 

Manikandan R commented on YARN-10043:
-

Thanks [~wilfreds]. 

FairOrderingPolicy (FO) does the following:

It compares the resource usage of two schedulables and prefers the one with 
less usage. Only if {{fair.enable-size-based-weight}} has been enabled is 
preference given to the one with more demand, with the intent to favour large 
apps when many smaller ones enter and leave the queue continuously.

FairSharePolicy of FS does the following:
 # Compare demands. Schedulables without resource demand get lower priority 
than ones that have demand.
 # Compare MinShareUsage
 # Compare FairShareUsage
 # Compare using job submit time
 # Compare using job name

Except for #3, the other comparisons are missing in FO. #3 is a bit closer to 
the fair.enable-size-based-weight flow in FO, but has some differences. For 
example, app priority is taken into consideration in FairSharePolicy during 
the weight calculation.

The proposal is to add the missing comparison steps in the same order (a rough 
sketch follows below). But I am a bit skeptical about adding #2; I'm not really 
sure whether it is possible and makes sense.
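To make the proposed order concrete, a hedged sketch of the comparison chain on a stand-in class (steps #2 and #3 appear as precomputed usage ratios purely for illustration; this is not FS or CS code):

{code:java}
import java.util.Comparator;

// Hedged sketch of the FairSharePolicy-style ordering proposed above.
class Sched {
  boolean hasDemand;      // #1: needy schedulables first
  double minShareUsage;   // #2: lower min-share usage first
  double fairShareUsage;  // #3: lower fair-share usage first
  long submitTime;        // #4: earlier submission first
  String name;            // #5: final tie-breaker
}

class FairOrderSketch {
  static final Comparator<Sched> PROPOSED_ORDER =
      Comparator.<Sched>comparingInt(s -> s.hasDemand ? 0 : 1)
          .thenComparingDouble(s -> s.minShareUsage)
          .thenComparingDouble(s -> s.fairShareUsage)
          .thenComparingLong(s -> s.submitTime)
          .thenComparing(s -> s.name);
}
{code}

A chain like this keeps the steps independent, so dropping #2 later (if min-share usage turns out not to map onto CS) stays a one-line change.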

Thoughts?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.






[jira] [Commented] (YARN-10035) Add ability to filter the Cluster Applications API request by name

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000155#comment-17000155
 ] 

Hadoop QA commented on YARN-10035:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 37m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
55s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 46s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
28s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
43s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
41s{color} | {color:green} hadoop-yarn-server-router 

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2019-12-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000125#comment-17000125
 ] 

Eric Payne commented on YARN-8292:
--

Thanks a lot [~jhung] for looking at this. I apologize in advance for the 
lengthy response.
{quote}Should we just commit YARN-10033 to branch-2.10 to address the issue you 
fixed between YARN-8292.branch-2.010.patch and YARN-8292.branch-2.10.011.patch? 
Then we can commit YARN-8292.branch-2.10.010.patch to branch-2.10.
{quote}
No, I don't think so. 
 TL;DR: Since there is no cached effective max resource in 2.10, YARN-10033 
can't be backported to 2.10.
 The root cause of the test failure in YARN-10033 was as follows:
 - In 3.x, {{TempQueuePerPartition#getMax}} uses the cached {{effMaxRes}} 
value. In 2.10, {{getMax}} is calculated.
 - When the changes for YARN-8292 were added, they changed the amount of 
preemption in some cases, because the policy now takes into account both 
negative and non-negative resource components.
 - When {{TestProportionalCapacityPreemptionPolicy}} mocks effective max 
resource (effMaxRes), it always sets Vcores to 0.
 - Since YARN-8292 changed behavior, 
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}} 
should have also changed the number of expected preemptions. However, since 
{{TestProportionalCapacityPreemptionPolicy}} did not mock {{effMaxRes}} 
correctly, the fact that this unit test should have changed was missed when 
YARN-8292 was put into 3.x. This is what YARN-10033 addressed.
 - The changes made in YARN-8292.branch-2.10.011.patch to 
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}} 
are the same ones that should have been made originally when YARN-8292 was 
committed to 3.x.
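For readers following along, a toy sketch of the component-wise theme (not scheduler code): once one component of a resource vector goes negative, the decision has to be made per component rather than on the vector as a whole.

{code:java}
// Toy illustration, not actual preemption-policy code.
class ResourceVectorDemo {
  static boolean anyComponentPositive(long[] toPreempt) {
    for (long v : toPreempt) {
      if (v > 0) {
        return true;   // some resource type still needs preemption
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // memory : vcores : res_1 left to preempt; vcores already over-satisfied.
    long[] toPreempt = {0, -4, 1};
    System.out.println(anyComponentPositive(toPreempt));   // true
  }
}
{code}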

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, 
> YARN-8292.009.patch, YARN-8292.branch-2.009.patch, 
> YARN-8292.branch-2.010.patch, YARN-8292.branch-2.10.011.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. Total resource of the cluster is 30:18:6
> For both of a/b, there're 3 containers running, each of container is 2:2:1.
> Queue c uses 0 resource, and have 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000102#comment-17000102
 ] 

Hadoop QA commented on YARN-10048:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 21s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 6 new + 2 unchanged - 0 fixed = 8 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 32s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestResourceHandlerModule
 |
|   | 
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsHandlerImpl
 |
|   | 
hadoop.yarn.server.nodemanager.containermanager.resourceplugin.TestResourcePluginManager
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | YARN-10048 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989193/YARN-10048.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3320d0355bfa 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7868da8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-10046) RM failed to transition to Active because of App recovery throwing java.lang.NullPointerException

2019-12-19 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1768#comment-1768
 ] 

Wilfred Spiegelenburg commented on YARN-10046:
--

If this is really the version that you say it is, the error happens at this 
point:
{code}
519  SchedulerApplication<FSAppAttempt> application = applications.get(
520      applicationAttemptId.getApplicationId());
521  String user = application.getUser();
522  FSLeafQueue queue = (FSLeafQueue) application.getQueue();
{code}

That would mean that the application is null, which makes it the same issue as 
YARN-7913. Please check that one; I have started working on a fix for it.

Normally this failure means that you have changed the scheduler configuration 
so much that we cannot handle it on recovery.
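Until YARN-7913 lands, the shape of a defensive fix would be roughly the following (a sketch continuing the snippet above, assuming the addApplicationAttempt context; not the actual fix):

{code}
// Hedged sketch, not the committed fix for YARN-7913: skip recovery of an
// attempt whose application is unknown instead of hitting the NPE.
SchedulerApplication<FSAppAttempt> application = applications.get(
    applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Skipping recovery of attempt " + applicationAttemptId
      + ": application not found, possibly dropped after a scheduler"
      + " configuration change.");
  return;
}
String user = application.getUser();
FSLeafQueue queue = (FSLeafQueue) application.getQueue();
{code}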

> RM failed to transition to Active because of App recovery throwing 
> java.lang.NullPointerException
> -
>
> Key: YARN-10046
> URL: https://issues.apache.org/jira/browse/YARN-10046
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Yong Xing
>Priority: Critical
>
>  
> CDH Distribution: Hadoop 3.0.0-cdh6.0.1
> The exception stack is as follows.
> 2019-12-12 17:09:41,422 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:521)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1221)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1206)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:907)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:116)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:1046)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$2000(RMAppImpl.java:118)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1110)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1051)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:875)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:544)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1393)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:758)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1146)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1186)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1182)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> 

[jira] [Commented] (YARN-10047) container memory monitor may make container exit

2019-12-19 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1760#comment-1760
 ] 

Wilfred Spiegelenburg commented on YARN-10047:
--

This is expected behaviour: the container is using just over 50GB while it is 
only allowed to use 5.25GB. If your container needs more memory, then you need 
to give it more memory.
The code shown is also not the code that is killing the container: your 
container is over *physical* memory (i.e. real used mem) not over _virtual_ 
memory. Your message is generated here in the 
[ContainersMonitorImpl.java|https://github.com/apache/hadoop/blob/1ac967a6b77c262b23e10c6ca68538b7e4ed39b0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java#L434]
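The check that actually fires is conceptually this (a simplified sketch; the real logic, including the quoted virtual-memory variant, lives in ContainersMonitorImpl):

{code:java}
// Simplified sketch of the physical-memory check behind the log message above.
class PmemCheckSketch {
  static boolean isPmemOverLimit(long pmemUsageBytes, long pmemLimitBytes) {
    // e.g. 50.28 GB used against a 5.25 GB limit -> the container is killed
    return pmemUsageBytes > pmemLimitBytes;
  }
}
{code}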


> container memory monitor may make container exit
> 
>
> Key: YARN-10047
> URL: https://issues.apache.org/jira/browse/YARN-10047
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> As shown below, we have a case in which a Spark driver executes some scripts. 
> Then sometimes the driver gets killed.
> {code:java}
> yarn.174410.log.2019-12-17.02:2019-12-17,06:59:14,831 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Container 
> [pid=50529,containerID=container_e917_1576303656075_174957_01_003197] is 
> running beyond physical memory limits. Current usage: 50.28 GB of 5.25 GB 
> physical memory used; xxx. Killing container.
> {code}
> {code:java}
> boolean isProcessTreeOverLimit(String containerId,
>     long currentMemUsage,
>     long curMemUsageOfAgedProcesses,
>     long vmemLimit) {
>   boolean isOverLimit = false;
> 
>   // Fires only when usage exceeds twice the configured limit.
>   if (currentMemUsage > (2 * vmemLimit)) {
>     LOG.warn("Process tree for container: " + containerId
>         + " running over twice " + "the configured limit. Limit=" + vmemLimit
>         + ", current usage = " + currentMemUsage);
>     isOverLimit = true;
>   }
>   // ... (aged-process check and return of isOverLimit omitted in this excerpt)
> {code}






[jira] [Resolved] (YARN-10037) Upgrade build tools for YARN Web UI v2

2019-12-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved YARN-10037.
-
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Upgrade build tools for YARN Web UI v2
> --
>
> Key: YARN-10037
> URL: https://issues.apache.org/jira/browse/YARN-10037
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10037.001.patch
>
>
> The versions of the build tools are too old and have some vulnerabilities. 
> Update.
> * node: 5.12.0 (latest: 12.13.1 LTS)
> * yarn: 0.21.3 (latest: 1.12.1)






[jira] [Assigned] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao reassigned YARN-10048:
---

Assignee: Sen Zhao

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch
>
>
> After manually mounting the CGroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.






[jira] [Updated] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-10048:

Attachment: YARN-10048.001.patch
   Summary: NodeManager fails to start after mounting CGroup  (was: 
挂载CGroup之后NodeManager启动失败, the same title in Chinese)

> NodeManager fails to start after mounting CGroup
> 
>
> Key: YARN-10048
> URL: https://issues.apache.org/jira/browse/YARN-10048
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Priority: Major
> Attachments: YARN-10048.001.patch
>
>
> After manually mounting the CGroup, the NodeManager fails to start.
> If the cpu controller has multiple mount paths, only the first mount path will 
> be returned. This can cause the returned value to not be the actual cpu 
> controller mount path.






[jira] [Created] (YARN-10048) NodeManager fails to start after mounting CGroup

2019-12-19 Thread Sen Zhao (Jira)
Sen Zhao created YARN-10048:
---

 Summary: NodeManager fails to start after mounting CGroup
 Key: YARN-10048
 URL: https://issues.apache.org/jira/browse/YARN-10048
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.2.1
Reporter: Sen Zhao


After manually mounting the CGroup, the NodeManager fails to start.

If the cpu controller has multiple mount paths, only the first mount path will 
be returned. This can cause the returned value to not be the actual cpu 
controller mount path.
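A minimal sketch of the parsing concern (hypothetical code, not CGroupsHandlerImpl): if the mtab scan keeps only the first entry per controller, a manually mounted path such as /opt/cgroup/cpu can be lost.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch. Each mtab line looks like:
// "none /opt/cgroup/cpu cgroup rw,relatime,cpuacct,cpu 0 0"
class MtabSketch {
  static Map<String, List<String>> cpuMounts(List<String> mtabLines) {
    Map<String, List<String>> controllerPaths = new HashMap<>();
    for (String line : mtabLines) {
      String[] f = line.split("\\s+");
      if (f.length < 4 || !"cgroup".equals(f[2])) {
        continue;
      }
      for (String opt : f[3].split(",")) {
        if ("cpu".equals(opt)) {
          // Collect every mount path for the controller. Returning only the
          // first match is what makes the NM pick the wrong path here.
          controllerPaths.computeIfAbsent("cpu", k -> new ArrayList<>())
              .add(f[1]);
        }
      }
    }
    return controllerPaths;
  }
}
{code}

Whether the NodeManager should then prefer a configured path or fail fast when several candidates are found is exactly the question this Jira raises.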






[jira] [Created] (YARN-10047) container memory monitor may make container exit

2019-12-19 Thread zhoukang (Jira)
zhoukang created YARN-10047:
---

 Summary: container memory monitor may make container exit
 Key: YARN-10047
 URL: https://issues.apache.org/jira/browse/YARN-10047
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: zhoukang
Assignee: zhoukang


As shown below, we have a case in which a Spark driver executes some scripts. 
Then sometimes the driver gets killed.

{code:java}
yarn.174410.log.2019-12-17.02:2019-12-17,06:59:14,831 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Container 
[pid=50529,containerID=container_e917_1576303656075_174957_01_003197] is 
running beyond physical memory limits. Current usage: 50.28 GB of 5.25 GB 
physical memory used; xxx. Killing container.
{code}

{code:java}
boolean isProcessTreeOverLimit(String containerId,
    long currentMemUsage,
    long curMemUsageOfAgedProcesses,
    long vmemLimit) {
  boolean isOverLimit = false;

  // Fires only when usage exceeds twice the configured limit.
  if (currentMemUsage > (2 * vmemLimit)) {
    LOG.warn("Process tree for container: " + containerId
        + " running over twice " + "the configured limit. Limit=" + vmemLimit
        + ", current usage = " + currentMemUsage);
    isOverLimit = true;
  }
  // ... (aged-process check and return of isOverLimit omitted in this excerpt)
{code}







[jira] [Updated] (YARN-10035) Add ability to filter the Cluster Applications API request by name

2019-12-19 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10035:
--
Attachment: YARN-10035.003.patch

> Add ability to filter the Cluster Applications API request by name
> --
>
> Key: YARN-10035
> URL: https://issues.apache.org/jira/browse/YARN-10035
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10035.001.patch, YARN-10035.002.patch, 
> YARN-10035.003.patch
>
>
> According to the 
> [documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html]
>  we don't support filtering by name in the Cluster Applications API request.
> Usually application tags are a perfect way of tracking applications, but for 
> MR applications the older CLIs usually don't support providing app tags, 
> while specifying the name of the job is possible.






[jira] [Commented] (YARN-10035) Add ability to filter the Cluster Applications API request by name

2019-12-19 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1698#comment-1698
 ] 

Adam Antal commented on YARN-10035:
---

Hi [~snemeth],

Thanks for the review! Indeed, it is supposed to be the opposite. Actually, as 
I started testing this patch, I realized something was not right: I kept 
getting responses which did not contain the apps with the provided name. 
Uploaded patch v3, which fixes this.
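Until the server-side filter is available, a client-side equivalent with YarnClient looks roughly like this (a sketch; the exact REST query parameter comes from the patch, not from here):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Hedged sketch: filter applications by name on the client side.
public class AppsByName {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      for (ApplicationReport app : client.getApplications()) {
        if ("my-mr-job".equals(app.getName())) {   // name set by the old CLI
          System.out.println(app.getApplicationId() + " "
              + app.getYarnApplicationState());
        }
      }
    } finally {
      client.stop();
    }
  }
}
{code}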

> Add ability to filter the Cluster Applications API request by name
> --
>
> Key: YARN-10035
> URL: https://issues.apache.org/jira/browse/YARN-10035
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10035.001.patch, YARN-10035.002.patch, 
> YARN-10035.003.patch
>
>
> According to the 
> [documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html]
>  we don't support filtering by name in the Cluster Applications API request.
> Usually application tags are a perfect way of tracking applications, but for 
> MR applications the older CLIs usually don't support providing app tags, 
> while specifying the name of the job is possible.






[jira] [Commented] (YARN-10037) Upgrade build tools for YARN Web UI v2

2019-12-19 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1683#comment-1683
 ] 

Hudson commented on YARN-10037:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #1 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/1/])
YARN-10037. Upgrade build tools for YARN Web UI v2. (iwasakims: rev 
7868da894ae148bff1d5e5159a2bc1aad44fd6aa)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/README.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/pom.xml


> Upgrade build tools for YARN Web UI v2
> --
>
> Key: YARN-10037
> URL: https://issues.apache.org/jira/browse/YARN-10037
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10037.001.patch
>
>
> The versions of the build tools are too old and have some vulnerabilities. 
> Update.
> * node: 5.12.0 (latest: 12.13.1 LTS)
> * yarn: 0.21.3 (latest: 1.12.1)






[jira] [Commented] (YARN-10037) Upgrade build tools for YARN Web UI v2

2019-12-19 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1673#comment-1673
 ] 

Masatake Iwasaki commented on YARN-10037:
-

Thanks, [~aajisaka]. I committed this.

> Upgrade build tools for YARN Web UI v2
> --
>
> Key: YARN-10037
> URL: https://issues.apache.org/jira/browse/YARN-10037
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10037.001.patch
>
>
> The versions of the build tools are too old and have some vulnerabilities. 
> Update.
> * node: 5.12.0 (latest: 12.13.1 LTS)
> * yarn: 0.21.3 (latest: 1.12.1)






[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1658#comment-1658
 ] 

Zhankun Tang commented on YARN-10042:
-

[~seanlau], thanks for catching this. The patch looks good to me. The failing 
test cases seem unrelated: the "testDeadNodeDetectionInBackground" failure 
appears in other Jiras too, and the other two test case failures are 
out-of-memory errors. +1.

[~cheersyang], since this is related to CSI dependencies, would you like to 
take a look at this?

> Upgrade grpc-xxx dependencies to 1.26.0
> --
>
> Key: YARN-10042
> URL: https://issues.apache.org/jira/browse/YARN-10042
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Priority: Major
> Attachments: YARN-10042.001.patch, 
> hadoop_build_aarch64_grpc_1.26.0.log, hadoop_build_x86_64_grpc_1.26.0.log, 
> yarn_csi_tests_aarch64_grpc_1.26.0.log, yarn_csi_tests_x86_64_grpc_1.26.0.log
>
>
> For now, Hadoop YARN uses grpc-context, grpc-core, grpc-netty, grpc-protobuf, 
> grpc-protobuf-lite, grpc-stub and protoc-gen-grpc-java at version 1.15.1, but 
> "protoc-gen-grpc-java" is not supported on the aarch64 platform. The 
> grpc-java repo now supports the aarch64 platform, released as 1.26.0 in Maven 
> Central.
> See:
> [https://github.com/grpc/grpc-java/pull/6496]
> [https://search.maven.org/search?q=g:io.grpc]
> It is better to upgrade the grpc-xxx dependencies to version 1.26.0. Both 
> x86_64 and aarch64 servers build OK according to my testing; please see the 
> attachments: log of building on aarch64, log of building on x86_64, log of 
> running tests of yarn csi on aarch64, and log of running tests of yarn csi on 
> x86_64.






[jira] [Created] (YARN-10046) RM failed to transition to Active because of App recovery throwing java.lang.NullPointerException

2019-12-19 Thread Yong Xing (Jira)
Yong Xing created YARN-10046:


 Summary: RM failed to transition to Active because of App recovery 
throwing java.lang.NullPointerException
 Key: YARN-10046
 URL: https://issues.apache.org/jira/browse/YARN-10046
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Yong Xing


 

CDH Distribution: Hadoop 3.0.0-cdh6.0.1

The exception stack is as follows.

2019-12-12 17:09:41,422 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
load/recover state
java.lang.NullPointerException
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:521)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1221)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1265)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1206)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:907)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:116)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:1046)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$2000(RMAppImpl.java:118)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1110)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1051)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:875)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:544)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1393)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:758)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1146)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1186)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1182)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1182)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
 at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)

 

During the recovery of application attempts, the state of one app attempt is 
NULL, as the following log shows:

2019-12-12 17:09:41,381 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1576136386231_0742 with 1 attempts and final state = 
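
For illustration, a minimal defensive sketch of the kind of guard that could 
avoid the NPE above, assuming it is thrown because the scheduler-side 
application entry is missing when the attempt is recovered; the names and 
placement are hypothetical, not the actual fix.

{code:java}
// Hypothetical guard near the top of FairScheduler#addApplicationAttempt:
// skip recovery of an attempt whose application is absent from the
// scheduler's map, instead of dereferencing a null entry.
SchedulerApplication<FSAppAttempt> application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Skipping recovery of attempt " + applicationAttemptId
      + ": application not found in scheduler");
  return;
}
{code}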

[jira] [Commented] (YARN-10037) Upgrade build tools for YARN Web UI v2

2019-12-19 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1607#comment-1607
 ] 

Akira Ajisaka commented on YARN-10037:
--

The modification to the Dockerfile is here: 
https://github.com/apache/hadoop/pull/1772

> Upgrade build tools for YARN Web UI v2
> --
>
> Key: YARN-10037
> URL: https://issues.apache.org/jira/browse/YARN-10037
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10037.001.patch
>
>
> The versions of the build tools are too old and have some vulnerabilities. 
> Update:
> * node: 5.12.0 (latest: 12.13.1 LTS)
> * yarn: 0.21.3 (latest: 1.12.1)






[jira] [Assigned] (YARN-10036) Install yarnpkg in Dockerfile

2019-12-19 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned YARN-10036:


Assignee: Akira Ajisaka

> Install yarnpkg in Dockerfile
> -
>
> Key: YARN-10036
> URL: https://issues.apache.org/jira/browse/YARN-10036
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>
> Node.js is currently installed in the Dockerfile, but yarnpkg is not.
> I'd like to run the "yarn upgrade" command in the build environment to manage 
> and upgrade the dependencies.






[jira] [Commented] (YARN-10037) Upgrade build tools for YARN Web UI v2

2019-12-19 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1600#comment-1600
 ] 

Akira Ajisaka commented on YARN-10037:
--

Thanks [~iwasakims] for the patch.
+1, I modified dev-support/docker/Dockerfile to install Node.js v8.17.0, Bower 
1.8.8, and Yarn 1.12.1 and ran yarn run build in the container successfully.
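
A hedged sketch of what such a Dockerfile change could look like; the exact 
install commands are assumptions, and only the tool versions come from the 
comment above.

{code}
# Hypothetical dev-support/docker/Dockerfile sketch: install Node.js 8.x
# via the NodeSource setup script, then Bower 1.8.8 and Yarn 1.12.1 via npm.
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && npm install -g bower@1.8.8 yarn@1.12.1
{code}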

> Upgrade build tools for YARN Web UI v2
> --
>
> Key: YARN-10037
> URL: https://issues.apache.org/jira/browse/YARN-10037
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: YARN-10037.001.patch
>
>
> The versions of the build tools are too old and have some vulnerabilities. 
> Update:
> * node: 5.12.0 (latest: 12.13.1 LTS)
> * yarn: 0.21.3 (latest: 1.12.1)






[jira] [Commented] (YARN-10042) Upgrade grpc-xxx dependencies to 1.26.0

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999886#comment-16999886
 ] 

Hadoop QA commented on YARN-10042:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 15m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
14s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}170m 42s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}326m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | YARN-10042 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989160/YARN-10042.001.patch |
| Optional Tests |  dupname  asflicense  shellcheck  shelldocs  compile  javac  
javadoc  mvninstall  mvnsite  unit  shadedclient  xml  |
| uname | Linux 85309fbda97b 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b93575 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 

[jira] [Commented] (YARN-9998) Code cleanup in LeveldbConfigurationStore

2019-12-19 Thread Oleg Bonar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999872#comment-16999872
 ] 

Oleg Bonar commented on YARN-9998:
--

Hi [~snemeth]. Is this still needed? If so, can I take it?

> Code cleanup in LeveldbConfigurationStore
> -
>
> Key: YARN-9998
> URL: https://issues.apache.org/jira/browse/YARN-9998
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>
> Many things can be improved:
> * Field compactionTimer could be a local variable
> * Field versiondb should be camelcase
> * initDatabase is a very long method: Initialize db / versionDb should be in 
> separate methods, split this method into smaller chunks
> * Remove TODOs
> * Remove duplicated code block in 
> LeveldbConfigurationStore.CompactionTimerTask
> * Any other cleanup
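
As an illustration of the initDatabase split suggested in the list above, a 
minimal sketch follows; the helper and getter names are hypothetical, not the 
actual cleanup.

{code:java}
// Sketch: initDatabase() delegates to one short helper per leveldb
// instance; "versionDb" is the camel-cased field suggested above.
private void initDatabase(Configuration config) throws Exception {
  db = openDb(getConfigStoreDir(config), "configuration store");
  versionDb = openDb(getVersionDbDir(config), "version store");
}

// Opens (creating if missing) a leveldb instance via leveldbjni.
private DB openDb(File dir, String label) throws IOException {
  Options options = new Options();
  options.createIfMissing(true);
  LOG.info("Opening " + label + " at " + dir);
  return JniDBFactory.factory.open(dir, options);
}
{code}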






[jira] [Commented] (YARN-10041) Should not use AbstractPath to create unix domain socket

2019-12-19 Thread Zhenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999839#comment-16999839
 ] 

Zhenyu Zheng commented on YARN-10041:
-

[~tangzhankun], Liu Shang has provided a patch.

> Should not use AbstractPath to create unix domain socket
> 
>
> Key: YARN-10041
> URL: https://issues.apache.org/jira/browse/YARN-10041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
> Environment: X86/ARM
> OS: ubuntu 1804
> java: java8
>Reporter: zhao bo
>Priority: Major
>
> This issue was hit in a very coincidental scenario while testing on ARM.
> The test case is:
> org.apache.hadoop.yarn.csi.client.TestCsiClient.testIdentityService
>  
> Steps to reproduce:
> If the hadoop source code dir is a very deep path, this case passes the 
> first time it runs but always fails on the following tries.
> The official Jenkins doesn't cover this because it runs in a Docker container 
> and runs the test only once, so the test appears to always pass.
>  
> The key point is that the UNIX domain socket path exceeds the limit of 
> UNIX_PATH_MAX (108). Please see [1].
>  
> This issue is very difficult to locate, as the test simply fails with a 
> binding error whenever it is executed.
>  
> Also, the trunk code uses the absolute path to create the UNIX domain socket 
> file; the source code is [2]. So it cannot avoid hitting this issue. It would 
> be good to provide a second way to set the socket path to '/tmp' or some 
> other short location when running this test.
> [1] 
> [https://serverfault.com/questions/641347/check-if-a-path-exceeds-maximum-for-unix-domain-socket]
> [2] 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/src/test/java/org/apache/hadoop/yarn/csi/client/TestCsiClient.java#L48]
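
A minimal sketch of the suggested workaround, assuming the test can choose its 
socket path up front; UNIX_PATH_MAX and the /tmp fallback are the only facts 
taken from the description above, and the helper itself is hypothetical.

{code:java}
// Hypothetical test helper: fall back to a short /tmp path when the
// preferred path would exceed the Linux UNIX_PATH_MAX limit (108 bytes).
private static final int UNIX_PATH_MAX = 108;

static String chooseSocketPath(String preferred) {
  byte[] bytes = preferred.getBytes(java.nio.charset.StandardCharsets.UTF_8);
  if (bytes.length >= UNIX_PATH_MAX) {
    return "/tmp/csi-test-" + java.util.UUID.randomUUID() + ".sock";
  }
  return preferred;
}
{code}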


