[jira] [Created] (YARN-9932) Nodelabel support for Fair Scheduler

2019-10-22 Thread Anuj (Jira)
Anuj created YARN-9932:
--

 Summary: Nodelabel support for Fair Scheduler
 Key: YARN-9932
 URL: https://issues.apache.org/jira/browse/YARN-9932
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler, nodemanager, resourcemanager
Affects Versions: 3.2.1
Reporter: Anuj


Currently, node labels only work with the Capacity Scheduler.

We would like this to work with the Fair Scheduler as well.
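
For context, applications target node labels through the scheduler-independent YARN API; a minimal, illustrative sketch (assuming a hypothetical label "gpu" has already been added to the cluster) of the kind of request a Fair Scheduler deployment would need to honor:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class NodeLabelRequestExample {
  // Illustrative only: build a container request constrained to a
  // hypothetical "gpu" node label. Matching this expression to labelled
  // nodes is the scheduler's job, which today only the Capacity Scheduler does.
  public static ResourceRequest gpuRequest() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(0),        // request priority
        ResourceRequest.ANY,            // no locality constraint
        Resource.newInstance(2048, 1),  // 2 GB, 1 vcore
        1);                             // number of containers
    req.setNodeLabelExpression("gpu");
    return req;
  }
}
{code}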






[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-22 Thread Zhenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957487#comment-16957487
 ] 

Zhenyu Zheng commented on YARN-9897:


[~eyang] BTW, we have actually been running tests and debugging for about a month 
now, and we can pass all YARN tests with only a few fixes like:
https://issues.apache.org/jira/browse/HADOOP-16614 (it is only a possible 
proposal; we are open to discussion).

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running 
> YARN on platforms with different architectures, and of managing hardware 
> resources of different architectures, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices such as 
> phones, IoT devices, security cameras, drones, etc. With increasing computing 
> capability and increasing connection speeds such as 5G networks, there could 
> be great possibilities and opportunities for world-changing innovations and 
> new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for the problems 
> we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests and it turns out there are only a few problems, 
> and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN 
> on Aarch64 platforms. We are willing to provide machines to the current CI 
> system and manpower to manage the CI and fix problems that occur.






[jira] [Comment Edited] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-22 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957484#comment-16957484
 ] 

liusheng edited comment on YARN-9897 at 10/23/19 1:34 AM:
--

Hi [~eyang],

I have tried the two tests you suggested in your comment; it looks like both 
succeed.
{code:java}
[INFO]  C M A K E  B U I L D E R  T E S T
[INFO] ---
[INFO] cetest: running 
/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest
 --gtest_filter=-Perf. 
--gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO] 
{code}
 
{code:java}
[INFO] ---
[INFO]  C M A K E  B U I L D E R  T E S T
[INFO] ---
[INFO] test-container-executor: running 
/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 5968 millisecond(s).
[INFO] ---
[INFO]
[INFO] --- hadoop-maven-plugins:3.3.0-SNAPSHOT:cmake-test (cetest) @ 
hadoop-yarn-server-nodemanager ---
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  01:07 min
[INFO] Finished at: 2019-10-23T01:32:28Z
[INFO] 
{code}
Does this look good to you?

 


was (Author: seanlau):
Hi [~eyang],

Looks like both of these tests are OK, see:

 
{code:java}
[INFO]  C M A K E  B U I L D E R  T E S T
[INFO] ---
[INFO] cetest: running 
/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest
 --gtest_filter=-Perf. 
--gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO] 
{code}
 

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running 
> YARN on platforms with different architectures, and of managing hardware 
> resources of different architectures, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices such as 
> phones, IoT devices, security cameras, drones, etc. With increasing computing 
> capability and increasing connection speeds such as 5G networks, there could 
> be great possibilities and opportunities for world-changing innovations and 
> new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for the problems 
> we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests and it turns out there are only a few problems, 
> and we can provide possible solutions for discussion.
> We want to propose to add an 

[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-22 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957484#comment-16957484
 ] 

liusheng commented on YARN-9897:


Hi [~eyang],

Looks like both of these tests are OK, see:

 
{code:java}
[INFO]  C M A K E  B U I L D E R  T E S T
[INFO] ---
[INFO] cetest: running 
/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest
 --gtest_filter=-Perf. 
--gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO] 
{code}
 

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running 
> YARN on platforms with different architectures, and of managing hardware 
> resources of different architectures, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices such as 
> phones, IoT devices, security cameras, drones, etc. With increasing computing 
> capability and increasing connection speeds such as 5G networks, there could 
> be great possibilities and opportunities for world-changing innovations and 
> new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for the problems 
> we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests and it turns out there are only a few problems, 
> and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN 
> on Aarch64 platforms. We are willing to provide machines to the current CI 
> system and manpower to manage the CI and fix problems that occur.






[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-22 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957475#comment-16957475
 ] 

Eric Yang commented on YARN-9897:
-

[~Kevin_Zheng] The patch looks good to me. I am surprised at how little change 
is required. Thanks for sharing the information. Would it be possible to run:

{code}mvn clean test -Dtest=cetest -Pnative{code}
{code}mvn clean test -Dtest=test-container-executor -Pnative{code}

 in the hadoop-yarn-nodemanager project as a sanity check?

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running 
> YARN on platforms with different architectures, and of managing hardware 
> resources of different architectures, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices such as 
> phones, IoT devices, security cameras, drones, etc. With increasing computing 
> capability and increasing connection speeds such as 5G networks, there could 
> be great possibilities and opportunities for world-changing innovations and 
> new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for the problems 
> we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests and it turns out there are only a few problems, 
> and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN 
> on Aarch64 platforms. We are willing to provide machines to the current CI 
> system and manpower to manage the CI and fix problems that occur.






[jira] [Commented] (YARN-9689) Router does not support kerberos proxy when in secure mode

2019-10-22 Thread Botong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957445#comment-16957445
 ] 

Botong Huang commented on YARN-9689:


+1 lgtm

> Router does not support kerberos proxy when in secure mode
> --
>
> Key: YARN-9689
> URL: https://issues.apache.org/jira/browse/YARN-9689
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9689.001.patch
>
>
> When we enable Kerberos in YARN federation mode, we cannot get a new application, 
> since it throws the Kerberos exception below, which should be handled:
> {code:java}
> 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2019-07-22,18:43:25,528 WARN 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: 
> Unable to create a new ApplicationId in SubCluster xxx
> java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed 
> on local exception: java.io.IOException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564)
> at org.apache.hadoop.ipc.Client.call(Client.java:1506)
> at org.apache.hadoop.ipc.Client.call(Client.java:1416)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.

[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-10-22 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957379#comment-16957379
 ] 

Eric Badger commented on YARN-9923:
---

Isn't it more appropriate for this to be in the nm health check script? The 
docker daemon can (and will) go down at any time due to a bug or other 
random issue. But we don't want to do this check before every container that we 
start. So if a user chooses the RUNTIME option, the only way I see this working 
is to have a thread periodically checking whether docker is installed and 
running. But that's exactly what the nm health check script does.
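
For the proposed STARTUP/RUNTIME check itself, a minimal sketch (an editorial illustration only, assuming the configured docker binary path is already resolved and that "docker info" is an acceptable liveness probe):

{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public final class DockerAvailabilityCheck {
  /**
   * Illustrative only: returns true if the docker binary exists and the
   * daemon answers "docker info". Whether this runs once at NM startup
   * (STARTUP), periodically (RUNTIME), or from the NM health-check script
   * is exactly the question discussed above.
   */
  public static boolean isDockerUsable(String dockerBinaryPath) {
    if (!new File(dockerBinaryPath).canExecute()) {
      return false; // binary missing or not executable
    }
    ShellCommandExecutor shexec =
        new ShellCommandExecutor(new String[] {dockerBinaryPath, "info"});
    try {
      shexec.execute();                 // throws on failure / non-zero exit
      return shexec.getExitCode() == 0; // daemon reachable
    } catch (IOException e) {
      return false;                     // daemon down or command failed
    }
  }
}
{code}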

> Detect missing Docker binary or not running Docker daemon
> -
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: with this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup).
> - RUNTIME: would give a more detailed/user-friendly exception on the 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist or the Docker daemon is not running.
> - NONE (default): preserves the current behaviour, throwing an exception during 
> container allocation and carrying on with the default retry procedure.






[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957378#comment-16957378
 ] 

Wangda Tan commented on YARN-9927:
--

Thanks [~hcarrot] for working on this.

Tagging: [~prabhujoseph], [~jhung], [~sunil.gov...@gmail.com], [~epayne] for 
review.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and RM event processing 
> logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler
> 3) meanwhile, RM event processing is single-threaded, which results in low 
> headroom for the RM event scheduler and thus degrades RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.
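
To make the quoted proposal concrete, a minimal sketch (an editorial illustration, not the design in the attached PDF; the pool size and the per-node partitioning are assumptions) of handling RMNodeStatusEvent on a small worker pool while keeping other events on the existing single dispatcher thread:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustrative only: route the high-volume event type to a pool of workers,
 * hashing by node ID so events from the same node stay ordered, and leave
 * all other events on the existing single dispatcher thread.
 */
public class MultiThreadNodeStatusDispatcher {
  private static final int POOL_SIZE = 4; // assumed; would be configurable

  private final ExecutorService[] workers = new ExecutorService[POOL_SIZE];

  public MultiThreadNodeStatusDispatcher() {
    for (int i = 0; i < POOL_SIZE; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Events from the same node always land on the same worker thread. */
  public void dispatchNodeStatus(String nodeId, Runnable handler) {
    int bucket = (nodeId.hashCode() & Integer.MAX_VALUE) % POOL_SIZE;
    workers[bucket].execute(handler);
  }
}
{code}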






[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957341#comment-16957341
 ] 

Hadoop QA commented on YARN-9697:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: 
The patch generated 0 new + 9 unchanged - 4 fixed = 9 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 95m  
0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9697 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983771/YARN-9697.008.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8b751e6fbf22 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
http

[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957315#comment-16957315
 ] 

Hadoop QA commented on YARN-9788:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 
55s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
27s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9788 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983762/YARN-9788-009.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 66845899dc0c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25030/testReport/ |
| Max. process+thread count | 880 (vs. ulimit 

[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957262#comment-16957262
 ] 

Hadoop QA commented on YARN-9925:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9925 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983757/YARN-9925-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f4a4827815c9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25029/testReport/ |
| Max. process+thread count | 805 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25029/console |
| Powered by | Apache Y

[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957250#comment-16957250
 ] 

Abhishek Modi commented on YARN-9697:
-

Thanks [~bibinchundatt] for the review. I have addressed most of the review 
comments in v8 patch.

For 
{quote}OpportunisticSchedulerMetrics shouldn't we be having a destroy() method 
to reset the counters. During switch over i think we should reset the counters 
{quote}
I will file a separate jira.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the count of queued opportunistic containers received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> opportunistic containers, containers might get allocated on the same set of 
> nodes, since containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of opportunistic containers across the cluster, leading to 
> increased queuing time 






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.008.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the count of queued opportunistic containers received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> opportunistic containers, containers might get allocated on the same set of 
> nodes, since containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of opportunistic containers across the cluster, leading to 
> increased queuing time 






[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957248#comment-16957248
 ] 

Hadoop QA commented on YARN-9780:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
29s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9780 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983754/YARN-9780-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ab24eabef998 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25028/testReport/ |
| Max. process+thread count | 833 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25028/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> SchedulerConf Mutation Api does not Al

[jira] [Commented] (YARN-9918) AggregatedAllocatedContainers metrics not getting reported for MR in 2.6.x

2019-10-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957232#comment-16957232
 ] 

Manikandan R commented on YARN-9918:


Can you add more details to reproduce this issue? FYI, this metric computation 
happens only for the "default" partition. Please refer to YARN-6467 for more details.

> AggregatedAllocatedContainers metrics not getting reported for MR in 2.6.x
> --
>
> Key: YARN-9918
> URL: https://issues.apache.org/jira/browse/YARN-9918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Minor
>
> One of our YARN clusters is 2.6.x CDH. I have observed that aggregated 
> allocated container metrics are not getting reported for MR jobs. Some 
> queues run only MR workloads, but those queues always show 0 for 
> "aggregatedAllocatedContainers".
>  
> Created this Jira to track this.






[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2019-10-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957231#comment-16957231
 ] 

Manikandan R commented on YARN-9930:


Is this different from YARN-9887?

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> In FairScheduler, there is a max-running-apps limit, which leaves excess 
> applications pending.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> maximum-applications limit, and jobs beyond it are rejected directly on the client.
> In this Jira I want to implement this semantic for CapacityScheduler.






[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957230#comment-16957230
 ] 

Manikandan R commented on YARN-9925:


YARN-9772 has been created to address the concerns raised here. We had some 
discussions over there about two different approaches, but have not yet reached 
a conclusion. cc [~sunilg]

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The scenarios below are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> The scenario below is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.
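
A minimal sketch of the kind of check the queue-hierarchy validation could enforce for newly added queues (an editorial illustration against assumed data structures, not the attached patch): reject any new queue whose short name is already used by an existing parent queue, or that reuses the reserved name root.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative only: reject adding a queue whose short name is already used
 * by an existing parent queue (e.g. adding root.A.A1.B when root.B.B1 exists)
 * or whose short name is the reserved root queue name.
 */
public final class QueueNameValidator {

  // full queue path -> true if that queue currently has children
  private final Map<String, Boolean> existingQueues = new HashMap<>();

  public void addExisting(String fullPath, boolean isParent) {
    existingQueues.put(fullPath, isParent);
  }

  public void validateNewQueue(String newFullPath) {
    String shortName = newFullPath.substring(newFullPath.lastIndexOf('.') + 1);
    if ("root".equals(shortName)) {
      throw new IllegalArgumentException(
          "root is reserved and cannot be reused: " + newFullPath);
    }
    for (Map.Entry<String, Boolean> e : existingQueues.entrySet()) {
      String existingShort =
          e.getKey().substring(e.getKey().lastIndexOf('.') + 1);
      if (e.getValue() && existingShort.equals(shortName)) {
        throw new IllegalArgumentException("Queue short name " + shortName
            + " is already used by parent queue " + e.getKey());
      }
    }
  }
}
{code}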






[jira] [Resolved] (YARN-9926) RM multi-thread event processing mechanism

2019-10-22 Thread Bibin Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin Chundatt resolved YARN-9926.
--
Resolution: Duplicate

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9926
> URL: https://issues.apache.org/jira/browse/YARN-9926
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and RM event processing 
> logic, we found that RMNodeStatusEvent makes up a smaller proportion of 
> events than other types, but its overall processing time is larger than that 
> of other events. Meanwhile, RM event processing is single-threaded, which 
> reduces RM performance. So we propose an RM multi-thread event 
> processing mechanism to improve RM performance. Is this mechanism feasible?






[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9788:

Attachment: YARN-9788-009.patch

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9788-001.patch, YARN-9788-002.patch, 
> YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, 
> YARN-9788-006.patch, YARN-9788-007.patch, YARN-9788-008.patch, 
> YARN-9788-009.patch
>
>
> The Queue Management API does not support parallel updates. When there are two 
> parallel scheduler conf updates (logAndApplyMutation), the first update is 
> overwritten by the second one.
> Currently, logAndApplyMutation creates a LogMutation and stores it in a 
> member variable, pendingMutation. This way, at any given time there will be 
> only one LogMutation, so two parallel logAndApplyMutation calls will override 
> pendingMutation and only the later one will be present.
> The fix is for logAndApplyMutation to return the LogMutation object, which can 
> then be passed to confirmMutation. This fixes parallel updates.
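
A minimal sketch of the shape of that change (illustrative signatures only; the real interface, parameter types, and method names in the patch may differ): each caller keeps its own LogMutation handle instead of all callers sharing one pendingMutation field.

{code:java}
/** Illustrative interfaces only, not the actual YARN classes. */
interface LogMutation { }

interface MutableConfProvider {
  // Before: the pending mutation is kept in a single member field, so two
  // concurrent updates clobber each other.
  // After (sketched): the mutation is returned to the caller...
  LogMutation logAndApplyMutation(String user, Object confUpdate);

  // ...and handed back explicitly when the update is confirmed or rolled
  // back, so parallel updates never share state.
  void confirmMutation(LogMutation pendingMutation, boolean isValid);
}
{code}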






[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957179#comment-16957179
 ] 

Hadoop QA commented on YARN-9788:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  2m 
21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  2m 21s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
26s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m  
8s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 26s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9788 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983742/YARN-9788-008.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit 

[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: YARN-9925-002.patch

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
>
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy. When 
> creating a queue with same name as an existing parent queue name - it has to 
> fail with below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In Some cases, the error is not thrown while creating the queue but thrown at 
> submission of job "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> Below scenarios are allowed but it should not
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.
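
For illustration, a rough sketch of the kind of guard the validation could add (the helper and parameter names are assumptions, not the attached patch):
{code:java}
// Illustrative only: reject a queue path whose leaf name reuses the reserved name
// "root" or collides with the name of an existing parent queue.
private void validateNewQueuePath(String queuePath, Set<String> existingParentQueueNames) {
  String leafName = queuePath.substring(queuePath.lastIndexOf('.') + 1);
  if ("root".equals(leafName)) {
    throw new IllegalArgumentException("Queue name 'root' is reserved: " + queuePath);
  }
  if (existingParentQueueNames.contains(leafName)) {
    throw new IllegalArgumentException("Queue " + queuePath
        + " reuses existing parent queue name " + leafName);
  }
}
{code}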



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9780:

Attachment: YARN-9780-004.patch

> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single 
> call
> 
>
> Key: YARN-9780
> URL: https://issues.apache.org/jira/browse/YARN-9780
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9780-001.patch, YARN-9780-002.patch, 
> YARN-9780-003.patch, YARN-9780-004.patch
>
>
> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single 
> call. The queue has to be stopped before removing and so it is useful to 
> allow both Stop and remove queue in a single call.
> *Repro:*
> {code:java}
> Capacity-Scheduler.xml:
> yarn.scheduler.capacity.root.queues = new, default, dummy
> yarn.scheduler.capacity.root.default.capacity = 60
> yarn.scheduler.capacity.root.dummy.capacity = 30
> yarn.scheduler.capacity.root.new.capacity = 10   
> curl -v -X PUT -d @abc.xml -H "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf'
> abc.xml
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>70</value>
>       </entry>
>     </params>
>   </update-queue>
>   <update-queue>
>     <queue-name>root.new</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.new</remove-queue>
> </sched-conf>
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957168#comment-16957168
 ] 

Hadoop QA commented on YARN-9537:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 28s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m  
6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9537 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983734/YARN-9537-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2ab4e3cc3bfe 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 72003b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25024/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25024/testReport/ |
| Max. process+thread count | 817 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourc

[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957161#comment-16957161
 ] 

Hadoop QA commented on YARN-9925:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 15s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} The patch fails to run checkstyle in 
hadoop-yarn-server-resourcemanager {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
46s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9925 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983741/YARN-9925-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5063448ca290 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/25026/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 
https://build

[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9788:

Attachment: YARN-9788-008.patch

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9788-001.patch, YARN-9788-002.patch, 
> YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, 
> YARN-9788-006.patch, YARN-9788-007.patch, YARN-9788-008.patch
>
>
> Queue Management API - does not support parallel updates. When there are two 
> parallel schedule conf updates (logAndApplyMutation), the first update is 
> overwritten by the second one. 
> Currently the logAndApplyMutation creates LogMutation and stores it in a 
> variable pendingMutation. This way at any given time there will be only one 
> LogMutation, so two parallel logAndApplyMutation calls will override the 
> pendingMutation and only the later one will be present.
> The fix is to have logAndApplyMutation return the LogMutation object, which can then 
> be passed to confirmMutation. This fixes the parallel updates.
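
For illustration only, a minimal sketch of how the reworked flow could be used (the variable names and exact signatures here are assumptions, not the actual patch):
{code:java}
// Hypothetical sketch: logAndApplyMutation returns the LogMutation it created, and the
// caller hands that object back to confirmMutation, so two concurrent updates no longer
// clobber a single shared pendingMutation field.
LogMutation mutation = confStore.logAndApplyMutation(user, confUpdateInfo);
boolean valid = refreshQueues();              // validate the updated configuration (assumed helper)
confStore.confirmMutation(mutation, valid);   // confirm or roll back this mutation only
{code}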



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9929) NodeManager OOM because of stuck DeletionService

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957125#comment-16957125
 ] 

Hadoop QA commented on YARN-9929:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
21s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
14s{color} | {color:red} hadoop-yarn-server-nodemanager in trunk failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
43s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9929 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983736/YARN-9929.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a3de11274f8a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 19f35cf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/25025/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25025/testReport/ |
| Max. process+thread count | 400 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-

[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: YARN-9925-001.patch

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch
>
>
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy. When 
> creating a queue with same name as an existing parent queue name - it has to 
> fail with below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In Some cases, the error is not thrown while creating the queue but thrown at 
> submission of job "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> Below scenarios are allowed but it should not
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag

2019-10-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957114#comment-16957114
 ] 

Hadoop QA commented on YARN-9886:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 6 new + 313 unchanged - 0 fixed = 319 total (was 313) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
12s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9886 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983723/YARN-9886.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux b37cbda06227 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019

[jira] [Commented] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM

2019-10-22 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957101#comment-16957101
 ] 

Tarun Parimi commented on YARN-9928:


The issue occurs because the container returned in the code snippet below is 
null.

{code:java}
  private void publishContainerCreatedEvent(ContainerEvent event) {
if (publishNMContainerEvents) {
  ContainerId containerId = event.getContainerID();
  ContainerEntity entity = createContainerEntity(containerId);
  Container container = context.getContainers().get(containerId);
  Resource resource = container.getResource();
{code}

This issue does not usually occur because ContainerManagerImpl already performs a 
null check for the same container before dispatching the event.

{code:java}
Map<ContainerId, Container> containers =
    ContainerManagerImpl.this.context.getContainers();
Container c = containers.get(event.getContainerID());
if (c != null) {
  c.handle(event);
  if (nmMetricsPublisher != null) {
    nmMetricsPublisher.publishContainerEvent(event);
  }
}
{code}

But in a heavily loaded production cluster, with many events queued in the 
ContainerManager dispatcher and the NM resyncing with the RM at the same time in a 
separate NM dispatcher thread, the resync can suddenly remove all the completed 
containers from the context before their events are published.

So an additional null check on the container is needed for this scenario.
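
A minimal sketch of the extra guard (illustrative, not necessarily the exact patch):
{code:java}
// Skip publishing when the container has already been removed from the NM context.
private void publishContainerCreatedEvent(ContainerEvent event) {
  if (publishNMContainerEvents) {
    ContainerId containerId = event.getContainerID();
    Container container = context.getContainers().get(containerId);
    if (container == null) {
      LOG.warn("Container " + containerId + " not found in context, skipping created event");
      return;
    }
    ContainerEntity entity = createContainerEntity(containerId);
    Resource resource = container.getResource();
    // ... populate and publish the entity as before
  }
}
{code}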




> ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
> --
>
> Key: YARN-9928
> URL: https://issues.apache.org/jira/browse/YARN-9928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> Encountered the below FATAL error in the NodeManager which was under heavy 
> load and was also resyncing with RM at the same. This caused the NM to go 
> down. 
> {code:java}
> 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9931) Support run script before kill container

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9931:
---
Description: 
Like the node health check script, we could add a pre-kill script that runs before 
a container is killed.

For example, we could save a thread dump before killing the container, which is 
helpful for troubleshooting.

  was:
Like node health check script. We can add a pre-kill script which run before 
kill container.

Such as we can save the thread dump before kill the container, which is helpful 
for troubleshooting.


> Support run script before kill container
> 
>
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> Like node health check script. We can add a pre-kill script which run before 
> kill container.
> For example we can save the thread dump before kill the container, which is 
> helpful for troubleshooting.
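
For illustration, a rough sketch of how the NodeManager could invoke such a hook (the configuration key and surrounding wiring are assumptions; nothing like this exists yet):
{code:java}
// Purely illustrative; "yarn.nodemanager.pre-kill-script.path" is an assumed key.
String script = conf.get("yarn.nodemanager.pre-kill-script.path");
if (script != null && !script.isEmpty()) {
  Shell.ShellCommandExecutor hook = new Shell.ShellCommandExecutor(
      new String[] {script, containerId.toString()},
      null, null, 30000L);   // bound the hook so a slow script cannot delay the kill
  try {
    hook.execute();          // the script could, for example, run jstack to capture a thread dump
  } catch (IOException e) {
    LOG.warn("Pre-kill script failed for container " + containerId, e);
  }
}
{code}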



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9931) Support run script before kill container

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957095#comment-16957095
 ] 

zhoukang commented on YARN-9931:


[~weiweiyagn666] [~tangzhankun]

> Support run script before kill container
> 
>
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> Like node health check script. We can add a pre-kill script which run before 
> kill container.
> For example we can save the thread dump before kill the container, which is 
> helpful for troubleshooting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9931) Support run script before kill container

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9931:
---
Component/s: nodemanager

> Support run script before kill container
> 
>
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> Like node health check script. We can add a pre-kill script which run before 
> kill container.
> Such as we can save the thread dump before kill the container, which is 
> helpful for troubleshooting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-22 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Description: 
CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When creating 
a queue with the same name as an existing parent queue, it has to fail with the 
error below.
{code:java}
Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
from:root.A to:root.B.A after refresh, which is not allowed. at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
 ... 70 more 
{code}
In some cases, the error is not thrown while creating the queue but at job 
submission: "Failed to submit application_1571677375269_0002 to YARN : 
Application application_1571677375269_0002 submitted by user : systest to 
non-leaf queue : B"

The scenarios below are allowed but should not be:
{code:java}
It allows root.A.A1.B when root.B.B1 already exists.
   
1. Add root.A
2. Add root.A.A1
3. Add root.B
4. Add root.B.B1
5. Allows Add of root.A.A1.B 

It allows two root queues:
   
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Allows Add of root.A.A1.root
 
{code}
The scenario below is handled properly:
{code:java}
It does not allow root.B.A when root.A.A1 already exists.
 
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Does not Allow Add of root.B.A
{code}
This error handling has to be consistent in all scenarios.

  was:
CapacitySchedulerQueueManager allows unsupported Queue hierarchy. When creating 
a queue with same name as an existing parent queue name - it has to fail with 
below.
{code:java}
Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
from:root.A to:root.B.A after refresh, which is not allowed. at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
 ... 70 more 
{code}
In Some cases, the error is not thrown while creating the queue but thrown at 
submission of job "Failed to submit application_1571677375269_0002 to YARN : 
Application application_1571677375269_0002 submitted by user : systest to 
non-leaf queue : B"

Below scenarios are allowed but it should not
{code:java}
It allows root.A.A1.B when root.B.B1 exists already.
   
1. Add root.A
2. Add root.A.A1
3. Add root.B
4. Add root.B.B1
5. Allows Add of root.A.A1.B 

It allows two root queues:
   
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Allows Add of root.A.A1.root
 
{code}
Below scenario is handled properly:
{code:java}
It does not allow root.B.A when root.A.A1 exists already.
 
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Does not Allow Add of root.B.A
{code}
This error handling has to be consistent in all scenarios.


> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy. When 
> creating a queue with same name as an existing parent queue name - it has to 
> fail with below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.sched

[jira] [Created] (YARN-9931) Support run script before kill container

2019-10-22 Thread zhoukang (Jira)
zhoukang created YARN-9931:
--

 Summary: Support run script before kill container
 Key: YARN-9931
 URL: https://issues.apache.org/jira/browse/YARN-9931
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhoukang
Assignee: zhoukang


Like the node health check script, we could add a pre-kill script that runs before 
a container is killed.

For instance, we could save a thread dump before killing the container, which is helpful 
for troubleshooting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize

2019-10-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957077#comment-16957077
 ] 

David Mollitor commented on YARN-9863:
--

[~szegedim] Any chance you've been able to review my remarks?  Thanks!

> Randomize List of Resources to Localize
> ---
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes,
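
For reference, a minimal sketch of the shuffling itself (the flag name is an assumption, not the attached patch):
{code:java}
// Illustrative only: optionally shuffle the resources so NodeManagers do not all
// request the same files, in the same order, from the same DataNodes.
List<Path> resources = new ArrayList<>(requestedResources);
if (randomizeResources) {        // assumed new LocalResourceBuilder flag
  Collections.shuffle(resources);
}
{code}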



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-10-22 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957074#comment-16957074
 ] 

Prabhu Joseph commented on YARN-9789:
-

Thanks [~pbacsko] for the review.

[~snemeth] Can you review and commit this Jira when you get time. Thanks.

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9789-001.patch
>
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YARNConfigurationStore (ZK, LevelDB) reads the write ahead logs from the 
> backend which is not needed.
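
The intended short-circuit is roughly the following (the helper name is an assumption):
{code:java}
// Illustrative only: skip reading write-ahead logs from the backing store when
// yarn.scheduler.configuration.store.max-logs is set to 0.
List<LogMutation> pendingLogs = maxLogs > 0
    ? readPendingMutationsFromStore()      // assumed helper
    : Collections.emptyList();
{code}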



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-10-22 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957073#comment-16957073
 ] 

Prabhu Joseph commented on YARN-9781:
-

Thanks [~pbacsko] for the review.

[~snemeth] Can you review and commit this Jira when you get time. Thanks.

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch
>
>
> SchedConfCLI currently allows to add / remove / remove queue. It does not 
> support get configuration which RMWebServices provides as part of YARN-8559.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957071#comment-16957071
 ] 

zhoukang edited comment on YARN-7621 at 10/22/19 1:40 PM:
--

I think we can solve this problem with the full path [~wilfreds]


was (Author: cane):
I think with full path we can solve this problem [~wilfreds]

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference of queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler. 
> FairScheduler needs queue path but CapacityScheduler needs queue name. There 
> is no doubt of the correction of queue definition for CapacityScheduler 
> because it does not allow duplicate leaf queue names, but it's hard to switch 
> between FairScheduler and CapacityScheduler. I propose to support submitting 
> apps with queue path for CapacityScheduler to make the interface clearer and 
> scheduler switch smoothly.
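
As an illustration of the path-based submission idea (this is a sketch, not the attached patch):
{code:java}
// Illustrative only: submit with the full queue path so the same leaf name can exist
// under different parents and the scheduler can resolve it unambiguously.
ApplicationSubmissionContext ctx = Records.newRecord(ApplicationSubmissionContext.class);
ctx.setQueue("root.engineering.spark");   // full path instead of just "spark"
{code}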



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957071#comment-16957071
 ] 

zhoukang commented on YARN-7621:


I think with full path we can solve this problem [~wilfreds]

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference of queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler. 
> FairScheduler needs queue path but CapacityScheduler needs queue name. There 
> is no doubt of the correction of queue definition for CapacityScheduler 
> because it does not allow duplicate leaf queue names, but it's hard to switch 
> between FairScheduler and CapacityScheduler. I propose to support submitting 
> apps with queue path for CapacityScheduler to make the interface clearer and 
> scheduler switch smoothly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957069#comment-16957069
 ] 

zhoukang commented on YARN-7621:


[~jiwq] Agree with you, sorry for the late reply. Any progress on this jira? 
[~cheersyang] [~Tao Yang]

Thanks!

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference of queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler. 
> FairScheduler needs queue path but CapacityScheduler needs queue name. There 
> is no doubt of the correction of queue definition for CapacityScheduler 
> because it does not allow duplicate leaf queue names, but it's hard to switch 
> between FairScheduler and CapacityScheduler. I propose to support submitting 
> apps with queue path for CapacityScheduler to make the interface clearer and 
> scheduler switch smoothly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9930) Support max running app logic for CapacityScheduler

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9930:
---
Parent: YARN-9698
Issue Type: Sub-task  (was: Improvement)

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> In FairScheduler, there is a limit on the maximum number of running applications, 
> which leaves excess applications pending.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> max-applications limit, and jobs beyond it are rejected directly on the client.
> In this jira I want to implement the same semantics for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9930) Support max running app logic for CapacityScheduler

2019-10-22 Thread zhoukang (Jira)
zhoukang created YARN-9930:
--

 Summary: Support max running app logic for CapacityScheduler
 Key: YARN-9930
 URL: https://issues.apache.org/jira/browse/YARN-9930
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, capacityscheduler
Affects Versions: 3.1.1, 3.1.0
Reporter: zhoukang
Assignee: zhoukang


In FairScheduler, there is a limit on the maximum number of running applications, 
which leaves excess applications pending.
But CapacityScheduler has no such max-running-apps feature; it only has a 
max-applications limit, and jobs beyond it are rejected directly on the client.

In this jira I want to implement the same semantics for CapacityScheduler.
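
A conceptual sketch of the intended behaviour (all names here are assumptions, not existing CapacityScheduler code):
{code:java}
// Illustrative only: keep a submitted application pending instead of activating it
// when the queue already has maxRunningApps active applications.
boolean tryActivateApplication(LeafQueue queue, FiCaSchedulerApp app) {
  int maxRunningApps = queue.getMaxRunningApps();          // assumed new per-queue setting
  if (queue.getNumActiveApplications() >= maxRunningApps) {
    return false;                                          // leave the app in the pending list
  }
  queue.activateApplication(app);                          // assumed helper
  return true;
}
{code}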



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9748) Allow capacity-scheduler configuration on HDFS and support reload from HDFS

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957041#comment-16957041
 ] 

zhoukang commented on YARN-9748:


I want to add a service like 
{code:java}
AllocationFileLoaderService
{code}
as in FairScheduler [~Prabhu Joseph] [~cheersyang] [~tangzhankun]
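
Roughly, the idea would be a small service that watches the HDFS copy of capacity-scheduler.xml and triggers a scheduler refresh when it changes. A sketch for discussion (the class name, config key, and callback are all assumptions):
{code:java}
// Illustrative sketch only; none of these names exist in the current code base.
public class CapacitySchedulerConfLoaderService extends AbstractService {
  private final Runnable reloadCallback;   // e.g. triggers CapacityScheduler#reinitialize
  private Path confPath;
  private FileSystem fs;
  private volatile long lastModified;

  public CapacitySchedulerConfLoaderService(Runnable reloadCallback) {
    super(CapacitySchedulerConfLoaderService.class.getName());
    this.reloadCallback = reloadCallback;
  }

  @Override
  protected void serviceStart() throws Exception {
    confPath = new Path(getConfig().get("yarn.scheduler.capacity.configuration.hdfs-path"));
    fs = confPath.getFileSystem(getConfig());
    Thread poller = new Thread(() -> {
      while (true) {
        try {
          long mtime = fs.getFileStatus(confPath).getModificationTime();
          if (mtime > lastModified) {
            lastModified = mtime;
            reloadCallback.run();
          }
          Thread.sleep(60000L);   // poll interval; a real patch would make this configurable
        } catch (Exception e) {
          // log and keep polling
        }
      }
    }, "cs-conf-reloader");
    poller.setDaemon(true);
    poller.start();
    super.serviceStart();
  }
}
{code}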

> Allow capacity-scheduler configuration on HDFS and support reload from HDFS
> ---
>
> Key: YARN-9748
> URL: https://issues.apache.org/jira/browse/YARN-9748
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> Improvement:
> Support auto reload from hdfs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9929) NodeManager OOM because of stuck DeletionService

2019-10-22 Thread kyungwan nam (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957039#comment-16957039
 ] 

kyungwan nam commented on YARN-9929:


Attached a patch which sets the timeout for _ShellCommandExecutor_.
Any comments and suggestions are welcome.
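
For illustration, the core of the change could look roughly like this (variable names and the timeout value are assumptions, not the attached patch):
{code:java}
// Construct the executor with a timeout so a hung "docker inspect" cannot block a
// DeletionService thread forever.
String[] cmd = {"/usr/bin/docker", "inspect", "--format={{.State.Status}}", containerIdStr};
Shell.ShellCommandExecutor executor =
    new Shell.ShellCommandExecutor(cmd, null, null, 10000L);   // timeout in ms, value assumed
executor.execute();
if (executor.isTimedOut()) {
  LOG.warn("docker inspect timed out for container " + containerIdStr);
}
{code}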

> NodeManager OOM because of stuck DeletionService
> 
>
> Key: YARN-9929
> URL: https://issues.apache.org/jira/browse/YARN-9929
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9929.001.patch, nm_heapdump.png
>
>
> NMs go through frequent Full GC due to a lack of heap memory.
> We can find a lot of FileDeletionTask and DockerContainerDeletionTask objects in the 
> heap dump (screenshot is attached).
> After analyzing the thread dump, we can see that _DeletionService_ gets 
> stuck in _executeStatusCommand_, which runs 'docker inspect'.
> {code:java}
> "DeletionService #0" - Thread t@41
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
>   at org.apache.hadoop.util.Shell.run(Shell.java:902)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) 
> {code}
> also, we found 'docker inspect' processes are running for a long time as 
> follows.
> {code:java}
>  root      95637  0.0  0.0 2650984 35776 ?       Sl   Aug23   5:48 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e30_1555419799458_0014_01_30
> root      95638  0.0  0.0 2773860 33908 ?       Sl   Aug23   5:33 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_25316_01_001455
> root      95641  0.0  0.0 2445924 34204 ?       Sl   Aug23   5:34 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e49_1560851258686_2107_01_24
> root      95643  0.0  0.0 2642532 34428 ?       Sl   Aug23   5:30 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_8111_01_

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957037#comment-16957037
 ] 

zhoukang commented on YARN-9927:


Nice idea, we also want to do a similar job. Looking forward to the PoC 
[~hcarrot]

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% of the time consumption of the RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and it results 
> in low headroom of the RM event scheduler and thus low RM performance.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9929) NodeManager OOM because of stuck DeletionService

2019-10-22 Thread kyungwan nam (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-9929:
---
Attachment: YARN-9929.001.patch

> NodeManager OOM because of stuck DeletionService
> 
>
> Key: YARN-9929
> URL: https://issues.apache.org/jira/browse/YARN-9929
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9929.001.patch, nm_heapdump.png
>
>
> NMs go through frequent Full GC due to a lack of heap memory.
> We can find a lot of FileDeletionTask and DockerContainerDeletionTask objects in 
> the heap dump (screenshot is attached),
> and after analyzing the thread dump, we can figure out that _DeletionService_ 
> gets stuck in _executeStatusCommand_, which runs 'docker inspect'.
> {code:java}
> "DeletionService #0" - Thread t@41
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
>   at org.apache.hadoop.util.Shell.run(Shell.java:902)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) 
> {code}
> also, we found 'docker inspect' processes are running for a long time as 
> follows.
> {code:java}
>  root      95637  0.0  0.0 2650984 35776 ?       Sl   Aug23   5:48 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e30_1555419799458_0014_01_30
> root      95638  0.0  0.0 2773860 33908 ?       Sl   Aug23   5:33 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_25316_01_001455
> root      95641  0.0  0.0 2445924 34204 ?       Sl   Aug23   5:34 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e49_1560851258686_2107_01_24
> root      95643  0.0  0.0 2642532 34428 ?       Sl   Aug23   5:30 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_8111_01_002657{code}
>  
> I think It has occurred since docker daemon is restarted. 
> 'docker inspect' which was run while restarting th

[jira] [Updated] (YARN-9929) NodeManager OOM because of stuck DeletionService

2019-10-22 Thread kyungwan nam (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-9929:
---
Attachment: nm_heapdump.png

> NodeManager OOM because of stuck DeletionService
> 
>
> Key: YARN-9929
> URL: https://issues.apache.org/jira/browse/YARN-9929
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: nm_heapdump.png
>
>
> NMs go through frequent Full GC due to a lack of heap memory.
> We can find a lot of FileDeletionTask and DockerContainerDeletionTask objects in 
> the heap dump (screenshot is attached),
> and after analyzing the thread dump, we can figure out that _DeletionService_ 
> gets stuck in _executeStatusCommand_, which runs 'docker inspect'.
> {code:java}
> "DeletionService #0" - Thread t@41
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <3e45c938> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
>   at org.apache.hadoop.util.Shell.run(Shell.java:902)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) 
> {code}
> also, we found 'docker inspect' processes are running for a long time as 
> follows.
> {code:java}
>  root      95637  0.0  0.0 2650984 35776 ?       Sl   Aug23   5:48 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e30_1555419799458_0014_01_30
> root      95638  0.0  0.0 2773860 33908 ?       Sl   Aug23   5:33 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_25316_01_001455
> root      95641  0.0  0.0 2445924 34204 ?       Sl   Aug23   5:34 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e49_1560851258686_2107_01_24
> root      95643  0.0  0.0 2642532 34428 ?       Sl   Aug23   5:30 
> /usr/bin/docker inspect --format={{.State.Status}} 
> container_e50_1561100493387_8111_01_002657{code}
>  
> I think It has occurred since docker daemon is restarted. 
> 'docker inspect' which was run while restarting the docker daemon was not 

[jira] [Created] (YARN-9929) NodeManager OOM because of stuck DeletionService

2019-10-22 Thread kyungwan nam (Jira)
kyungwan nam created YARN-9929:
--

 Summary: NodeManager OOM because of stuck DeletionService
 Key: YARN-9929
 URL: https://issues.apache.org/jira/browse/YARN-9929
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: kyungwan nam
Assignee: kyungwan nam


NMs go through frequent Full GC due to a lack of heap memory.
We can find a lot of FileDeletionTask and DockerContainerDeletionTask objects in 
the heap dump (screenshot is attached).

After analyzing the thread dump, we can figure out that _DeletionService_ gets 
stuck in _executeStatusCommand_, which runs 'docker inspect'.
{code:java}
"DeletionService #0" - Thread t@41
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
- locked <3e45c938> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.read1(BufferedReader.java:212)
at java.io.BufferedReader.read(BufferedReader.java:286)
- locked <3e45c938> (a java.io.InputStreamReader)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
at org.apache.hadoop.util.Shell.run(Shell.java:902)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) 
{code}
also, we found 'docker inspect' processes are running for a long time as 
follows.
{code:java}
 root      95637  0.0  0.0 2650984 35776 ?       Sl   Aug23   5:48 
/usr/bin/docker inspect --format={{.State.Status}} 
container_e30_1555419799458_0014_01_30
root      95638  0.0  0.0 2773860 33908 ?       Sl   Aug23   5:33 
/usr/bin/docker inspect --format={{.State.Status}} 
container_e50_1561100493387_25316_01_001455
root      95641  0.0  0.0 2445924 34204 ?       Sl   Aug23   5:34 
/usr/bin/docker inspect --format={{.State.Status}} 
container_e49_1560851258686_2107_01_24
root      95643  0.0  0.0 2642532 34428 ?       Sl   Aug23   5:30 
/usr/bin/docker inspect --format={{.State.Status}} 
container_e50_1561100493387_8111_01_002657{code}
 

I think it has occurred since the docker daemon was restarted.
The 'docker inspect' that was run while the docker daemon was restarting did not 
work, and it was not even terminated.

It can be considered a docker issue,
but it could happen whenever 'docker inspect' does not work, due to a docker 
daemon restart or a docker bug.
It would be good to set a timeout for 'docker inspect' to avoid this issue.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Resolved] (YARN-9851) Make execution type check compatible

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang resolved YARN-9851.

Resolution: Duplicate

> Make execution type check compatible
> -
>
> Key: YARN-9851
> URL: https://issues.apache.org/jira/browse/YARN-9851
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9851-001.patch
>
>
> During upgrade from 2.6 to 3.1, we encountered a problem:
> {code:java}
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568719110875_6460_08_01, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_62, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_63, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_64, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_30617_01_06, status: RUNNING, 
> execution type: null
> for (ContainerStatus remoteContainer : containerStatuses) {
>   if (remoteContainer.getState() == ContainerState.RUNNING
>   && remoteContainer.getExecutionType() == ExecutionType.GUARANTEED) {
> nodeContainers.add(remoteContainer.getContainerId());
>   } else {
> LOG.warn("Lost container " + remoteContainer.getContainerId()
> + ", status: " + remoteContainer.getState()
> + ", execution type: " + remoteContainer.getExecutionType());
>   }
> }​
> {code}
> The cause is that we have NMs with version 2.6, which do not have executionType 
> in the container status.
> We should add a check here to make the upgrade process more transparent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957020#comment-16957020
 ] 

zhoukang commented on YARN-9537:


A new patch has been attached  [~snemeth]

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9537:
---
Attachment: YARN-9537-002.patch

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9537:
---
Attachment: (was: YARN-9537-002.patch)

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-9537:
---
Attachment: YARN-9537-002.patch

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957015#comment-16957015
 ] 

Peter Bacsko commented on YARN-9789:


Patch looks straightforward, +1 non-binding.

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9789-001.patch
>
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YARNConfigurationStore (ZK, LevelDB) reads the write ahead logs from the 
> backend which is not needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956995#comment-16956995
 ] 

Peter Bacsko edited comment on YARN-9923 at 10/22/19 12:16 PM:
---

_"NONE (default): preserving the current behaviour [...]"_

Even the current behaviour can be improved. Right now there are multiple error 
messages, one after the other. If the binary is missing, there's no need to 
emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and 
exit immediately.


was (Author: pbacsko):
_"NONE (default): preserving the current behaviour [...]"_

Even the current behaviour can be improved. Right now there are multiple error 
messages, one after the another. If the binary is missing, there's no need to 
emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and 
exit immediately.

> Detect missing Docker binary or not running Docker daemon
> -
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Currently if a NodeManager is enabled to allocate Docker containers, but the 
> specified binary (docker.binary in the container-executor.cfg) is missing the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: setting this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception in 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956995#comment-16956995
 ] 

Peter Bacsko commented on YARN-9923:


_"NONE (default): preserving the current behaviour [...]"_

Even the current behaviour can be improved. Right now there are multiple error 
messages, one after the another. If the binary is missing, there's no need to 
emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and 
exit immediately.
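To illustrate the fail-fast idea (a sketch only; using "docker info" as a daemon 
liveness probe and throwing a RuntimeException are assumptions, not the final 
design):
{code:java}
// Illustrative startup-time check: fail fast with one clear message when the
// docker binary is missing or the daemon does not respond.
import java.io.File;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class DockerEnvironmentCheck {
  public static void checkOrFail(String dockerBinary) {
    if (!new File(dockerBinary).canExecute()) {
      throw new RuntimeException("Fatal error: " + dockerBinary + " is missing");
    }
    try {
      // "docker info" only succeeds when the daemon is up and reachable.
      new ShellCommandExecutor(new String[] {dockerBinary, "info"}).execute();
    } catch (Exception e) {
      throw new RuntimeException("Fatal error: docker daemon is not running ("
          + dockerBinary + " info failed)", e);
    }
  }
}
{code}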

> Detect missing Docker binary or not running Docker daemon
> -
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Currently if a NodeManager is enabled to allocate Docker containers, but the 
> specified binary (docker.binary in the container-executor.cfg) is missing the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: setting this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception in 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9689) Router does not support kerberos proxy when in secure mode

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956994#comment-16956994
 ] 

zhoukang commented on YARN-9689:


Could you help review this?  [~botong][~giovanni.fumarola][~tangzhankun]

> Router does not support kerberos proxy when in secure mode
> --
>
> Key: YARN-9689
> URL: https://issues.apache.org/jira/browse/YARN-9689
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9689.001.patch
>
>
> When we enable kerberos in YARN-Federation mode, we cannot get a new app since 
> it will throw the kerberos exception below, which should be handled!
> {code:java}
> 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2019-07-22,18:43:25,528 WARN 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: 
> Unable to create a new ApplicationId in SubCluster xxx
> java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed 
> on local exception: java.io.IOException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564)
> at org.apache.hadoop.ipc.Client.call(Client.java:1506)
> at org.apache.hadoop.ipc.Client.call(Client.java:1416)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs

[jira] [Assigned] (YARN-9689) Router does not support kerberos proxy when in secure mode

2019-10-22 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang reassigned YARN-9689:
--

Assignee: zhoukang

> Router does not support kerberos proxy when in secure mode
> --
>
> Key: YARN-9689
> URL: https://issues.apache.org/jira/browse/YARN-9689
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9689.001.patch
>
>
> When we enable kerberos in YARN-Federation mode, we cannot get a new app since 
> it will throw the kerberos exception below, which should be handled!
> {code:java}
> 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2019-07-22,18:43:25,528 WARN 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: 
> Unable to create a new ApplicationId in SubCluster xxx
> java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed 
> on local exception: java.io.IOException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564)
> at org.apache.hadoop.ipc.Client.call(Client.java:1506)
> at org.apache.hadoop.ipc.Client.call(Client.java:1416)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:26

[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956962#comment-16956962
 ] 

Bibin Chundatt commented on YARN-9697:
--

Thank you [~abmodi] for updating the patch.

A few comments and suggestions:

# OpportunisticContainerAllocatorAMService -> NodeQueueLoadMonitor init could 
be moved to AbstractService#serviceInit
# NodeQueueLoadMonitor ScheduledExecutorService#scheduledExecutor shutdown is 
not done
# NodeQueueLoadMonitor#nodeIdsByRack: do we need the NodeIds to be sorted?
# Thoughts on replacing NodeQueueLoadMonitor#addIntoNodeIdsByRack as follows 
{code}
  private void addIntoNodeIdsByRack(RMNode addedNode) {
    nodeIdsByRack.compute(addedNode.getRackName(), (k, v) -> v == null
        ? ConcurrentHashMap.<NodeId>newKeySet()
        : v).add(addedNode.getNodeID());
  }
{code}
# We could also think of replacing NodeQueueLoadMonitor#removeFromNodeIdsByRack 
with computeIfPresent (a possible sketch follows these comments)

Not related to the patch:

# OpportunisticSchedulerMetrics: shouldn't we have a destroy() method to 
reset the counters? I think we should reset the counters during switchover.
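For the computeIfPresent suggestion above, a possible shape (a sketch; it assumes 
nodeIdsByRack maps a rack name to a Set of NodeIds, as in the existing code):
{code}
  // Sketch: remove the node id and drop the rack entry once it is empty,
  // without an explicit containsKey/get/remove sequence.
  private void removeFromNodeIdsByRack(RMNode removedNode) {
    nodeIdsByRack.computeIfPresent(removedNode.getRackName(), (rack, nodes) -> {
      nodes.remove(removedNode.getNodeID());
      return nodes.isEmpty() ? null : nodes;  // returning null removes the key
    });
  }
{code}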

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.ut.patch, 
> YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic container information received in node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> AM asks for the containers. When multiple applications request for 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes as already allocated containers on the node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster leading to 
> increased queuing time 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956961#comment-16956961
 ] 

zhoukang commented on YARN-9537:


OK, I will fix it now! Thanks [~snemeth]

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956959#comment-16956959
 ] 

zhoukang commented on YARN-9605:


[~weichiu][~tangzhankun] Any suggestions? Thanks

> Add ZkConfiguredFailoverProxyProvider for RM HA
> ---
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9605.001.patch, YARN-9605.002.patch
>
>
> In this issue, I will track a new feature to support 
> ZkConfiguredFailoverProxyProvider for RM HA



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956958#comment-16956958
 ] 

zhoukang commented on YARN-9605:


The failed test is below, which I think is not related to this patch:

{code:java}
Stacktrace
org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningQuotaException:
 Integral (avg over time) quota capacity 0.25 over a window of 86400 seconds,  
would be exceeded by accepting reservation: 
reservation_6128220156127328780_6678871933820709847
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityOverTimePolicy.validate(CapacityOverTimePolicy.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.InMemoryPlan.addReservation(InMemoryPlan.java:348)
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:141)
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
{code}


> Add ZkConfiguredFailoverProxyProvider for RM HA
> ---
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9605.001.patch, YARN-9605.002.patch
>
>
> In this issue, I will track a new feature to support 
> ZkConfiguredFailoverProxyProvider for RM HA



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956957#comment-16956957
 ] 

Peter Bacsko commented on YARN-9788:


Thanks for the patch [~Prabhu Joseph]. I think the patch looks good.

Really just one nitpick: the test name {{testParallelUpdates}} might be 
slightly misleading because things are not really happening in parallel. It's 
more like making sure that updates are not lost. So a better name would be 
something like {{testMultipleUpdates}} or {{testMultipleUpdatesNotLost}}, etc. 

Otherwise +1 non-binding.

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9788-001.patch, YARN-9788-002.patch, 
> YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, 
> YARN-9788-006.patch, YARN-9788-007.patch
>
>
> Queue Management API - does not support parallel updates. When there are two 
> parallel schedule conf updates (logAndApplyMutation), the first update is 
> overwritten by the second one. 
> Currently, logAndApplyMutation creates a LogMutation and stores it in a 
> variable, pendingMutation. This way, at any given time there will be only one 
> LogMutation, so two parallel logAndApplyMutation calls will override the 
> pendingMutation and only the later one will be present.
> The fix is to have logAndApplyMutation return the LogMutation object, which can 
> then be passed to confirmMutation. This fixes the parallel updates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9916) Improving Async Dispatcher

2019-10-22 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956952#comment-16956952
 ] 

Adam Antal commented on YARN-9916:
--

I think this is related to (if not a dupe of) YARN-9927.

> Improving Async Dispatcher
> --
>
> Key: YARN-9916
> URL: https://issues.apache.org/jira/browse/YARN-9916
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Major
>
> Currently, async dispatcher works in the single-threaded model.
>  
> There is another queue for the scheduler handler, but not all handlers are 
> non-blocking. In our cluster, this queue can go sometimes to 16M events, 
> which takes time to drain.
>  
> We should think of improving it:
>  
>  # Either make multi-threads in the dispatcher which will pick queue events, 
> but this would require careful evaluation of the order of events.
>  # Or Make all downstream handlers similar to scheduler queue (this also 
> needs careful evaluation of out of order events).
> Any other ideas are also welcome.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956949#comment-16956949
 ] 

hcarrot commented on YARN-9927:
---

The performance bottleneck is the single-thread RMEventDispatcher mode. Events 
are processed one by one. If we change single-thread to multi-thread, RM can 
process different events concurrently.
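As an illustration of the general idea only (not the design in the attached pdf): 
events can be hash-partitioned onto a small pool of worker queues, so that events 
for the same key keep their relative order while different keys are processed 
concurrently.
{code:java}
// Illustration only: hash-partitioned dispatch keeps per-key ordering while
// allowing different keys (e.g. different nodes/apps) to be handled in parallel.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedDispatcher<E> {
  private final BlockingQueue<E>[] queues;

  @SuppressWarnings("unchecked")
  public PartitionedDispatcher(int workers, java.util.function.Consumer<E> handler) {
    queues = new BlockingQueue[workers];
    for (int i = 0; i < workers; i++) {
      final BlockingQueue<E> q = queues[i] = new LinkedBlockingQueue<>();
      Thread t = new Thread(() -> {
        try {
          while (true) {
            handler.accept(q.take());   // events of one partition stay ordered
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }, "event-dispatcher-" + i);
      t.setDaemon(true);
      t.start();
    }
  }

  /** Events with the same key always go to the same worker queue. */
  public void dispatch(Object key, E event) throws InterruptedException {
    int idx = (key.hashCode() & Integer.MAX_VALUE) % queues.length;
    queues[idx].put(event);
  }
}
{code}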

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% of the time consumption of the RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and it results 
> in low headroom of the RM event scheduler and thus low RM performance.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956941#comment-16956941
 ] 

Peter Bacsko commented on YARN-9780:


[~Prabhu Joseph] I have some minor comments:

#1 Nit: pay attention to the missing white spaces
{noformat}
String newQueueState = newConf.get(configPrefix+"state");
{noformat}
 

#2 I suggest the following piece of code to retrieve {{newQueueState}} with 
error handling:
{noformat}
String configPrefix = newConf.getQueuePrefix(
    oldQueue.getQueuePath());

QueueState newQueueState;
try {
  newQueueState = QueueState.valueOf(
      newConf.get(configPrefix + "state"));
} catch (IllegalArgumentException e) {
  // handle an illegal string for the state (e.g. return an error)
}

// no need to null-check newQueueState
if (oldQueue.getState() == QueueState.STOPPED ||
    newQueueState != QueueState.STOPPED) {
...{noformat}
#3 Nit: add some (or more) meaningful assertion messages:
{noformat}
assertEquals(1, newCSConf.getQueues("root.a").length);
assertEquals("a1", newCSConf.getQueues("root.a")[0]);{noformat}

> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single 
> call
> 
>
> Key: YARN-9780
> URL: https://issues.apache.org/jira/browse/YARN-9780
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9780-001.patch, YARN-9780-002.patch, 
> YARN-9780-003.patch
>
>
> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single 
> call. The queue has to be stopped before removing and so it is useful to 
> allow both Stop and remove queue in a single call.
> *Repro:*
> {code:java}
> Capacity-Scheduler.xml:
> yarn.scheduler.capacity.root.queues = new, default, dummy
> yarn.scheduler.capacity.root.default.capacity = 60
> yarn.scheduler.capacity.root.dummy.capacity = 30
> yarn.scheduler.capacity.root.new.capacity = 10   
> curl -v -X PUT -d @abc.xml -H "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf'
> abc.xml
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>70</value>
>       </entry>
>     </params>
>   </update-queue>
>   <update-queue>
>     <queue-name>root.new</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.new</remove-queue>
> </sched-conf>
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag

2019-10-22 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956938#comment-16956938
 ] 

Kinga Marton commented on YARN-9886:


In the attached patch 001 I have addressed the following issues:
 * changed the id pattern from {{userid=}} to {{u=}}
 * added a property for enabling this feature
 * added a property for specifying the users who are allowed to do such an operation
 * added unit tests

I also wanted to add a small piece of information to the documentation, but I 
didn't find the proper place for it. I was searching for a section where the 
common scheduler features are documented. 
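To make the tag handling concrete, a minimal sketch of the intended behaviour 
(method names and the exact property semantics are illustrative, not taken from 
the patch):
{code:java}
// Illustrative sketch: if the feature is enabled and the submitting user is in
// the allowed list, take the user from an application tag of the form "u=<name>"
// and use it for queue mapping; otherwise fall back to the submitting user.
import java.util.Set;

public final class UserTagExtractor {
  private static final String USER_TAG_PREFIX = "u=";

  public static String resolveUser(String submittingUser, Set<String> tags,
      boolean featureEnabled, Set<String> allowedProxyUsers) {
    if (!featureEnabled || !allowedProxyUsers.contains(submittingUser)) {
      return submittingUser;
    }
    for (String tag : tags) {
      if (tag.startsWith(USER_TAG_PREFIX)) {
        String realUser = tag.substring(USER_TAG_PREFIX.length());
        if (!realUser.isEmpty()) {
          return realUser;   // map the queue based on the real submitting user
        }
      }
    }
    return submittingUser;
  }
}
{code}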

> Queue mapping based on userid passed through application tag
> 
>
> Key: YARN-9886
> URL: https://issues.apache.org/jira/browse/YARN-9886
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives at YARN. For example, for a Hive application with Hive impersonation 
> turned off, the Hive queries will run as the hive user and the queue mapping 
> is done based on that username. Unfortunately, in this case YARN doesn't have 
> any information about the real user, and there are cases when the customer may 
> want to map these applications to the real submitting user's queue instead of 
> the Hive one.
> For these cases, if the real username is passed in the application tag, we may 
> read it and use it during the queue mapping, provided that user has the rights 
> to run on the real user's queue.  
> [~sunilg] please correct me if I missed something.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9886) Queue mapping based on userid passed through application tag

2019-10-22 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton updated YARN-9886:
---
Attachment: YARN-9886.001.patch

> Queue mapping based on userid passed through application tag
> 
>
> Key: YARN-9886
> URL: https://issues.apache.org/jira/browse/YARN-9886
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives at YARN. For example, for a Hive application with Hive impersonation 
> turned off, the Hive queries will run as the hive user and the queue mapping 
> is done based on that username. Unfortunately, in this case YARN doesn't have 
> any information about the real user, and there are cases when the customer may 
> want to map these applications to the real submitting user's queue instead of 
> the Hive one.
> For these cases, if the real username is passed in the application tag, we may 
> read it and use it during the queue mapping, provided that user has the rights 
> to run on the real user's queue.  
> [~sunilg] please correct me if I missed something.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-10-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956934#comment-16956934
 ] 

Peter Bacsko commented on YARN-9781:


LGTM +1 (non-binding)

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch
>
>
> SchedConfCLI currently allows adding / updating / removing a queue. It does not 
> support getting the configuration, which RMWebServices provides as part of YARN-8559.
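
For reference, the underlying REST endpoint from YARN-8559 can already be queried 
directly, so a CLI "get" would essentially wrap a call like the one below (sketch 
only; "rm-host" is a placeholder for the ResourceManager address):
{code}
curl -X GET 'http://rm-host:8088/ws/v1/cluster/scheduler-conf'
{code}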



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM

2019-10-22 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-9928:
---
Component/s: ATSv2

> ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
> --
>
> Key: YARN-9928
> URL: https://issues.apache.org/jira/browse/YARN-9928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> Encountered the below FATAL error in the NodeManager, which was under heavy 
> load and was also resyncing with the RM at the same time. This caused the NM 
> to go down. 
> {code:java}
> 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM

2019-10-22 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-9928:
---
Affects Version/s: 3.1.0

> ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
> --
>
> Key: YARN-9928
> URL: https://issues.apache.org/jira/browse/YARN-9928
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> Encountered the below FATAL error in the NodeManager, which was under heavy 
> load and was also resyncing with the RM at the same time. This caused the NM 
> to go down. 
> {code:java}
> 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM

2019-10-22 Thread Tarun Parimi (Jira)
Tarun Parimi created YARN-9928:
--

 Summary: ATSv2 can make NM go down with a FATAL error while it is 
resyncing with RM
 Key: YARN-9928
 URL: https://issues.apache.org/jira/browse/YARN-9928
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tarun Parimi
Assignee: Tarun Parimi


Encountered the below FATAL error in the NodeManager, which was under heavy load 
and was also resyncing with the RM at the same time. This caused the NM to go 
down. 


{code:java}
2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
{code}
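
A minimal, self-contained sketch of the failure mode (hypothetical class and field 
names, not the actual Hadoop sources): the dispatcher thread dies with a FATAL 
error when a handler lets an exception escape, so a handler that can race with an 
RM resync needs to guard against missing per-application state:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the NM timeline publishing path, for illustration only.
public class TimelinePublisherSketch {
  private final Map<String, Object> appCollectors = new ConcurrentHashMap<>();

  /** Invoked on the dispatcher thread for every container event. */
  public void publishContainerCreatedEvent(String appId, String containerId) {
    Object collector = appCollectors.get(appId);
    if (collector == null) {
      // During a resync the application entry may already have been removed.
      // Logging and skipping keeps the dispatcher thread alive instead of
      // letting a NullPointerException bubble up as a FATAL error.
      System.err.println("No timeline collector for " + appId
          + "; dropping event for container " + containerId);
      return;
    }
    // ... build and publish the timeline entity using the collector ...
  }
}
{code}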




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9886) Queue mapping based on userid passed through application tag

2019-10-22 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956730#comment-16956730
 ] 

Kinga Marton edited comment on YARN-9886 at 10/22/19 7:41 AM:
--

[~wangda] yes. I will add a whitelist, where it can be defined who can use this 
feature.


was (Author: kmarton):
[~wangda] yes. I will add whitelist, where it can be defined who can use this 
feature.

> Queue mapping based on userid passed through application tag
> 
>
> Key: YARN-9886
> URL: https://issues.apache.org/jira/browse/YARN-9886
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-9886-WIP.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives at YARN. For example, for a Hive application with Hive impersonation 
> turned off, the Hive queries will run as the hive user and the queue mapping 
> is done based on that username. Unfortunately, in this case YARN doesn't have 
> any information about the real user, and there are cases when the customer may 
> want to map these applications to the real submitting user's queue instead of 
> the Hive one.
> For these cases, if the real username is passed in the application tag, we may 
> read it and use it during the queue mapping, provided that user has the rights 
> to run on the real user's queue.  
> [~sunilg] please correct me if I missed something.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-22 Thread Zhenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956777#comment-16956777
 ] 

Zhenyu Zheng commented on YARN-9897:


Some updates: our team has successfully donated ARM resources and set up an ARM 
CI for Apache Spark:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
It will be switched to a periodic job, and then to a PR trigger, once we think it 
is stable enough. It already includes some basic YARN tests, and they look OK.

I really hope we can do the same for YARN.

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running YARN 
> on platforms with a different architecture, and of managing hardware resources 
> of a different architecture, could be very important and useful.
> The Aarch64 (ARM) architecture is currently the dominant architecture in small 
> devices like phones, IoT devices, security cameras, drones etc. With increasing 
> computing capability and increasing connection speeds such as 5G networks, 
> there could be great possibilities and opportunities for world-changing 
> innovations and new markets if we can manage and make use of those devices as 
> well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for the problems 
> we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all the YARN tests, and it turns out there are only a few problems; 
> we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN on 
> Aarch64 platforms. We are willing to provide machines to the current CI system, 
> and manpower for managing the CI and fixing problems that occur.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hcarrot updated YARN-9927:
--
Priority: Major  (was: Minor)

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler
> 3) meanwhile, RM event processing runs in single-threaded mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we proposed an RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hcarrot updated YARN-9927:
--
Description: 
Recently, we have observed serious event blocking in the RM event dispatcher 
queue. After analyzing RM event monitoring data and the RM event processing 
logic, we found that:

1) environment: a cluster with thousands of nodes

2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
scheduler

3) meanwhile, RM event processing runs in single-threaded mode, which results in 
low headroom for the RM event scheduler and thus limits RM performance.

So we proposed an RM multi-thread event processing mechanism to improve RM 
performance.

  was:Recently, we have observed serious event blocking in RM event dispatcher 
queue. After analysis of RM event monitoring data and RM event processing 
logic, we found that the proportion of RMNodeStatusEvent is less than other 
events, but the overall processing time of it is more than other events. 
Meanwhile, RM event processing is in a single-thread mode, and It results in 
the decrease of RM's performance. So we proposed a RM multi-thread event 
processing mechanism to improve RM performance.


> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler
> 3) meanwhile, RM event processing runs in single-threaded mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we proposed an RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956764#comment-16956764
 ] 

Adam Antal commented on YARN-9927:
--

Thanks for filing this [~hcarrot], interesting approach.

One question that came to my mind: are you certain that the dispatcher is the 
real bottleneck here? 
I mean, if processing an event requires holding the lock the whole time, then we 
just replace the time spent in the dispatcher queue with lock-holding time for 
each event. We should dig into how long the lock has to be held for each event 
type.
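
To make the trade-off concrete, here is a minimal sketch (hypothetical classes, 
not YARN's AsyncDispatcher) of multi-threaded event processing where the handlers 
still synchronize on shared scheduler state; if a handler holds that lock for 
essentially its whole run, extra worker threads mostly move the waiting from the 
dispatcher queue to the lock:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the Hadoop AsyncDispatcher: several worker threads
// drain one event queue, but each handler still serializes on a lock that
// stands in for the scheduler's internal state.
public class MultiThreadDispatcherSketch {
  private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
  private final Object schedulerLock = new Object();

  public void start(int workers) {
    for (int i = 0; i < workers; i++) {
      Thread t = new Thread(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            events.take().run();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "event-worker-" + i);
      t.setDaemon(true);
      t.start();
    }
  }

  /** e.g. a node status update: only the work outside the lock scales with more workers. */
  public void submitNodeStatusEvent(String nodeId) {
    events.add(() -> {
      // parsing / per-node bookkeeping that does not need the scheduler lock
      // can run in parallel across workers
      System.out.println("processing status update from " + nodeId);
      synchronized (schedulerLock) {
        // updates to shared scheduler state are still serialized here
      }
    });
  }
}
{code}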

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that the proportion of RMNodeStatusEvent is lower than that of 
> other events, but its overall processing time is higher than that of other 
> events. Meanwhile, RM event processing runs in single-threaded mode, which 
> decreases the RM's performance. So we proposed an RM multi-thread event 
> processing mechanism to improve RM performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hcarrot updated YARN-9927:
--
Affects Version/s: 3.0.0
   2.9.2

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that the proportion of RMNodeStatusEvent is lower than that of 
> other events, but its overall processing time is higher than that of other 
> events. Meanwhile, RM event processing runs in single-threaded mode, which 
> decreases the RM's performance. So we proposed an RM multi-thread event 
> processing mechanism to improve RM performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hcarrot updated YARN-9927:
--
Description: Recently, we have observed serious event blocking in RM event 
dispatcher queue. After analysis of RM event monitoring data and RM event 
processing logic, we found that the proportion of RMNodeStatusEvent is less 
than other events, but the overall processing time of it is more than other 
events. Meanwhile, RM event processing is in a single-thread mode, and It 
results in the decrease of RM's performance. So we proposed a RM multi-thread 
event processing mechanism to improve RM performance.  (was: Recently, we have 
observed serious event blocking in RM event dispatcher queue. After analysis of 
RM event monitoring data and RM event processing logic, we found that the 
proportion of RMNodeStatusEvent is less than other events, but the overall 
processing time of it is more than other events. Meanwhile, RM event processing 
is in a single-thread mode, and It results in the decrease of RM's performance. 
So we proposed a RM multi-thread event processing mechanism to improve RM 
performance. Is this mechanism feasible?)

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that the proportion of RMNodeStatusEvent is lower than that of 
> other events, but its overall processing time is higher than that of other 
> events. Meanwhile, RM event processing runs in single-threaded mode, which 
> decreases the RM's performance. So we proposed an RM multi-thread event 
> processing mechanism to improve RM performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436

2019-10-22 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956747#comment-16956747
 ] 

Adam Antal commented on YARN-9511:
--

Hi [~seanlau],

I can reproduce the steps you described above with one exception: my default 
umask on my Mac is 0022, so the test passes by default on JDK8. Also, could you 
please confirm that you are using JDK11 (this issue is primarily about the 
JDK11-related part)? If the test fails on JDK8 we should of course fix it, but 
it passes locally for me.

I am not really familiar with the umask defaults, but I think this is related 
to your environment. What machine do you run the tests on?
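
For what it's worth, the "436" in that error message is just the decimal form of 
the POSIX mode bits: 436 == 0664 (rw-rw-r--), i.e. group-writable, which is 
exactly what the AuxServices check rejects; with a umask of 0022 a newly created 
jar gets 0644 and passes. A quick sanity check of the conversion (plain Java, 
nothing YARN-specific):
{code:java}
public class PermissionCheck {
  public static void main(String[] args) {
    // The decimal 436 from the exception message, shown as octal mode bits.
    System.out.println(Integer.toOctalString(436)); // prints 664 -> rw-rw-r--
    // What a default umask of 0022 yields for a new file, as decimal.
    System.out.println(Integer.parseInt("644", 8)); // prints 420
  }
}
{code}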

> [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: 
> The remote jarfile should not be writable by group or others. The current 
> Permission is 436
> ---
>
> Key: YARN-9511
> URL: https://issues.apache.org/jira/browse/YARN-9511
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Assignee: Szilard Nemeth
>Priority: Major
>
> Found in maven JDK 11 unit test run. Compiled on JDK 8.
> {code}
> [ERROR] 
> testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.551 s  <<< 
> ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote 
> jarfile should not be writable by group or others. The current Permission is 
> 436
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org