[jira] [Created] (YARN-9932) Nodelabel support for Fair Scheduler
Anuj created YARN-9932:
--
Summary: Nodelabel support for Fair Scheduler
Key: YARN-9932
URL: https://issues.apache.org/jira/browse/YARN-9932
Project: Hadoop YARN
Issue Type: New Feature
Components: fairscheduler, nodemanager, resourcemanager
Affects Versions: 3.2.1
Reporter: Anuj

Currently, node labels only work with the Capacity Scheduler. We would like to have this working with the Fair Scheduler.
[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN
[ https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957487#comment-16957487 ]
Zhenyu Zheng commented on YARN-9897:
-
[~eyang] BTW, we have actually been running tests and debugging for about a month now, and we can pass all YARN tests with only a few fixes like:
https://issues.apache.org/jira/browse/HADOOP-16614
(it is only a possible proposal; we are open to discussion)

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: build, test
> Reporter: Zhenyu Zheng
> Priority: Major
> Attachments: hadoop_build.log
>
> YARN is the resource manager of Hadoop, and a large number of other software projects also use YARN for resource management. The capability of running YARN on platforms with a different architecture, and of managing hardware resources with a different architecture, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices like phones, IoT devices, security cameras, drones, etc. With increasing computing capability and increasing connection speeds like 5G networks, there could be great possibilities and opportunities for world-changing innovations and new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been performing tests on Aarch64 and proposing possible solutions for problems we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests, and it turns out there are only a few problems, and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN on Aarch64 platforms. We are willing to provide machines to the current CI system and manpower to manage the CI and fix problems that occur.
[jira] [Comment Edited] (YARN-9897) Add an Aarch64 CI for YARN
[ https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957484#comment-16957484 ]
liusheng edited comment on YARN-9897 at 10/23/19 1:34 AM:
--
Hi [~eyang],
I have tried the two tests suggested in your comment; it looks like both succeed.
{code:java}
[INFO] C M A K E B U I L D E R T E S T
[INFO] ---
[INFO] cetest: running /home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest --gtest_filter=-Perf. --gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO]
{code}
{code:java}
[INFO] ---
[INFO] C M A K E B U I L D E R T E S T
[INFO] ---
[INFO] test-container-executor: running /home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 5968 millisecond(s).
[INFO] ---
[INFO]
[INFO] --- hadoop-maven-plugins:3.3.0-SNAPSHOT:cmake-test (cetest) @ hadoop-yarn-server-nodemanager ---
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:07 min
[INFO] Finished at: 2019-10-23T01:32:28Z
[INFO]
{code}
Does this look good to you?

was (Author: seanlau):
Hi [~eyang],
It looks like both of these tests are OK, see:
{code:java}
[INFO] C M A K E B U I L D E R T E S T
[INFO] ---
[INFO] cetest: running /home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest --gtest_filter=-Perf. --gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO]
{code}

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: build, test
> Reporter: Zhenyu Zheng
> Priority: Major
> Attachments: hadoop_build.log
>
> YARN is the resource manager of Hadoop, and a large number of other software projects also use YARN for resource management. The capability of running YARN on platforms with a different architecture, and of managing hardware resources with a different architecture, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices like phones, IoT devices, security cameras, drones, etc. With increasing computing capability and increasing connection speeds like 5G networks, there could be great possibilities and opportunities for world-changing innovations and new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been performing tests on Aarch64 and proposing possible solutions for problems we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests, and it turns out there are only a few problems, and we can provide possible solutions for discussion.
> We want to propose to add an
[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN
[ https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957484#comment-16957484 ]
liusheng commented on YARN-9897:
-
Hi [~eyang],
It looks like both of these tests are OK, see:
{code:java}
[INFO] C M A K E B U I L D E R T E S T
[INFO] ---
[INFO] cetest: running /home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest --gtest_filter=-Perf. --gtest_output=xml:/home/zuul/src/github.com/liusheng/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml
[INFO] with extra environment variables {}
[INFO] STATUS: SUCCESS after 154 millisecond(s).
[INFO] ---
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:01 min
[INFO] Finished at: 2019-10-23T01:29:41Z
[INFO]
{code}

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: build, test
> Reporter: Zhenyu Zheng
> Priority: Major
> Attachments: hadoop_build.log
>
> YARN is the resource manager of Hadoop, and a large number of other software projects also use YARN for resource management. The capability of running YARN on platforms with a different architecture, and of managing hardware resources with a different architecture, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices like phones, IoT devices, security cameras, drones, etc. With increasing computing capability and increasing connection speeds like 5G networks, there could be great possibilities and opportunities for world-changing innovations and new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been performing tests on Aarch64 and proposing possible solutions for problems we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests, and it turns out there are only a few problems, and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN on Aarch64 platforms. We are willing to provide machines to the current CI system and manpower to manage the CI and fix problems that occur.
[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN
[ https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957475#comment-16957475 ]
Eric Yang commented on YARN-9897:
-
[~Kevin_Zheng] The patch looks good to me. I am surprised at how little change is required. Thanks for sharing the information. Would it be possible to run:
{code}mvn clean test -Dtest=cetest -Pnative{code}
{code}mvn clean test -Dtest=test-container-executor -Pnative{code}
in the hadoop-yarn-nodemanager project as a sanity check?

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: build, test
> Reporter: Zhenyu Zheng
> Priority: Major
> Attachments: hadoop_build.log
>
> YARN is the resource manager of Hadoop, and a large number of other software projects also use YARN for resource management. The capability of running YARN on platforms with a different architecture, and of managing hardware resources with a different architecture, could be very important and useful.
> Aarch64 (ARM) is currently the dominant architecture in small devices like phones, IoT devices, security cameras, drones, etc. With increasing computing capability and increasing connection speeds like 5G networks, there could be great possibilities and opportunities for world-changing innovations and new markets if we can manage and make use of those devices as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been performing tests on Aarch64 and proposing possible solutions for problems we have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests, and it turns out there are only a few problems, and we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN on Aarch64 platforms. We are willing to provide machines to the current CI system and manpower to manage the CI and fix problems that occur.
[jira] [Commented] (YARN-9689) Router does not support kerberos proxy when in secure mode
[ https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957445#comment-16957445 ]
Botong Huang commented on YARN-9689:
-
+1 lgtm

> Router does not support kerberos proxy when in secure mode
> --
>
> Key: YARN-9689
> URL: https://issues.apache.org/jira/browse/YARN-9689
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: federation
> Affects Versions: 3.1.2
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
> Attachments: YARN-9689.001.patch
>
> When we enable Kerberos in YARN federation mode, we cannot get a new application, since it throws the Kerberos exception below, which should be handled!
> {code:java}
> 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 2019-07-22,18:43:25,528 WARN org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: Unable to create a new ApplicationId in SubCluster xxx
> java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564)
> at org.apache.hadoop.ipc.Client.call(Client.java:1506)
> at org.apache.hadoop.ipc.Client.call(Client.java:1416)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source)
> at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252)
> at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.
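For readers following the YARN-9689 discussion: the standard Hadoop pattern for this kind of fix is to wrap the Router's sub-cluster RPCs in a doAs for a proxy UGI built on the Router's own Kerberos login. The sketch below uses real UserGroupInformation APIs, but the surrounding class and method are illustrative assumptions, not necessarily what YARN-9689.001.patch does.
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;

/** Illustrative sketch of the proxy-user pattern for Router -> sub-cluster RM calls. */
public class SecureRouterCall {
  public GetNewApplicationResponse getNewApplication(
      ApplicationClientProtocol rmClient, String user) throws Exception {
    // Impersonate the end user on top of the Router's own Kerberos login,
    // so the RPC authenticates with the Router's TGT instead of failing with
    // "Failed to find any Kerberos tgt".
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    return proxyUgi.doAs(
        (PrivilegedExceptionAction<GetNewApplicationResponse>) () ->
            rmClient.getNewApplication(GetNewApplicationRequest.newInstance()));
  }
}
{code}
On a real cluster this also assumes the hadoop.proxyuser settings for the Router's principal are configured on the sub-cluster RMs so the impersonation is authorized.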
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957379#comment-16957379 ]
Eric Badger commented on YARN-9923:
---
Isn't it more appropriate for this to be in the NM health check script? The Docker daemon can (and will) go down at any time due to a bug or other random issue, but we don't want to do this check before every container that we start. So if a user chose the RUNTIME option, the only way I see this working is to have a thread periodically checking whether Docker is installed and running. But that's exactly what the NM health check script does.

> Detect missing Docker binary or not running Docker daemon
> -
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager, yarn
> Affects Versions: 3.2.1
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Major
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the specified binary (docker.binary in container-executor.cfg) is missing, the container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", with the following options:
> - STARTUP: with this option the NodeManager would not start if Docker binaries are missing or the Docker daemon is not running (the exception is considered FATAL during startup).
> - RUNTIME: would give a more detailed/user-friendly exception on the NodeManager's side (NM logs) if Docker binaries are missing or the daemon is not working. This would also prevent further Docker container allocation as long as the binaries do not exist or the Docker daemon is not running.
> - NONE (default): preserves the current behaviour, throwing an exception during container allocation and carrying on with the default retry procedure.
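The RUNTIME option (or, equivalently, an NM health-check script as suggested above) boils down to a periodic liveness probe of the Docker daemon. Here is a minimal sketch, assuming a probe via "docker info"; the class and field names are illustrative and not part of any actual patch.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic Docker liveness probe; not the actual YARN-9923 patch. */
public class DockerDaemonProbe {
  private volatile boolean dockerHealthy = true;

  public void start(String dockerBinary) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      try {
        // "docker info" fails fast when the binary is missing or the daemon is down.
        Process p = new ProcessBuilder(dockerBinary, "info").start();
        boolean finished = p.waitFor(10, TimeUnit.SECONDS);
        if (!finished) {
          p.destroyForcibly(); // hung daemon counts as unhealthy
        }
        dockerHealthy = finished && p.exitValue() == 0;
      } catch (Exception e) {
        dockerHealthy = false; // a missing binary surfaces as IOException here
      }
    }, 0, 60, TimeUnit.SECONDS);
  }

  /** Container allocation could consult this flag instead of probing per launch. */
  public boolean isDockerHealthy() {
    return dockerHealthy;
  }
}
{code}
A health-script variant would run the same probe and print an ERROR line for the NodeManager's health checker to mark the node unhealthy.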
[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957378#comment-16957378 ]
Wangda Tan commented on YARN-9927:
--
Thanks [~hcarrot] for working on this. Tagging [~prabhujoseph], [~jhung], [~sunil.gov...@gmail.com], [~epayne] for review.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.0.0, 2.9.2
> Reporter: hcarrot
> Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
> Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% of the time consumption of the RM event scheduler
> 3) meanwhile, RM event processing is in single-thread mode, which results in low headroom for the RM event scheduler and thus degrades RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM performance.
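To make the proposal concrete, here is a minimal sketch of one possible shape of the mechanism: routing the dominant RMNodeStatusEvent type to a dedicated thread pool while all other events keep the existing single-thread ordering. The class and pool sizing are assumptions for illustration, not the design in the attached PDF.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical multi-thread dispatcher sketch, for illustration only. */
public class MultiThreadDispatcher {
  private final BlockingQueue<Runnable> mainQueue = new LinkedBlockingQueue<>();
  // Node-status events dominate processing time, so give them their own pool.
  private final ExecutorService nodeStatusPool = Executors.newFixedThreadPool(8);

  public void dispatch(Object event, Runnable handler) {
    if (isNodeStatusEvent(event)) {
      nodeStatusPool.execute(handler);   // parallel path for the hot event type
    } else {
      mainQueue.offer(handler);          // preserved single-thread ordering path
    }
  }

  private boolean isNodeStatusEvent(Object event) {
    // Stand-in for: event instanceof RMNodeStatusEvent
    return "RMNodeStatusEvent".equals(event.getClass().getSimpleName());
  }

  public void start() {
    Thread mainLoop = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          mainQueue.take().run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "rm-main-dispatcher");
    mainLoop.setDaemon(true);
    mainLoop.start();
  }
}
{code}
The key constraint is that events whose relative order matters must stay on the same serial path; this sketch preserves ordering for everything except the hot node-status path.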
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957341#comment-16957341 ]
Hadoop QA commented on YARN-9697:
-
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 0 new + 9 unchanged - 4 fixed = 9 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 95m 0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 0s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9697 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983771/YARN-9697.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8b751e6fbf22 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | http
[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates
[ https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957315#comment-16957315 ]
Hadoop QA commented on YARN-9788:
-
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 55s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 27s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m 9s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9788 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983762/YARN-9788-009.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 66845899dc0c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25030/testReport/ |
| Max. process+thread count | 880 (vs. ulimit
[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957262#comment-16957262 ]
Hadoop QA commented on YARN-9925:
-
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 25s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9925 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983757/YARN-9925-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f4a4827815c9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25029/testReport/ |
| Max. process+thread count | 805 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25029/console |
| Powered by | Apache Y
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957250#comment-16957250 ]
Abhishek Modi commented on YARN-9697:
-
Thanks [~bibinchundatt] for the review. I have addressed most of the review comments in the v8 patch. For
{quote}OpportunisticSchedulerMetrics shouldn't we be having a destroy() method to reset the counters. During switch over I think we should reset the counters{quote}
I will file a separate JIRA.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
> In the current implementation, opportunistic containers are allocated based on the queued opportunistic container counts received in node heartbeats. This information becomes stale as soon as more opportunistic containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which the AM asks for the containers. When multiple applications request Opportunistic containers, containers might get allocated on the same set of nodes, as already-allocated containers on a node are not considered while serving requests from different applications. This can lead to uneven allocation of Opportunistic containers across the cluster, leading to increased queuing time.
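As a rough illustration of the staleness problem described above (a sketch under assumed names, not the actual YARN-9697 patch): the allocator can overlay its own in-flight allocation counts on top of the heartbeat-reported queue lengths, so that successive requests within the same heartbeat interval see up-to-date load.
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: adjust heartbeat-reported queue lengths with in-flight allocations. */
public class OpportunisticNodeSelector {
  // Queued-container counts as last reported by node heartbeats.
  private final Map<String, Integer> reportedQueueLength = new ConcurrentHashMap<>();
  // Containers we have allocated since those heartbeats arrived.
  private final Map<String, Integer> inFlight = new ConcurrentHashMap<>();

  public void onHeartbeat(String node, int queuedContainers) {
    reportedQueueLength.put(node, queuedContainers);
    inFlight.put(node, 0); // the heartbeat already reflects earlier allocations
  }

  /** Pick the node with the smallest effective load, then record the allocation. */
  public String allocate(List<String> candidates) {
    String best = candidates.stream()
        .min(Comparator.comparingInt((String n) ->
            reportedQueueLength.getOrDefault(n, 0) + inFlight.getOrDefault(n, 0)))
        .orElseThrow(() -> new IllegalStateException("no candidate nodes"));
    inFlight.merge(best, 1, Integer::sum);
    return best;
  }
}
{code}
Because the in-flight count is shared across applications, two AMs asking in the same interval no longer pile onto the same "emptiest" nodes.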
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.008.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
> In the current implementation, opportunistic containers are allocated based on the queued opportunistic container counts received in node heartbeats. This information becomes stale as soon as more opportunistic containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which the AM asks for the containers. When multiple applications request Opportunistic containers, containers might get allocated on the same set of nodes, as already-allocated containers on a node are not considered while serving requests from different applications. This can lead to uneven allocation of Opportunistic containers across the cluster, leading to increased queuing time.
[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call
[ https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957248#comment-16957248 ]
Hadoop QA commented on YARN-9780:
-
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 29s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 41s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9780 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983754/YARN-9780-004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ab24eabef998 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6020505 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25028/testReport/ |
| Max. process+thread count | 833 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25028/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.

> SchedulerConf Mutation Api does not Al
[jira] [Commented] (YARN-9918) AggregatedAllocatedContainers metrics not getting reported for MR in 2.6.x
[ https://issues.apache.org/jira/browse/YARN-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957232#comment-16957232 ]
Manikandan R commented on YARN-9918:
-
Can you add more details to reproduce this issue? FYI, this metric computation happens only for the "default" partition. Please refer to YARN-6467 for more details.

> AggregatedAllocatedContainers metrics not getting reported for MR in 2.6.x
> --
>
> Key: YARN-9918
> URL: https://issues.apache.org/jira/browse/YARN-9918
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Prashant Golash
> Assignee: Prashant Golash
> Priority: Minor
>
> One of our YARN clusters is on CDH 2.6.x. I have observed that aggregated allocated container metrics are not getting reported for MR jobs. Some queues carry a specifically MR workload, but those queues always show 0 for "aggregatedAllocatedContainers".
>
> Created this JIRA to track this.
[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957231#comment-16957231 ]
Manikandan R commented on YARN-9930:
-
Is this different from YARN-9887?

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.0, 3.1.1
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> FairScheduler has a max-running-apps limit, which leaves applications beyond the limit pending.
> CapacityScheduler has no such feature; it only has a max-applications limit, and jobs beyond it are rejected directly on the client side.
> In this JIRA I want to implement the same semantics for CapacityScheduler.
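For concreteness, a minimal sketch of the max-running-apps semantics described above: admit applications up to the limit and keep the rest pending instead of rejecting them at the client. The class and method names here are illustrative assumptions; FairScheduler's real logic lives in its MaxRunningAppsEnforcer.
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical per-queue gate: run up to maxRunningApps, keep the rest pending. */
public class MaxRunningAppsGate {
  private final int maxRunningApps;
  private int runningApps = 0;
  private final Queue<String> pendingApps = new ArrayDeque<>();

  public MaxRunningAppsGate(int maxRunningApps) {
    this.maxRunningApps = maxRunningApps;
  }

  /** Returns true if the app may start now; otherwise it is queued, not rejected. */
  public synchronized boolean submit(String appId) {
    if (runningApps < maxRunningApps) {
      runningApps++;
      return true;
    }
    pendingApps.add(appId); // pending, unlike CapacityScheduler's client-side rejection
    return false;
  }

  /** When an app finishes, promote the oldest pending app, if any. */
  public synchronized String appFinished() {
    runningApps--;
    String next = pendingApps.poll();
    if (next != null) {
      runningApps++;
    }
    return next; // caller activates this app
  }
}
{code}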
[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957230#comment-16957230 ]
Manikandan R commented on YARN-9925:
-
YARN-9772 has been created to address the concerns raised here. We had some discussions over there about two different approaches, but have not yet reached any conclusion. cc [~sunilg]

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When creating a queue with the same name as an existing parent queue, it has to fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
> ... 70 more
> {code}
> In some cases, the error is not thrown while creating the queue but at job submission: "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B"
> The scenarios below are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists:
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B
>
> It allows two root queues:
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
> {code}
> The scenario below is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists:
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent across all scenarios.
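For illustration, here is a minimal sketch of the kind of uniqueness check validateQueueHierarchy needs; the class and map names are assumptions, not the actual patch. Queue short names must be unique across the tree, so a new queue whose short name matches an existing queue at a different path should be rejected.
{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical short-name uniqueness check over full queue paths like "root.A.A1". */
public class QueueHierarchyValidator {
  // Maps a queue's short name (last path segment) to its full path.
  private final Map<String, String> shortNameToPath = new HashMap<>();

  public void addQueue(String fullPath) {
    String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
    String existing = shortNameToPath.get(shortName);
    if (existing != null && !existing.equals(fullPath)) {
      // Catches root.A.A1.B vs root.B, and root.A.A1.root vs root.
      throw new IllegalArgumentException(shortName + " is moved from:" + existing
          + " to:" + fullPath + " after refresh, which is not allowed.");
    }
    shortNameToPath.put(shortName, fullPath);
  }
}
{code}
The exception message mirrors the one quoted in the description; the actual patch may enforce this differently.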
[jira] [Resolved] (YARN-9926) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bibin Chundatt resolved YARN-9926.
--
Resolution: Duplicate

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9926
> URL: https://issues.apache.org/jira/browse/YARN-9926
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: hcarrot
> Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
> Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that the proportion of RMNodeStatusEvent is smaller than that of other events, but its overall processing time is larger. Meanwhile, RM event processing is in single-thread mode, which decreases RM performance. So we propose an RM multi-thread event processing mechanism to improve RM performance. Is this mechanism feasible?
[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates
[ https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph updated YARN-9788:
Attachment: YARN-9788-009.patch

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9788-001.patch, YARN-9788-002.patch, YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, YARN-9788-006.patch, YARN-9788-007.patch, YARN-9788-008.patch, YARN-9788-009.patch
>
> The Queue Management API does not support parallel updates. When there are two parallel scheduler conf updates (logAndApplyMutation), the first update is overwritten by the second one.
> Currently, logAndApplyMutation creates a LogMutation and stores it in the variable pendingMutation. This way, at any given time there is only one LogMutation, so two parallel logAndApplyMutation calls overwrite pendingMutation and only the later one survives.
> The fix is for logAndApplyMutation to return the LogMutation object, which can then be passed to confirmMutation. This fixes parallel updates.
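A minimal sketch of the fix described above (the method shapes are assumptions based on the description, not the exact patch): instead of stashing the pending mutation in a shared field, logAndApplyMutation returns it and confirmMutation takes it back, so two concurrent updates can no longer clobber each other.
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the YARN-9788 fix: pass the LogMutation through explicitly. */
public class MutableConfStore {
  public static class LogMutation {
    final List<String> updates;
    LogMutation(List<String> updates) { this.updates = updates; }
  }

  private final List<LogMutation> confirmedLog = new ArrayList<>();

  // Before: stored into a single shared `pendingMutation` field (race-prone).
  // After: the mutation is returned to the caller and threaded through.
  public synchronized LogMutation logAndApplyMutation(List<String> updates) {
    LogMutation mutation = new LogMutation(new ArrayList<>(updates));
    // ... write-ahead log the mutation here ...
    return mutation;
  }

  public synchronized void confirmMutation(LogMutation mutation, boolean valid) {
    if (valid) {
      confirmedLog.add(mutation); // confirm exactly the mutation we logged
    }
    // else: roll back this specific mutation, not whatever was stored last
  }
}
{code}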
[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates
[ https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957179#comment-16957179 ]
Hadoop QA commented on YARN-9788:
-
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 2m 21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 26s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m 8s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 26s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 49s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9788 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983742/YARN-9788-008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9925: Attachment: YARN-9925-002.patch
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. Creating a queue with the same name as an existing parent queue should fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>     ... 70 more
> {code}
> In some cases the error is not thrown while creating the queue but at job submission: "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B"
> The scenarios below are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B
>
> It allows two root queues:
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
> {code}
> The scenario below is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling should be consistent across all scenarios.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
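For context, a minimal sketch of the kind of consistency check the description asks for, assuming a map from short queue name to full queue path is available; the names here are illustrative, not the actual patch:
{code:java}
import java.io.IOException;
import java.util.Map;

public class QueueHierarchyCheck {
  // Reject a new queue whose short name already exists elsewhere in the
  // hierarchy, mirroring the "A is moved from:root.A to:root.B.A" error above.
  static void validateNewQueue(String newQueuePath,
      Map<String, String> shortNameToPath) throws IOException {
    String shortName = newQueuePath.substring(newQueuePath.lastIndexOf('.') + 1);
    String existingPath = shortNameToPath.get(shortName);
    if (existingPath != null && !existingPath.equals(newQueuePath)) {
      throw new IOException(shortName + " is moved from:" + existingPath
          + " to:" + newQueuePath + " after refresh, which is not allowed.");
    }
  }
}
{code}
Note that this also rejects the "root.A.A1.root" case above, since the short name "root" already maps to the path "root".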
[jira] [Updated] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call
[ https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9780: Attachment: YARN-9780-004.patch
> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call
> Key: YARN-9780
> URL: https://issues.apache.org/jira/browse/YARN-9780
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9780-001.patch, YARN-9780-002.patch, YARN-9780-003.patch, YARN-9780-004.patch
>
> The SchedulerConf mutation API does not allow stopping and removing a queue in a single call. A queue has to be stopped before it can be removed, so it is useful to allow both the stop and the removal in one call.
> *Repro:*
> {code:java}
> Capacity-Scheduler.xml:
> yarn.scheduler.capacity.root.queues = new, default, dummy
> yarn.scheduler.capacity.root.default.capacity = 60
> yarn.scheduler.capacity.root.dummy.capacity = 30
> yarn.scheduler.capacity.root.new.capacity = 10
>
> curl -v -X PUT -d @abc.xml -H "Content-type: application/xml" 'http://:8088/ws/v1/cluster/scheduler-conf'
>
> abc.xml
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>70</value>
>       </entry>
>     </params>
>   </update-queue>
>   <update-queue>
>     <queue-name>root.new</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.new</remove-queue>
> </sched-conf>
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957168#comment-16957168 ] Hadoop QA commented on YARN-9537: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9537 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983734/YARN-9537-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2ab4e3cc3bfe 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 72003b1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25024/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25024/testReport/ | | Max. process+thread count | 817 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourc
[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957161#comment-16957161 ] Hadoop QA commented on YARN-9925: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn-server-resourcemanager {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m 46s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 17s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 44m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9925 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983741/YARN-9925-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5063448ca290 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6020505 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/25026/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | compile | https://build
[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates
[ https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9788: Attachment: YARN-9788-008.patch
> Queue Management API - does not support parallel updates
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9788-001.patch, YARN-9788-002.patch, YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, YARN-9788-006.patch, YARN-9788-007.patch, YARN-9788-008.patch
>
> The Queue Management API does not support parallel updates. When there are two parallel scheduler conf updates (logAndApplyMutation), the first update is overwritten by the second.
> Currently logAndApplyMutation creates a LogMutation and stores it in a single pendingMutation variable, so at any given time there is only one LogMutation; two parallel logAndApplyMutation calls overwrite pendingMutation and only the later one survives.
> The fix is for logAndApplyMutation to return the LogMutation object, which can then be passed to confirmMutation. This fixes the parallel updates.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
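A self-contained model of the fix described above, using illustrative names rather than the actual YARN classes: each caller gets back its own mutation object instead of sharing one pendingMutation field.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

class ConfStoreModel {
  static final class LogMutation {
    final long id;
    final String update;
    LogMutation(long id, String update) { this.id = id; this.update = update; }
  }

  private final AtomicLong nextId = new AtomicLong();

  // Returns the mutation it created; with no shared pendingMutation field,
  // two parallel updates can no longer overwrite each other.
  LogMutation logAndApplyMutation(String update) {
    LogMutation m = new LogMutation(nextId.incrementAndGet(), update);
    // write-ahead logging and applying to the in-memory config would go here
    return m;
  }

  // The caller confirms (or rolls back) exactly the mutation it was given.
  void confirmMutation(LogMutation m, boolean valid) {
    System.out.println("mutation " + m.id + (valid ? " confirmed" : " rolled back"));
  }
}
{code}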
[jira] [Commented] (YARN-9929) NodeManager OOM because of stuck DeletionService
[ https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957125#comment-16957125 ] Hadoop QA commented on YARN-9929: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m 21s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 14s{color} | {color:red} hadoop-yarn-server-nodemanager in trunk failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 43s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9929 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983736/YARN-9929.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a3de11274f8a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 19f35cf | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/25025/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25025/testReport/ | | Max. process+thread count | 400 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-
[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9925: Attachment: YARN-9925-001.patch
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9925-001.patch
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. Creating a queue with the same name as an existing parent queue should fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>     ... 70 more
> {code}
> In some cases the error is not thrown while creating the queue but at job submission: "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B"
> The scenarios below are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B
>
> It allows two root queues:
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
> {code}
> The scenario below is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling should be consistent across all scenarios.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957114#comment-16957114 ] Hadoop QA commented on YARN-9886: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 6 new + 313 unchanged - 0 fixed = 319 total (was 313) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 42s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 12s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}187m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9886 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983723/YARN-9886.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux b37cbda06227 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
[jira] [Commented] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
[ https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957101#comment-16957101 ] Tarun Parimi commented on YARN-9928: The issue occurs because the container returned in the code snippet below becomes null.
{code:java}
private void publishContainerCreatedEvent(ContainerEvent event) {
  if (publishNMContainerEvents) {
    ContainerId containerId = event.getContainerID();
    ContainerEntity entity = createContainerEntity(containerId);
    Container container = context.getContainers().get(containerId);
    Resource resource = container.getResource();
{code}
This issue does not usually occur because ContainerManagerImpl already performs the same null check beforehand.
{code:java}
Map containers = ContainerManagerImpl.this.context.getContainers();
Container c = containers.get(event.getContainerID());
if (c != null) {
  c.handle(event);
  if (nmMetricsPublisher != null) {
    nmMetricsPublisher.publishContainerEvent(event);
  }
{code}
But in a heavily loaded production cluster, with lots of events in the ContainerManager dispatcher and the NM resyncing with the RM at the same time in a separate NM dispatcher thread, all the completed containers can suddenly be removed. So an additional null check is needed for the container in these scenarios.
> ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
> Key: YARN-9928
> URL: https://issues.apache.org/jira/browse/YARN-9928
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
>
> Encountered the below FATAL error in the NodeManager, which was under heavy load and was also resyncing with the RM at the same time. This caused the NM to go down.
> {code:java}
> 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216)
>     at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
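A sketch of the additional null check being proposed, mirroring the snippet quoted above (illustrative only, not necessarily the committed patch):
{code:java}
private void publishContainerCreatedEvent(ContainerEvent event) {
  if (publishNMContainerEvents) {
    ContainerId containerId = event.getContainerID();
    Container container = context.getContainers().get(containerId);
    if (container == null) {
      // Container already removed from the NM context, e.g. completed
      // containers dropped during an RM resync; skip publishing instead
      // of hitting the NullPointerException above.
      return;
    }
    Resource resource = container.getResource();
    ContainerEntity entity = createContainerEntity(containerId);
    // ... populate and publish the entity as before ...
  }
}
{code}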
[jira] [Updated] (YARN-9931) Support run script before kill container
[ https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9931: --- Description: Like the node health check script, we can add a pre-kill script which runs before a container is killed. For example, we can save a thread dump before the container is killed, which is helpful for troubleshooting. was: Like node health check script. We can add a pre-kill script which run before kill container. Such as we can save the thread dump before kill the container, which is helpful for troubleshooting.
> Support run script before kill container
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> Like the node health check script, we can add a pre-kill script which runs before a container is killed.
> For example, we can save a thread dump before the container is killed, which is helpful for troubleshooting.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9931) Support run script before kill container
[ https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957095#comment-16957095 ] zhoukang commented on YARN-9931: [~weiweiyagn666] [~tangzhankun]
> Support run script before kill container
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> Like the node health check script, we can add a pre-kill script which runs before a container is killed.
> For example, we can save a thread dump before the container is killed, which is helpful for troubleshooting.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9931) Support run script before kill container
[ https://issues.apache.org/jira/browse/YARN-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9931: --- Component/s: nodemanager
> Support run script before kill container
> Key: YARN-9931
> URL: https://issues.apache.org/jira/browse/YARN-9931
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> Like the node health check script, we can add a pre-kill script which runs before a container is killed.
> For example, we can save a thread dump before the container is killed, which is helpful for troubleshooting.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9925: Description: CapacitySchedulerQueueManager allows an unsupported queue hierarchy. Creating a queue with the same name as an existing parent queue should fail with the error below.
{code:java}
Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
    ... 70 more
{code}
In some cases the error is not thrown while creating the queue but at job submission: "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B"
The scenarios below are allowed but should not be:
{code:java}
It allows root.A.A1.B when root.B.B1 already exists.
1. Add root.A
2. Add root.A.A1
3. Add root.B
4. Add root.B.B1
5. Allows Add of root.A.A1.B

It allows two root queues:
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Allows Add of root.A.A1.root
{code}
The scenario below is handled properly:
{code:java}
It does not allow root.B.A when root.A.A1 already exists.
1. Add root.A
2. Add root.B
3. Add root.A.A1
4. Does not Allow Add of root.B.A
{code}
This error handling should be consistent across all scenarios.
was: CapacitySchedulerQueueManager allows unsupported Queue hierarchy. When creating a queue with same name as an existing parent queue name - it has to fail with below. {code:java} Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473) ... 70 more {code} In Some cases, the error is not thrown while creating the queue but thrown at submission of job "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B" Below scenarios are allowed but it should not {code:java} It allows root.A.A1.B when root.B.B1 exists already. 1. Add root.A 2. Add root.A.A1 3. Add root.B 4. Add root.B.B1 5. Allows Add of root.A.A1.B It allows two root queues: 1. Add root.A 2. Add root.B 3. Add root.A.A1 4. Allows Add of root.A.A1.root {code} Below scenario is handled properly: {code:java} It does not allow root.B.A when root.A.A1 exists already. 1. Add root.A 2. Add root.B 3. Add root.A.A1 4. Does not Allow Add of root.B.A {code} This error handling has to be consistent in all scenarios.
> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. Creating a queue with the same name as an existing parent queue should fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>     at org.apache.hadoop.yarn.server.resourcemanager.sched
[jira] [Created] (YARN-9931) Support run script before kill container
zhoukang created YARN-9931: -- Summary: Support run script before kill container Key: YARN-9931 URL: https://issues.apache.org/jira/browse/YARN-9931 Project: Hadoop YARN Issue Type: Improvement Reporter: zhoukang Assignee: zhoukang
Like the node health check script, we can add a pre-kill script which runs before a container is killed. For example, we can save a thread dump before the container is killed, which is helpful for troubleshooting.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957077#comment-16957077 ] David Mollitor commented on YARN-9863: -- [~szegedim] Any chance you've been able to review my remarks? Thanks!
> Randomize List of Resources to Localize
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of resources to be shuffled randomly. This will allow the Localizer to spread the load of requests so that not all of the NodeManagers request to localize the same files, in the same order, from the same DataNodes.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
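A minimal illustration of the idea, with a hypothetical switch name rather than the actual LocalResourceBuilder parameter:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShuffleResourcesDemo {
  public static void main(String[] args) {
    List<String> resources = new ArrayList<>(
        Arrays.asList("job.jar", "job.xml", "lib/a.jar", "lib/b.jar"));
    boolean randomizeLocalization = true; // hypothetical flag
    if (randomizeLocalization) {
      // Each NM would shuffle independently, so they no longer hit the same
      // DataNodes for the same files in the same order.
      Collections.shuffle(resources);
    }
    resources.forEach(System.out::println);
  }
}
{code}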
[jira] [Commented] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation
[ https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957074#comment-16957074 ] Prabhu Joseph commented on YARN-9789: - Thanks [~pbacsko] for the review. [~snemeth] Can you review and commit this Jira when you get time? Thanks.
> Disable Option for Write Ahead Logs of LogMutation
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9789-001.patch
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the YARNConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from the backend, which is not needed.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
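The guard the description implies could look roughly like this; an illustrative sketch, not the patch itself, and the default value shown is a placeholder:
{code:java}
// When max-logs is 0, write-ahead logging is effectively disabled, so skip
// reading the logs back from the backing store (ZK/LevelDB) entirely.
private void initLogs(org.apache.hadoop.conf.Configuration conf) {
  long maxLogs = conf.getLong(
      "yarn.scheduler.configuration.store.max-logs", 1000); // default illustrative
  if (maxLogs > 0) {
    // read and replay the write-ahead logs from the backend, as today
  }
  // else: nothing to read back
}
{code}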
[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957073#comment-16957073 ] Prabhu Joseph commented on YARN-9781: - Thanks [~pbacsko] for the review. [~snemeth] Can you review and commit this Jira when you get time? Thanks.
> SchedConfCli to get current stored scheduler configuration
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch
>
> SchedConfCLI currently allows add / update / remove queue operations. It does not support getting the current configuration, which RMWebServices provides as part of YARN-8559.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
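For reference, the underlying RMWebServices call added by YARN-8559 that the CLI would wrap; the new CLI option itself is part of this patch, so only the HTTP call is shown:
{code}
# Fetch the current stored scheduler configuration from the ResourceManager
# (replace <rm-host> with the RM address):
curl 'http://<rm-host>:8088/ws/v1/cluster/scheduler-conf'
{code}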
[jira] [Comment Edited] (YARN-7621) Support submitting apps with queue path for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957071#comment-16957071 ] zhoukang edited comment on YARN-7621 at 10/22/19 1:40 PM: -- I think we can solve this problem with the full path [~wilfreds] was (Author: cane): I think with full path we can solve this problem [~wilfreds]
> Support submitting apps with queue path for CapacityScheduler
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
> Currently there is a difference in the queue definition in ApplicationSubmissionContext between CapacityScheduler and FairScheduler: FairScheduler needs the queue path but CapacityScheduler needs the queue name. The queue definition for CapacityScheduler is certainly correct, since it does not allow duplicate leaf queue names, but this makes it hard to switch between FairScheduler and CapacityScheduler. I propose to support submitting apps with a queue path for CapacityScheduler, to make the interface clearer and the scheduler switch smoother.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957071#comment-16957071 ] zhoukang commented on YARN-7621: I think with full path we can solve this problem [~wilfreds]
> Support submitting apps with queue path for CapacityScheduler
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
> Currently there is a difference in the queue definition in ApplicationSubmissionContext between CapacityScheduler and FairScheduler: FairScheduler needs the queue path but CapacityScheduler needs the queue name. The queue definition for CapacityScheduler is certainly correct, since it does not allow duplicate leaf queue names, but this makes it hard to switch between FairScheduler and CapacityScheduler. I propose to support submitting apps with a queue path for CapacityScheduler, to make the interface clearer and the scheduler switch smoother.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957069#comment-16957069 ] zhoukang commented on YARN-7621: [~jiwq] Agree with you, sorry for the late reply. And any progress on this jira? [~cheersyang] [~Tao Yang] Thanks!
> Support submitting apps with queue path for CapacityScheduler
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
> Currently there is a difference in the queue definition in ApplicationSubmissionContext between CapacityScheduler and FairScheduler: FairScheduler needs the queue path but CapacityScheduler needs the queue name. The queue definition for CapacityScheduler is certainly correct, since it does not allow duplicate leaf queue names, but this makes it hard to switch between FairScheduler and CapacityScheduler. I propose to support submitting apps with a queue path for CapacityScheduler, to make the interface clearer and the scheduler switch smoother.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
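A small illustration of what the proposal would let clients do; setQueue is existing API, while the path-based interpretation for CapacityScheduler is what this jira adds:
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class QueuePathSubmitDemo {
  public static void main(String[] args) {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // A full path is unambiguous even if another queue is also named "a1".
    // Today CapacityScheduler expects just "a1" here, FairScheduler "root.a.a1".
    ctx.setQueue("root.a.a1");
    System.out.println(ctx.getQueue());
  }
}
{code}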
[jira] [Updated] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9930: --- Parent: YARN-9698 Issue Type: Sub-task (was: Improvement)
> Support max running app logic for CapacityScheduler
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.0, 3.1.1
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> In FairScheduler there is a max running apps limit, which leaves excess applications pending. CapacityScheduler has no such feature; it only has a max applications limit, and jobs beyond it are rejected directly at the client.
> In this jira I want to implement the same semantics for CapacityScheduler.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9930) Support max running app logic for CapacityScheduler
zhoukang created YARN-9930: -- Summary: Support max running app logic for CapacityScheduler Key: YARN-9930 URL: https://issues.apache.org/jira/browse/YARN-9930 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler, capacityscheduler Affects Versions: 3.1.1, 3.1.0 Reporter: zhoukang Assignee: zhoukang
In FairScheduler there is a max running apps limit, which leaves excess applications pending. CapacityScheduler has no such feature; it only has a max applications limit, and jobs beyond it are rejected directly at the client. In this jira I want to implement the same semantics for CapacityScheduler.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
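A self-contained model of the difference in semantics (illustrative only, not CapacityScheduler code): the existing max-applications limit rejects at submission, while the proposed max-running-apps limit would queue the application as pending instead.
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

class MaxRunningAppsModel {
  static final int MAX_APPS = 100;        // existing CS limit: reject beyond this
  static final int MAX_RUNNING_APPS = 10; // proposed limit: keep pending beyond this
  int accepted = 0;
  int running = 0;
  final Queue<String> pending = new ArrayDeque<>();

  void submit(String app) {
    if (accepted >= MAX_APPS) {
      throw new IllegalStateException("rejected at client: max applications reached");
    }
    accepted++;
    if (running < MAX_RUNNING_APPS) {
      running++;          // runs immediately
    } else {
      pending.add(app);   // FairScheduler-style: stays pending until a slot frees
    }
  }
}
{code}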
[jira] [Commented] (YARN-9748) Allow capacity-scheduler configuration on HDFS and support reload from HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957041#comment-16957041 ] zhoukang commented on YARN-9748: I want to add a service like {{AllocationFileLoaderService}} in FairScheduler. [~Prabhu Joseph] [~cheersyang] [~tangzhankun]
> Allow capacity-scheduler configuration on HDFS and support reload from HDFS
> Key: YARN-9748
> URL: https://issues.apache.org/jira/browse/YARN-9748
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.2
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> Improvement:
> Support auto reload from HDFS
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
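A hedged sketch of such a loader service, modelled loosely on FairScheduler's AllocationFileLoaderService; the path, polling logic, and reload hook are assumptions, not the actual patch:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsCapacityConfPoller implements Runnable {
  private final Path confPath =
      new Path("hdfs:///yarn/conf/capacity-scheduler.xml"); // illustrative location
  private long lastModified = -1;

  @Override
  public void run() {
    try {
      FileSystem fs = confPath.getFileSystem(new Configuration());
      long mtime = fs.getFileStatus(confPath).getModificationTime();
      if (mtime != lastModified) {
        lastModified = mtime;
        reload(); // would hook into CapacityScheduler#reinitialize
      }
    } catch (IOException e) {
      // log and retry on the next scheduled poll
    }
  }

  private void reload() { /* re-read the file and refresh the queues */ }
}
{code}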
[jira] [Commented] (YARN-9929) NodeManager OOM because of stuck DeletionService
[ https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957039#comment-16957039 ] kyungwan nam commented on YARN-9929: Attached a patch, which sets a timeout for _ShellCommandExecutor_. Any comments and suggestions are welcome.
> NodeManager OOM because of stuck DeletionService
> Key: YARN-9929
> URL: https://issues.apache.org/jira/browse/YARN-9929
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: kyungwan nam
> Assignee: kyungwan nam
> Priority: Major
> Attachments: YARN-9929.001.patch, nm_heapdump.png
>
> NMs go through frequent Full GC due to a lack of heap memory.
> We can find a lot of FileDeletionTask and DockerContainerDeletionTask objects in the heap dump (screenshot attached),
> and after analyzing the thread dump we can see that _DeletionService_ gets stuck in _executeStatusCommand_, which runs 'docker inspect'.
> {code:java}
> "DeletionService #0" - Thread t@41
>    java.lang.Thread.State: RUNNABLE
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:255)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>     - locked <3e45c938> (a java.io.InputStreamReader)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:161)
>     at java.io.BufferedReader.read1(BufferedReader.java:212)
>     at java.io.BufferedReader.read(BufferedReader.java:286)
>     - locked <3e45c938> (a java.io.InputStreamReader)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
>     at org.apache.hadoop.util.Shell.run(Shell.java:902)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>     - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> Also, we found 'docker inspect' processes running for a long time, as follows.
> {code:java}
> root 95637 0.0 0.0 2650984 35776 ? Sl Aug23 5:48 /usr/bin/docker inspect --format={{.State.Status}} container_e30_1555419799458_0014_01_30
> root 95638 0.0 0.0 2773860 33908 ? Sl Aug23 5:33 /usr/bin/docker inspect --format={{.State.Status}} container_e50_1561100493387_25316_01_001455
> root 95641 0.0 0.0 2445924 34204 ? Sl Aug23 5:34 /usr/bin/docker inspect --format={{.State.Status}} container_e49_1560851258686_2107_01_24
> root 95643 0.0 0.0 2642532 34428 ? Sl Aug23 5:30 /usr/bin/docker inspect --format={{.State.Status}} container_e50_1561100493387_8111_01_
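For reference, Shell.ShellCommandExecutor already has a constructor that takes a timeout in milliseconds, which is the mechanism the comment above refers to; the container id and timeout below are placeholders, and the exact call site in the patch may differ:
{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class DockerInspectWithTimeout {
  public static void main(String[] args) throws IOException {
    ShellCommandExecutor exec = new ShellCommandExecutor(
        new String[] {"docker", "inspect", "--format={{.State.Status}}",
            "container_x"},            // placeholder container id
        new File("/"), null, 10_000L); // destroy the child if it runs > 10s
    exec.execute();                    // fails instead of blocking forever
    System.out.println(exec.getOutput());
  }
}
{code}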
[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957037#comment-16957037 ] zhoukang commented on YARN-9927: Nice idea, we also want to do a similar job. Looking forward to the PoC [~hcarrot]
> RM multi-thread event processing mechanism
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.0.0, 2.9.2
> Reporter: hcarrot
> Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
> Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% of the time consumed by the RM event scheduler
> 3) meanwhile, RM event processing is single-threaded, which results in low headroom for the RM event scheduler and thus poor RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM performance.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
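The general shape of such a mechanism, as a self-contained sketch (the actual design is in the attached PDF): shard events across worker threads by node id so per-node ordering is preserved while different nodes proceed in parallel.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ShardedDispatcherSketch {
  private final ExecutorService[] workers;

  ShardedDispatcherSketch(int nThreads) {
    workers = new ExecutorService[nThreads];
    for (int i = 0; i < nThreads; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  // Events for the same node always hash to the same single-threaded worker,
  // so per-node ordering holds; one slow handler no longer serializes all nodes.
  void dispatch(String nodeId, Runnable handler) {
    int shard = (nodeId.hashCode() & Integer.MAX_VALUE) % workers.length;
    workers[shard].execute(handler);
  }
}
{code}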
[jira] [Updated] (YARN-9929) NodeManager OOM because of stuck DeletionService
[ https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9929: --- Attachment: YARN-9929.001.patch > NodeManager OOM because of stuck DeletionService > > > Key: YARN-9929 > URL: https://issues.apache.org/jira/browse/YARN-9929 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-9929.001.patch, nm_heapdump.png > > > NMs go through frequent Full GC due to a lack of heap memory. > we can find a lot of FileDeletionTask, DockerContainerDeletionTask from the > heap dump (screenshot is attached) > and after analyzing the thread dump, we can figure out _DeletionService_ gets > stuck in _executeStatusCommand_ which run 'docker inspect' > {code:java} > "DeletionService #0" - Thread t@41 >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <3e45c938> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > - locked <3e45c938> (a java.io.InputStreamReader) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:995) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at 
java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) > {code} > also, we found 'docker inspect' processes are running for a long time as > follows. > {code:java} > root 95637 0.0 0.0 2650984 35776 ? Sl Aug23 5:48 > /usr/bin/docker inspect --format={{.State.Status}} > container_e30_1555419799458_0014_01_30 > root 95638 0.0 0.0 2773860 33908 ? Sl Aug23 5:33 > /usr/bin/docker inspect --format={{.State.Status}} > container_e50_1561100493387_25316_01_001455 > root 95641 0.0 0.0 2445924 34204 ? Sl Aug23 5:34 > /usr/bin/docker inspect --format={{.State.Status}} > container_e49_1560851258686_2107_01_24 > root 95643 0.0 0.0 2642532 34428 ? Sl Aug23 5:30 > /usr/bin/docker inspect --format={{.State.Status}} > container_e50_1561100493387_8111_01_002657{code} > > I think It has occurred since docker daemon is restarted. > 'docker inspect' which was run while restarting th
[jira] [Updated] (YARN-9929) NodeManager OOM because of stuck DeletionService
[ https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9929: --- Attachment: nm_heapdump.png > NodeManager OOM because of stuck DeletionService > > > Key: YARN-9929 > URL: https://issues.apache.org/jira/browse/YARN-9929 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: nm_heapdump.png > > > NMs go through frequent Full GC due to a lack of heap memory. > we can find a lot of FileDeletionTask, DockerContainerDeletionTask from the > heap dump (screenshot is attached) > and after analyzing the thread dump, we can figure out _DeletionService_ gets > stuck in _executeStatusCommand_ which run 'docker inspect' > {code:java} > "DeletionService #0" - Thread t@41 >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <3e45c938> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > - locked <3e45c938> (a java.io.InputStreamReader) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:995) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) 
>Locked ownable synchronizers: > - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) > {code} > also, we found 'docker inspect' processes are running for a long time as > follows. > {code:java} > root 95637 0.0 0.0 2650984 35776 ? Sl Aug23 5:48 > /usr/bin/docker inspect --format={{.State.Status}} > container_e30_1555419799458_0014_01_30 > root 95638 0.0 0.0 2773860 33908 ? Sl Aug23 5:33 > /usr/bin/docker inspect --format={{.State.Status}} > container_e50_1561100493387_25316_01_001455 > root 95641 0.0 0.0 2445924 34204 ? Sl Aug23 5:34 > /usr/bin/docker inspect --format={{.State.Status}} > container_e49_1560851258686_2107_01_24 > root 95643 0.0 0.0 2642532 34428 ? Sl Aug23 5:30 > /usr/bin/docker inspect --format={{.State.Status}} > container_e50_1561100493387_8111_01_002657{code} > > I think It has occurred since docker daemon is restarted. > 'docker inspect' which was run while restarting the docker daemon was not
[jira] [Created] (YARN-9929) NodeManager OOM because of stuck DeletionService
kyungwan nam created YARN-9929: -- Summary: NodeManager OOM because of stuck DeletionService Key: YARN-9929 URL: https://issues.apache.org/jira/browse/YARN-9929 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.2 Reporter: kyungwan nam Assignee: kyungwan nam NMs go through frequent Full GC due to a lack of heap memory. we can find a lot of FileDeletionTask, DockerContainerDeletionTask from the heap dump (screenshot is attached) and after analyzing the thread dump, we can figure out _DeletionService_ gets stuck in _executeStatusCommand_ which run 'docker inspect' {code:java} "DeletionService #0" - Thread t@41 java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:255) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) - locked <3e45c938> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.read1(BufferedReader.java:212) at java.io.BufferedReader.read(BufferedReader.java:286) - locked <3e45c938> (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240) at org.apache.hadoop.util.Shell.runCommand(Shell.java:995) at org.apache.hadoop.util.Shell.run(Shell.java:902) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937) at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker) {code} also, we found 'docker inspect' processes are running for a long time as follows. {code:java} root 95637 0.0 0.0 2650984 35776 ? 
Sl Aug23 5:48 /usr/bin/docker inspect --format={{.State.Status}} container_e30_1555419799458_0014_01_30 root 95638 0.0 0.0 2773860 33908 ? Sl Aug23 5:33 /usr/bin/docker inspect --format={{.State.Status}} container_e50_1561100493387_25316_01_001455 root 95641 0.0 0.0 2445924 34204 ? Sl Aug23 5:34 /usr/bin/docker inspect --format={{.State.Status}} container_e49_1560851258686_2107_01_24 root 95643 0.0 0.0 2642532 34428 ? Sl Aug23 5:30 /usr/bin/docker inspect --format={{.State.Status}} container_e50_1561100493387_8111_01_002657{code} I think this has occurred since the docker daemon was restarted. A 'docker inspect' that was run while the docker daemon was restarting stopped working, and it was not even terminated. It can be considered a docker issue, but it could happen whenever 'docker inspect' does not work due to a docker daemon restart or a docker bug. It would be good to set a timeout for 'docker inspect' to avoid this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) -
[jira] [Resolved] (YARN-9851) Make execution type check compatible
[ https://issues.apache.org/jira/browse/YARN-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang resolved YARN-9851. Resolution: Duplicate > Make execution type check compatiable > - > > Key: YARN-9851 > URL: https://issues.apache.org/jira/browse/YARN-9851 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9851-001.patch > > > During upgrade from 2.6 to 3.1, we encountered a problem: > {code:java} > 2019-09-23,19:29:05,303 WARN > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost > container container_e35_1568719110875_6460_08_01, status: RUNNING, > execution type: null > 2019-09-23,19:29:05,303 WARN > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost > container container_e35_1568886618758_11172_01_62, status: RUNNING, > execution type: null > 2019-09-23,19:29:05,303 WARN > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost > container container_e35_1568886618758_11172_01_63, status: RUNNING, > execution type: null > 2019-09-23,19:29:05,303 WARN > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost > container container_e35_1568886618758_11172_01_64, status: RUNNING, > execution type: null > 2019-09-23,19:29:05,303 WARN > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost > container container_e35_1568886618758_30617_01_06, status: RUNNING, > execution type: null > for (ContainerStatus remoteContainer : containerStatuses) { > if (remoteContainer.getState() == ContainerState.RUNNING > && remoteContainer.getExecutionType() == ExecutionType.GUARANTEED) { > nodeContainers.add(remoteContainer.getContainerId()); > } else { > LOG.warn("Lost container " + remoteContainer.getContainerId() > + ", status: " + remoteContainer.getState() > + ", execution type: " + remoteContainer.getExecutionType()); > } > } > {code} > The cause is that we has nm with version 2.6, which do not have executionType > for container status. > We should check here make the upgrade process more tranparently -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
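For readers following the quoted snippet: a compatibility tweak in this direction would treat a null execution type (reported by a pre-2.9 NM that predates ExecutionType) as GUARANTEED rather than logging the container as lost. The sketch below mirrors the quoted loop and is only one possible fix, not the attached patch.
{code:java}
for (ContainerStatus remoteContainer : containerStatuses) {
  ExecutionType execType = remoteContainer.getExecutionType();
  // Old NMs (e.g. 2.6) do not send an execution type; assume GUARANTEED.
  boolean guaranteed =
      execType == null || execType == ExecutionType.GUARANTEED;
  if (remoteContainer.getState() == ContainerState.RUNNING && guaranteed) {
    nodeContainers.add(remoteContainer.getContainerId());
  } else {
    LOG.warn("Lost container " + remoteContainer.getContainerId()
        + ", status: " + remoteContainer.getState()
        + ", execution type: " + execType);
  }
}
{code}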
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957020#comment-16957020 ] zhoukang commented on YARN-9537: A new patch has been attached [~snemeth] > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
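For context on what such a switch usually looks like: a guard in the victim-selection path of the preemption logic. The property name and surrounding structure below are assumptions for illustration; see the attached patches for the actual change.
{code:java}
// Hypothetical property name; the patch defines the real key and default.
boolean amPreemptionEnabled =
    conf.getBoolean("yarn.scheduler.fair.am.preemption", true);

for (RMContainer container : preemptionCandidates) {
  if (!amPreemptionEnabled && container.isAMContainer()) {
    continue; // never select an AM container as a preemption victim
  }
  containersToPreempt.add(container);
}
{code}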
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: YARN-9537-002.patch > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: (was: YARN-9537-002.patch) > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537.001.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: YARN-9537-002.patch > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation
[ https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957015#comment-16957015 ] Peter Bacsko commented on YARN-9789: Patch looks straightforward, +1 non-binding. > Disable Option for Write Ahead Logs of LogMutation > -- > > Key: YARN-9789 > URL: https://issues.apache.org/jira/browse/YARN-9789 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9789-001.patch > > > When yarn.scheduler.configuration.store.max-logs is set to zero, the > YARNConfigurationStore (ZK, LevelDB) reads the write ahead logs from the > backend which is not needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
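The change reduces to an early-out before any write-ahead-log I/O when max-logs is zero or negative; the method and helper names in this sketch are illustrative, not copied from YARN-9789-001.patch.
{code:java}
private void logMutation(LogMutation mutation) throws IOException {
  if (maxLogs <= 0) {
    return; // WAL disabled: skip reading and rewriting logs in the backend
  }
  LinkedList<LogMutation> logs = readLogsFromBackend(); // hypothetical helper
  logs.add(mutation);
  while (logs.size() > maxLogs) {
    logs.removeFirst(); // keep only the configured window
  }
  writeLogsToBackend(logs); // hypothetical helper
}
{code}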
[jira] [Comment Edited] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956995#comment-16956995 ] Peter Bacsko edited comment on YARN-9923 at 10/22/19 12:16 PM: --- _"NONE (default): preserving the current behaviour [...]"_ Even the current behaviour can be improved. Right now there are multiple error messages, one after the other. If the binary is missing, there's no need to emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and exit immediately. was (Author: pbacsko): _"NONE (default): preserving the current behaviour [...]"_ Even the current behaviour can be improved. Right now there are multiple error messages, one after the another. If the binary is missing, there's no need to emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and exit immediately. > Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956995#comment-16956995 ] Peter Bacsko commented on YARN-9923: _"NONE (default): preserving the current behaviour [...]"_ Even the current behaviour can be improved. Right now there are multiple error messages, one after the another. If the binary is missing, there's no need to emit multiple lines. Simply say "Fatal error: /usr/bin/docker is missing" and exit immediately. > Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
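The fail-fast STARTUP option could be as small as an executable-file probe during service init; the class and message below are assumptions sketching that behaviour, not the eventual implementation.
{code:java}
import java.io.File;

public final class DockerBinaryCheck {
  // Throw one clear fatal error instead of the cascade of follow-up
  // messages quoted in the issue description.
  public static void requireDockerBinary(String dockerBinaryPath) {
    File docker = new File(dockerBinaryPath);
    if (!docker.isFile() || !docker.canExecute()) {
      throw new IllegalStateException(
          "Fatal error: " + dockerBinaryPath + " is missing or not executable");
    }
  }

  public static void main(String[] args) {
    requireDockerBinary("/usr/bin/docker");
    System.out.println("docker binary present");
  }
}
{code}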
[jira] [Commented] (YARN-9689) Router does not support kerberos proxy when in secure mode
[ https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956994#comment-16956994 ] zhoukang commented on YARN-9689: Could you help review this? [~botong][~giovanni.fumarola][~tangzhankun] > Router does not support kerberos proxy when in secure mode > -- > > Key: YARN-9689 > URL: https://issues.apache.org/jira/browse/YARN-9689 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9689.001.patch > > > When we enable kerberos in YARN-Federation mode, we can not get new app since > it will throw kerberos exception below.Which should be handled! > {code:java} > 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception > encountered while connecting to the server : > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > 2019-07-22,18:43:25,528 WARN > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: > Unable to create a new ApplicationId in SubCluster xxx > java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed > on local exception: java.io.IOException: javax.security.sasl.SaslException: > GSS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos tgt)] > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564) > at org.apache.hadoop.ipc.Client.call(Client.java:1506) > at org.apache.hadoop.ipc.Client.call(Client.java:1416) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source) > at > 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs
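For reference, the usual shape of a fix here in secure mode is to make the downstream RM call under a proxy-user UGI, so the Router's kerberos login identity impersonates the submitting user; a minimal sketch using Hadoop's {{UserGroupInformation}} API (the wrapping location is an assumption):
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserCall {
  // Runs the given action as 'user', authenticated via the service's own
  // kerberos login. The Router principal must also be whitelisted through
  // the hadoop.proxyuser.* settings on the RM side.
  public static <T> T callAsUser(String user, PrivilegedExceptionAction<T> action)
      throws Exception {
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    return proxyUgi.doAs(action);
  }
}
{code}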
[jira] [Assigned] (YARN-9689) Router does not support kerberos proxy when in secure mode
[ https://issues.apache.org/jira/browse/YARN-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang reassigned YARN-9689: -- Assignee: zhoukang > Router does not support kerberos proxy when in secure mode > -- > > Key: YARN-9689 > URL: https://issues.apache.org/jira/browse/YARN-9689 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9689.001.patch > > > When we enable kerberos in YARN-Federation mode, we can not get new app since > it will throw kerberos exception below.Which should be handled! > {code:java} > 2019-07-22,18:43:25,523 WARN org.apache.hadoop.ipc.Client: Exception > encountered while connecting to the server : > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > 2019-07-22,18:43:25,528 WARN > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor: > Unable to create a new ApplicationId in SubCluster xxx > java.io.IOException: DestHost:destPort xxx , LocalHost:localPort xxx. Failed > on local exception: java.io.IOException: javax.security.sasl.SaslException: > GSS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos tgt)] > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1564) > at org.apache.hadoop.ipc.Client.call(Client.java:1506) > at org.apache.hadoop.ipc.Client.call(Client.java:1416) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy91.getNewApplication(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy92.getNewApplication(Unknown Source) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(FederationClientInterceptor.java:252) > at > 
org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getNewApplication(RouterClientRMService.java:218) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getNewApplication(ApplicationClientProtocolPBServiceImpl.java:263) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:559) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:992) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:885) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:831) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:26
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956962#comment-16956962 ] Bibin Chundatt commented on YARN-9697: -- Thank you [~abmodi] for updating the patch. A few comments and suggestions: # OpportunisticContainerAllocatorAMService -> NodeQueueLoadMonitor init could be moved to AbstractService#serviceInit # NodeQueueLoadMonitor: ScheduledExecutorService#scheduledExecutor shutdown is not done # NodeQueueLoadMonitor#nodeIdsByRack: do we need the NodeIds to be sorted? # Thoughts on replacing NodeQueueLoadMonitor#addIntoNodeIdsByRack as follows: {code} private void addIntoNodeIdsByRack(RMNode addedNode) { nodeIdsByRack.compute(addedNode.getRackName(), (k, v) -> v == null ? ConcurrentHashMap.newKeySet() : v).add(addedNode.getNodeID()); } {code} # We could think of replacing NodeQueueLoadMonitor#removeFromNodeIdsByRack with computeIfPresent too. Not related to the patch: # OpportunisticSchedulerMetrics: shouldn't we have a destroy() method to reset the counters? During switchover I think we should reset the counters. > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.001.patch, YARN-9697.002.patch, > YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, > YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.ut.patch, > YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
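On suggestions 4 and 5 above: {{computeIfAbsent}} and {{computeIfPresent}} express both directions concisely. A sketch of what the two reviewed methods could look like, with simplified signatures:
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.NodeId;

public class RackIndex {
  private final ConcurrentHashMap<String, Set<NodeId>> nodeIdsByRack =
      new ConcurrentHashMap<>();

  void add(String rackName, NodeId nodeId) {
    // computeIfAbsent reads a little more directly than the compute-based
    // version suggested inline above, and keeps the generics intact.
    nodeIdsByRack
        .computeIfAbsent(rackName, k -> ConcurrentHashMap.<NodeId>newKeySet())
        .add(nodeId);
  }

  void remove(String rackName, NodeId nodeId) {
    nodeIdsByRack.computeIfPresent(rackName, (k, nodes) -> {
      nodes.remove(nodeId);
      return nodes; // keep the (possibly empty) set to avoid add/remove races
    });
  }
}
{code}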
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956961#comment-16956961 ] zhoukang commented on YARN-9537: OK, I will fix it now! Thanks [~snemeth] > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537.001.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956959#comment-16956959 ] zhoukang commented on YARN-9605: [~weichiu][~tangzhankun] Any suggestions? Thanks > Add ZkConfiguredFailoverProxyProvider for RM HA > --- > > Key: YARN-9605 > URL: https://issues.apache.org/jira/browse/YARN-9605 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-9605.001.patch, YARN-9605.002.patch > > > In this issue, i will track a new feature to support > ZkConfiguredFailoverProxyProvider for RM HA -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956958#comment-16956958 ] zhoukang commented on YARN-9605: The failed test is below, which I think is not related to this patch: {code:java} Stacktrace org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningQuotaException: Integral (avg over time) quota capacity 0.25 over a window of 86400 seconds, would be exceeded by accepting reservation: reservation_6128220156127328780_6678871933820709847 at org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityOverTimePolicy.validate(CapacityOverTimePolicy.java:204) at org.apache.hadoop.yarn.server.resourcemanager.reservation.InMemoryPlan.addReservation(InMemoryPlan.java:348) at org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:141) at org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) {code} > Add ZkConfiguredFailoverProxyProvider for RM HA > --- > > Key: YARN-9605 > URL: https://issues.apache.org/jira/browse/YARN-9605 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-9605.001.patch, YARN-9605.002.patch > > > In this issue, i will track a new feature to support > ZkConfiguredFailoverProxyProvider for RM HA -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9788) Queue Management API - does not support parallel updates
[ https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956957#comment-16956957 ] Peter Bacsko commented on YARN-9788: Thanks for the patch [~Prabhu Joseph]. I think the patch looks good. Really just one nitpick: the test name {{testParallelUpdates}} might be slightly misleading because things are not really happening in parallel. It's more like making sure that updates are not lost. So a better name would be something like {{testMultipleUpdates}} or {{testMultipleUpdatesNotLost}}, etc. Otherwise +1 non-binding. > Queue Management API - does not support parallel updates > > > Key: YARN-9788 > URL: https://issues.apache.org/jira/browse/YARN-9788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9788-001.patch, YARN-9788-002.patch, > YARN-9788-003.patch, YARN-9788-004.patch, YARN-9788-005.patch, > YARN-9788-006.patch, YARN-9788-007.patch > > > Queue Management API - does not support parallel updates. When there are two > parallel schedule conf updates (logAndApplyMutation), the first update is > overwritten by the second one. > Currently the logAndApplyMutation creates LogMutation and stores it in a > variable pendingMutation. This way at any given time there will be only one > LogMutation. And so the two parallel logAndApplyMutation will override the > pendingMutation and the later one only will be present. > The fix is to return LogMutation object by logAndApplyMutation which can be > passed during confirmMutation. This fixes the parallel updates. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
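To picture the fix described above: the mutation handle is returned to the caller and passed back on confirmation, instead of living in a shared {{pendingMutation}} field. The signatures below are inferred from the description and may differ from the patch.
{code:java}
// Two parallel updates each hold their own handle, so neither is lost.
LogMutation mutation = confStore.logAndApplyMutation(user, updateInfo);
boolean valid = validateConfiguration(updateInfo); // e.g. dry-run the change
confStore.confirmMutation(mutation, valid);        // confirm exactly this one
{code}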
[jira] [Commented] (YARN-9916) Improving Async Dispatcher
[ https://issues.apache.org/jira/browse/YARN-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956952#comment-16956952 ] Adam Antal commented on YARN-9916: -- I think this is related (if not the dupe) to YARN-9927. > Improving Async Dispatcher > -- > > Key: YARN-9916 > URL: https://issues.apache.org/jira/browse/YARN-9916 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prashant Golash >Assignee: Prashant Golash >Priority: Major > > Currently, async dispatcher works in the single-threaded model. > > There is another queue for the scheduler handler, but not all handlers are > non-blocking. In our cluster, this queue can go sometimes to 16M events, > which takes time to drain. > > We should think of improving it: > > # Either make multi-threads in the dispatcher which will pick queue events, > but this would require careful evaluation of the order of events. > # Or Make all downstream handlers similar to scheduler queue (this also > needs careful evaluation of out of order events). > Any other ideas are also welcome. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956949#comment-16956949 ] hcarrot commented on YARN-9927: --- The performance bottleneck is the single-thread RMEventDispatcher mode. Events are processed one by one. If we change single-thread to multi-thread, RM can process different events concurrently. > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Priority: Major > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in RM event dispatcher > queue. After analysis of RM event monitoring data and RM event processing > logic, we found that > 1) environment: a cluster with thousands of nodes > 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler > 3) Meanwhile, RM event processing is in a single-thread mode, and It results > in the low headroom of RM event scheduler, thus performance of RM. > So we proposed a RM multi-thread event processing mechanism to improve RM > performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call
[ https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956941#comment-16956941 ] Peter Bacsko commented on YARN-9780: [~Prabhu Joseph] I have some minor comments: #1 Nit: pay attention to the missing white spaces {noformat} String newQueueState = newConf.get(configPrefix+"state"); {noformat} #2 I suggest the following piece of code to retrieve {{newQueueState}} with error handling: {noformat} String configPrefix = newConf.getQueuePrefix( oldQueue.getQueuePath()); QueueState newQueueState = null; try { newQueueState = QueueState.valueOf( newConf.get(configPrefix + "state")); } catch (IllegalArgumentException e) { /* handle illegal string for state */ } /* no need to null check newQueueState */ if (oldQueue.getState() == QueueState.STOPPED || newQueueState != QueueState.STOPPED) { ...{noformat} #3 Nit: add some (or more) meaningful assertion messages: {noformat} assertEquals(1, newCSConf.getQueues("root.a").length); assertEquals("a1", newCSConf.getQueues("root.a")[0]);{noformat} > SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single > call > > > Key: YARN-9780 > URL: https://issues.apache.org/jira/browse/YARN-9780 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9780-001.patch, YARN-9780-002.patch, > YARN-9780-003.patch > > > SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single > call. The queue has to be stopped before removing and so it is useful to > allow both Stop and remove queue in a single call. > *Repro:* > {code:java} > Capacity-Scheduler.xml: > yarn.scheduler.capacity.root.queues = new, default, dummy > yarn.scheduler.capacity.root.default.capacity = 60 > yarn.scheduler.capacity.root.dummy.capacity = 30 > yarn.scheduler.capacity.root.new.capacity = 10 > curl -v -X PUT -d @abc.xml -H "Content-type: application/xml" > 'http://:8088/ws/v1/cluster/scheduler-conf' > abc.xml > > > root.default > > > capacity > 70 > > > > > root.new > > > state > STOPPED > > > > root.new > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956938#comment-16956938 ] Kinga Marton commented on YARN-9886: In the attached patch 001 I have addressed the following issues: * changed the id pattern from {{userid=}} to {{u=}} * added a property for enabling this feature * added a property for specifying the users who can perform such an operation * added unit tests I also wanted to add some information to the documentation, but I didn't find the proper place for it. I was looking for a section where the common scheduler topics are documented. > Queue mapping based on userid passed through application tag > > > Key: YARN-9886 > URL: https://issues.apache.org/jira/browse/YARN-9886 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch > > > There are situations when the real submitting user differs from the user what > arrives to YARN. For example in case of a Hive application when Hive > impersonation is turned off, the hive queries will run as Hive user and the > mapping is done based on this username. Unfortunately in this case YARN > doesn't have any information about the real user and there are cases when the > customer may want to map this applications to the real submitting user's > queue instead of the Hive one. > For this cases if they would pass the username in the application tag we may > read it and use that one during the queue mapping, if that user has rights to > run on the real user's queue. > [~sunilg] please correct me if I missed something. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
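Conceptually the mapping change boils down to extracting the {{u=}} tag and gating it on the whitelist; a sketch with assumed names (the real property names and plumbing live in the patch):
{code:java}
import java.util.Set;

public final class TagUserResolver {
  private static final String USER_TAG_PREFIX = "u=";

  // Returns the user to use for queue mapping: the "u=<realUser>" tag if
  // present and the submitter is whitelisted, the submitting user otherwise.
  public static String resolve(String submittingUser,
      Set<String> applicationTags, Set<String> allowedProxyUsers) {
    if (!allowedProxyUsers.contains(submittingUser)) {
      return submittingUser; // feature restricted to whitelisted submitters
    }
    for (String tag : applicationTags) {
      if (tag.startsWith(USER_TAG_PREFIX)) {
        return tag.substring(USER_TAG_PREFIX.length());
      }
    }
    return submittingUser;
  }
}
{code}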
[jira] [Updated] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YARN-9886: --- Attachment: YARN-9886.001.patch > Queue mapping based on userid passed through application tag > > > Key: YARN-9886 > URL: https://issues.apache.org/jira/browse/YARN-9886 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch > > > There are situations when the real submitting user differs from the user what > arrives to YARN. For example in case of a Hive application when Hive > impersonation is turned off, the hive queries will run as Hive user and the > mapping is done based on this username. Unfortunately in this case YARN > doesn't have any information about the real user and there are cases when the > customer may want to map this applications to the real submitting user's > queue instead of the Hive one. > For this cases if they would pass the username in the application tag we may > read it and use that one during the queue mapping, if that user has rights to > run on the real user's queue. > [~sunilg] please correct me if I missed something. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956934#comment-16956934 ] Peter Bacsko commented on YARN-9781: LGTM +1 (non-binding) > SchedConfCli to get current stored scheduler configuration > -- > > Key: YARN-9781 > URL: https://issues.apache.org/jira/browse/YARN-9781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9781-001.patch, YARN-9781-002.patch, > YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch > > > SchedConfCLI currently allows to add / remove / remove queue. It does not > support get configuration which RMWebServices provides as part of YARN-8559. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
[ https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi updated YARN-9928: --- Component/s: ATSv2 > ATSv2 can make NM go down with a FATAL error while it is resyncing with RM > -- > > Key: YARN-9928 > URL: https://issues.apache.org/jira/browse/YARN-9928 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Affects Versions: 3.1.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > > Encountered the below FATAL error in the NodeManager which was under heavy > load and was also resyncing with RM at the same. This caused the NM to go > down. > {code:java} > 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher > (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
[ https://issues.apache.org/jira/browse/YARN-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi updated YARN-9928: --- Affects Version/s: 3.1.0 > ATSv2 can make NM go down with a FATAL error while it is resyncing with RM > -- > > Key: YARN-9928 > URL: https://issues.apache.org/jira/browse/YARN-9928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > > Encountered the below FATAL error in the NodeManager which was under heavy > load and was also resyncing with RM at the same. This caused the NM to go > down. > {code:java} > 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher > (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9928) ATSv2 can make NM go down with a FATAL error while it is resyncing with RM
Tarun Parimi created YARN-9928: -- Summary: ATSv2 can make NM go down with a FATAL error while it is resyncing with RM Key: YARN-9928 URL: https://issues.apache.org/jira/browse/YARN-9928 Project: Hadoop YARN Issue Type: Bug Reporter: Tarun Parimi Assignee: Tarun Parimi Encountered the below FATAL error in the NodeManager, which was under heavy load and was also resyncing with the RM at the same time. This caused the NM to go down. {code:java} 2019-09-18 11:22:44,899 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerCreatedEvent(NMTimelinePublisher.java:216) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerEvent(NMTimelinePublisher.java:383) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1511) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
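For illustration, here is a minimal, self-contained sketch of the defensive pattern the trace above suggests, with all names hypothetical (this is not the actual YARN patch, which may instead fix the underlying NPE in NMTimelinePublisher): isolate per-event RuntimeExceptions in the dispatch loop so one bad event cannot end the dispatcher thread and, with it, the NM.
{code:java}
// Illustrative sketch only; all names are hypothetical, not YARN internals.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class GuardedDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

  public void post(Runnable event) {
    queue.add(event);
  }

  // Runs on a single dispatcher thread, analogous to YARN's AsyncDispatcher.
  public void run() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable event = queue.take();
      try {
        event.run();
      } catch (RuntimeException e) {
        // Log and keep dispatching instead of treating one bad event
        // (such as the NPE above) as a FATAL error for the whole daemon.
        System.err.println("Error in dispatcher thread, dropping event: " + e);
      }
    }
  }
}
{code}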
[jira] [Comment Edited] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956730#comment-16956730 ] Kinga Marton edited comment on YARN-9886 at 10/22/19 7:41 AM: -- [~wangda] yes. I will add a whitelist, where it can be defined who can use this feature. was (Author: kmarton): [~wangda] yes. I will add whitelist, where it can be defined who can use this feature. > Queue mapping based on userid passed through application tag > > > Key: YARN-9886 > URL: https://issues.apache.org/jira/browse/YARN-9886 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: YARN-9886-WIP.patch > > > There are situations when the real submitting user differs from the user that > arrives to YARN. For example, in the case of a Hive application with Hive > impersonation turned off, the Hive queries will run as the Hive user, and the > mapping is done based on this username. Unfortunately, in this case YARN > doesn't have any information about the real user, and there are cases when the > customer may want to map these applications to the real submitting user's > queue instead of the Hive one. > For these cases, if the username is passed in the application tag, we may > read it and use that one during the queue mapping, provided that user has rights to > run on the real user's queue. > [~sunilg] please correct me if I missed something. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
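A hedged sketch of the proposed flow follows; the tag format "userid=<name>", the class name, and the whitelist shape are illustrative assumptions, not the final YARN-9886 design. The idea: read a candidate user from the application tags and honor it only when the submitting (proxy) user is whitelisted.
{code:java}
// Illustrative only; YARN-9886 may use a different tag format and config.
import java.util.Set;

public class TagBasedUserResolver {
  private static final String USER_TAG_PREFIX = "userid=";  // assumed format
  private final Set<String> whitelistedProxyUsers;          // e.g. {"hive"}

  public TagBasedUserResolver(Set<String> whitelistedProxyUsers) {
    this.whitelistedProxyUsers = whitelistedProxyUsers;
  }

  /** Returns the user against whom queue mapping should be evaluated. */
  public String resolveUser(String submittingUser, Set<String> applicationTags) {
    // Only whitelisted proxy users may override the mapping user via a tag.
    if (!whitelistedProxyUsers.contains(submittingUser)) {
      return submittingUser;
    }
    for (String tag : applicationTags) {
      if (tag.startsWith(USER_TAG_PREFIX)) {
        return tag.substring(USER_TAG_PREFIX.length());
      }
    }
    return submittingUser;
  }
}
{code}
As the description notes, the caller would still have to verify that the resolved user has rights to run on the target queue.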
[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN
[ https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956777#comment-16956777 ] Zhenyu Zheng commented on YARN-9897: Some updates: our team has successfully donated ARM resources and set up an ARM CI for Apache Spark: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ It will be set to a periodic job, and then a PR trigger once we think it is stable enough. It includes some basic YARN tests, and they seem OK. I really hope we can do the same for YARN. > Add an Aarch64 CI for YARN > -- > > Key: YARN-9897 > URL: https://issues.apache.org/jira/browse/YARN-9897 > Project: Hadoop YARN > Issue Type: Improvement > Components: build, test >Reporter: Zhenyu Zheng >Priority: Major > Attachments: hadoop_build.log > > > As YARN is the resource manager of Hadoop, a large number of other > software projects also use YARN for resource management. The capability of > running YARN on platforms with different architectures and managing hardware > resources of different architectures could be very important and useful. > Aarch64 (ARM) is currently the dominant architecture in small > devices like phones, IoT devices, security cameras, drones, etc. With increasing > computing capability and increasing connection speeds like 5G > networks, there could be great possibility and opportunity for world-changing > innovations and new markets if we can manage and make use of those devices as > well. > Currently, all YARN CIs are based on the x86 architecture, and we have been > performing tests on Aarch64 and proposing possible solutions for problems we > have met, like: > https://issues.apache.org/jira/browse/HADOOP-16614 > We have run all YARN tests, and it turns out there are only a few problems, > for which we can provide possible solutions for discussion. > We propose to add an Aarch64 CI for YARN to promote the support for > YARN on Aarch64 platforms. We are willing to provide machines to the current > CI system and manpower to manage the CI and fix problems that occur. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hcarrot updated YARN-9927: -- Priority: Major (was: Minor) > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Priority: Major > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analyzing RM event monitoring data and RM event processing > logic, we found that: > 1) environment: a cluster with thousands of nodes > 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler > 3) meanwhile, RM event processing runs in single-thread mode, which leaves > the RM event scheduler little headroom and degrades RM performance. > So we propose an RM multi-thread event processing mechanism to improve RM > performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hcarrot updated YARN-9927: -- Description: Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that: 1) environment: a cluster with thousands of nodes 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler 3) meanwhile, RM event processing runs in single-thread mode, which leaves the RM event scheduler little headroom and degrades RM performance. So we propose an RM multi-thread event processing mechanism to improve RM performance. was:Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that RMNodeStatusEvent is less frequent than other events, but its overall processing time is greater. Meanwhile, RM event processing runs in single-thread mode, which decreases RM performance. So we propose an RM multi-thread event processing mechanism to improve RM performance. > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Priority: Minor > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analyzing RM event monitoring data and RM event processing > logic, we found that: > 1) environment: a cluster with thousands of nodes > 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler > 3) meanwhile, RM event processing runs in single-thread mode, which leaves > the RM event scheduler little headroom and degrades RM performance. > So we propose an RM multi-thread event processing mechanism to improve RM > performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
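One way such a mechanism could look, sketched under stated assumptions (this illustrates the general idea only, not the design in the attached PDF; all names are hypothetical): shard events across N worker queues by a key, e.g. the node ID for RMNodeStatusEvent, so events for the same node stay ordered while different nodes are processed in parallel.
{code:java}
// Illustrative sketch only; the sharding key and class names are assumptions.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShardedDispatcher {
  private final BlockingQueue<Runnable>[] queues;

  @SuppressWarnings("unchecked")
  public ShardedDispatcher(int workers) {
    queues = new BlockingQueue[workers];
    for (int i = 0; i < workers; i++) {
      queues[i] = new LinkedBlockingQueue<>();
      final BlockingQueue<Runnable> q = queues[i];
      Thread t = new Thread(() -> {
        try {
          while (true) {
            q.take().run(); // events sharing a key run in arrival order
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "event-worker-" + i);
      t.setDaemon(true);
      t.start();
    }
  }

  /** The same key (e.g. a node ID) always maps to the same worker queue. */
  public void dispatch(Object key, Runnable event) {
    queues[Math.floorMod(key.hashCode(), queues.length)].add(event);
  }
}
{code}
Whether this actually helps depends on the lock-contention question raised in the next comment: parallel workers only pay off if the handlers do not serialize on a single scheduler lock.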
[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956764#comment-16956764 ] Adam Antal commented on YARN-9927: -- Thanks for filing this [~hcarrot], interesting approach. One question that came to my mind: are you certain that the dispatcher is the real bottleneck here? I mean, if processing an event requires holding the lock the whole time, then we just replace time spent in the dispatcher queue with lock-holding time for each event. We should dig into how long the lock must be held for each event type. > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Priority: Minor > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analyzing RM event monitoring data and RM event processing > logic, we found that RMNodeStatusEvent is less frequent than other > events, but its overall processing time is greater. > Meanwhile, RM event processing runs in single-thread mode, which decreases > RM performance. So we propose an RM multi-thread event > processing mechanism to improve RM performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hcarrot updated YARN-9927: -- Affects Version/s: 3.0.0 2.9.2 > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Priority: Minor > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analyzing RM event monitoring data and RM event processing > logic, we found that RMNodeStatusEvent is less frequent than other > events, but its overall processing time is greater. > Meanwhile, RM event processing runs in single-thread mode, which decreases > RM performance. So we propose an RM multi-thread event > processing mechanism to improve RM performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hcarrot updated YARN-9927: -- Description: Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that RMNodeStatusEvent is less frequent than other events, but its overall processing time is greater. Meanwhile, RM event processing runs in single-thread mode, which decreases RM performance. So we propose an RM multi-thread event processing mechanism to improve RM performance. (was: Recently, we have observed serious event blocking in the RM event dispatcher queue. After analyzing RM event monitoring data and RM event processing logic, we found that RMNodeStatusEvent is less frequent than other events, but its overall processing time is greater. Meanwhile, RM event processing runs in single-thread mode, which decreases RM performance. So we propose an RM multi-thread event processing mechanism to improve RM performance. Is this mechanism feasible?) > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: hcarrot >Priority: Minor > Attachments: RM multi-thread event processing mechanism.pdf > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analyzing RM event monitoring data and RM event processing > logic, we found that RMNodeStatusEvent is less frequent than other > events, but its overall processing time is greater. > Meanwhile, RM event processing runs in single-thread mode, which decreases > RM performance. So we propose an RM multi-thread event > processing mechanism to improve RM performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956747#comment-16956747 ] Adam Antal commented on YARN-9511: -- Hi [~seanlau], I can repro the steps you described above with one exception: my default umask on my Mac is 0022, so the test passes by default on JDK8. Also, could you please confirm that you are using JDK11 (this issue is primarily about the JDK11-related part)? If the test fails on JDK8, of course we should fix it, but it passes locally for me. I am not really familiar with the umask defaults, but I think this is related to your environment. What machine do you run the tests on? > [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: > The remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Assignee: Szilard Nemeth >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
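For reference, the "Permission is 436" in the error is decimal: 436 equals 0664 octal, i.e. a group-writable jar, which a umask of 0002 produces (a umask of 0022, as in the comment above, yields 0644 and passes). Below is a minimal sketch of this kind of check, written against the standard java.nio API rather than the actual AuxServices code:
{code:java}
// Illustrative only; AuxServices performs its own version of this check.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class JarPermissionCheck {
  /** Rejects a jar whose POSIX permissions allow group or other write. */
  public static void requireNotGroupOrOtherWritable(Path jar) throws IOException {
    Set<PosixFilePermission> perms = Files.getPosixFilePermissions(jar);
    if (perms.contains(PosixFilePermission.GROUP_WRITE)
        || perms.contains(PosixFilePermission.OTHERS_WRITE)) {
      throw new IllegalStateException(
          "The remote jarfile should not be writable by group or others: " + perms);
    }
  }

  public static void main(String[] args) throws IOException {
    requireNotGroupOrOtherWritable(Paths.get(args[0]));
  }
}
{code}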