[jira] [Comment Edited] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
[ https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985279#comment-16985279 ] lindongdong edited comment on YARN-9738 at 11/30/19 7:26 AM:
-
Hi [~BilwaST], in the latest patch, I think it is better to move the null check out of the read lock.

was (Author: lindongdong): [~BilwaST] In the latest patch, I think it is better to move the null check out of the read lock.

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch, YARN-9738-003.patch
>
> *Env:*
> Server OS: Ubuntu
> No. of cluster nodes: 9120 NMs
> Env mode: Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each with 2 threads, across a total of 10 queues
> 2. Once application submission increases (each Distributed Shell application calls getClusterNodes),
> *ClientRMService#getClusterNodes tries to call ClusterNodeTracker#getNodeReport, where the nodes map is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x7f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock.*
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
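The proposal above (ConcurrentHashMap instead of the fair read lock, null check outside any locking) can be sketched as follows. This is an illustrative simplification, not the actual ClusterNodeTracker code: String stands in for the real NodeId and SchedulerNode types, and the method names merely mirror the ones discussed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the proposed fix: a ConcurrentHashMap replaces the map guarded
// by the fair ReentrantReadWriteLock, so readers no longer park behind
// writers (which is what the thread dump above shows).
public class NodeTrackerSketch {
  private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  public void addNode(String nodeId, String node) {
    nodes.put(nodeId, node);
  }

  // As suggested in the comment above, the null check sits outside any
  // locking; the map read itself is lock-free.
  public String getNodeReport(String nodeId) {
    if (nodeId == null) {
      return null;
    }
    return nodes.get(nodeId);
  }
}
```

ConcurrentHashMap reads are wait-free for the common case, which is the property that matters here: getClusterNodes handlers stop queueing behind scheduler threads that hold the write lock.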
[jira] [Commented] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
[ https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985279#comment-16985279 ] lindongdong commented on YARN-9738:
---
[~BilwaST] In the latest patch, I think it is better to move the null check out of the read lock.

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch, YARN-9738-003.patch
>
> *Env:*
> Server OS: Ubuntu
> No. of cluster nodes: 9120 NMs
> Env mode: Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each with 2 threads, across a total of 10 queues
> 2. Once application submission increases (each Distributed Shell application calls getClusterNodes),
> *ClientRMService#getClusterNodes tries to call ClusterNodeTracker#getNodeReport, where the nodes map is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x7f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock.*
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
[ https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lindongdong updated YARN-9738:
--
Description:
*Env:*
Server OS: Ubuntu
No. of cluster nodes: 9120 NMs
Env mode: Secure
*Preconditions:*
~9120 NMs were running
~1250 applications were in the running state
35K applications were in the pending state
*Test Steps:*
1. Submit applications from 5 clients, each with 2 threads, across a total of 10 queues
2. Once application submission increases (each Distributed Shell application calls getClusterNodes),
*ClientRMService#getClusterNodes tries to call ClusterNodeTracker#getNodeReport, where the nodes map is locked.*
{quote}
"IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x7f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
{quote}
*Instead, we can make nodes a ConcurrentHashMap and remove the read lock.*

was:
*Env:*
Server OS: Ubuntu
No. of cluster nodes: 9120 NMs
Env mode: Secure
*Preconditions:*
~9120 NMs were running
~1250 applications were in the running state
35K applications were in the pending state
*Test Steps:*
1. Submit applications from 5 clients, each with 2 threads, across a total of 10 queues
2. Once application submission increases (each Distributed Shell application calls getClusterNodes),
*ClientRMService#getClusterNodes tries to call ClusterNodeTracker#getNodeReport, where the nodes map is locked.*
{quote}
"IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x7f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992) at
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985179#comment-16985179 ] Hadoop QA commented on YARN-5106: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 29 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 28s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 278 unchanged - 52 fixed = 279 total (was 330) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 59s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}211m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-5106 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987181/YARN-5106.016.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 42aee859937e 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a2dadac | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985120#comment-16985120 ] Prabhu Joseph commented on YARN-9925:
-
[~pbacsko] I have reported YARN-10006 to handle this. Thanks.

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch, YARN-9925-003.patch, YARN-9925-004.patch, YARN-9925-005.patch
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When creating a queue with the same name as an existing parent queue's name, it has to fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after refresh, which is not allowed.
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>     ... 70 more
> {code}
> In some cases, the error is not thrown while creating the queue but at job submission: "Failed to submit application_1571677375269_0002 to YARN : Application application_1571677375269_0002 submitted by user : systest to non-leaf queue : B"
> The scenarios below are allowed, but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B
>
> It allows two root queues:
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
> {code}
> The scenario below is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
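The missing validation described in YARN-9925 could, for illustration, look like the following standalone sketch. The class and method names here are hypothetical (the real check lives in CapacitySchedulerQueueManager#validateQueueHierarchy); the idea is to reject any new queue whose short name collides with an existing parent queue's short name, wherever it appears in the hierarchy.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the consistency check the issue asks for: a queue's
// short (leaf) name must not equal the short name of any existing parent
// queue, regardless of its position in the hierarchy.
public class QueueHierarchySketch {
  private final Set<String> parentShortNames = new HashSet<>();
  private final Set<String> queuePaths = new HashSet<>();

  public void addQueue(String path) {
    String[] parts = path.split("\\.");
    String shortName = parts[parts.length - 1];
    if (parentShortNames.contains(shortName)) {
      // Rejects e.g. root.A.A1.B when root.B.B1 exists, and root.A.A1.root.
      throw new IllegalArgumentException(
          "Queue short name '" + shortName + "' is already used by a parent queue");
    }
    // Every ancestor on this path becomes a parent queue.
    for (int i = 0; i < parts.length - 1; i++) {
      parentShortNames.add(parts[i]);
    }
    queuePaths.add(path);
  }
}
```

Applied to the scenarios quoted above, this check rejects both "Add root.A.A1.B" (B is already a parent) and "Add root.A.A1.root" (root is always a parent), making the behavior consistent with the root.B.A case that is already handled.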
[jira] [Created] (YARN-10006) IOException used in place of YarnException in CapacityScheduler
Prabhu Joseph created YARN-10006:
Summary: IOException used in place of YarnException in CapacityScheduler
Key: YARN-10006
URL: https://issues.apache.org/jira/browse/YARN-10006
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler
Affects Versions: 3.3.0
Reporter: Prabhu Joseph

IOException is used in place of YarnException in CapacityScheduler. As per the YarnException doc,
{code:java}
/**
 * YarnException indicates exceptions from yarn servers. On the other hand,
 * IOExceptions indicates exceptions from RPC layer.
 */
{code}
The methods below throw IOException, but they are supposed to throw YarnException:
CapacitySchedulerQueueManager#parseQueue <- initializeQueues <- CapacityScheduler#initializeQueues <- initScheduler <- serviceInit
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
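To illustrate the distinction the javadoc draws, here is a minimal sketch. YarnExceptionStub is a stand-in so the example compiles without Hadoop on the classpath, and parseQueue is a hypothetical simplification of CapacitySchedulerQueueManager#parseQueue: a server-side configuration problem surfaces as a Yarn-style exception instead of an IOException.

```java
// Stand-in for org.apache.hadoop.yarn.exceptions.YarnException, used here
// only so the sketch is self-contained.
class YarnExceptionStub extends Exception {
  YarnExceptionStub(String msg) { super(msg); }
}

public class QueueInitSketch {
  // Per the quoted doc, YarnException signals server-side failures, while
  // IOException is reserved for the RPC layer. A queue-configuration error
  // is a server-side failure, so it should not be an IOException.
  static void parseQueue(String queueName) throws YarnExceptionStub {
    if (queueName == null || queueName.isEmpty()) {
      throw new YarnExceptionStub("Queue name cannot be empty");
    }
  }
}
```

Since YarnException and IOException travel differently through the RPC stack, using the right one also changes what clients see when queue initialization fails.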
[jira] [Updated] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-5106: - Attachment: YARN-5106.016.patch > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Adam Antal >Priority: Major > Labels: newbie++ > Attachments: YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, > YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, > YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, > YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, > YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, > YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, > YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, > YARN-5106.014.patch, YARN-5106.015.patch, YARN-5106.016.patch > > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985085#comment-16985085 ] Adam Antal commented on YARN-5106:
--
Thanks for the review [~pbacsko]! Done with all. I added an enum for the placement rules; I hope it aligns with your intention.

> Provide a builder interface for FairScheduler allocations for use in tests
> --
>
> Key: YARN-5106
> URL: https://issues.apache.org/jira/browse/YARN-5106
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Adam Antal
> Priority: Major
> Labels: newbie++
> Attachments: YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, YARN-5106.014.patch, YARN-5106.015.patch
>
> Most, if not all, fair scheduler tests create an allocations XML file. Having a helper class that potentially uses a builder would make the tests cleaner.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command
[ https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985060#comment-16985060 ] Hadoop QA commented on YARN-9985: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 90 unchanged - 2 fixed = 90 total (was 92) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 56s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9985 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987158/YARN-9985-02.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux
[jira] [Commented] (YARN-9877) Intermittent TIME_OUT of LogAggregationReport
[ https://issues.apache.org/jira/browse/YARN-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985055#comment-16985055 ] Peter Bacsko commented on YARN-9877:
-
[~adam.antal] is there a way to test this? {{TestRMAppTransitions}} is the place where the validation could take place.

> Intermittent TIME_OUT of LogAggregationReport
> -
>
> Key: YARN-9877
> URL: https://issues.apache.org/jira/browse/YARN-9877
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, resourcemanager, yarn
> Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Major
> Attachments: YARN-9877.001.patch
>
> I noticed some intermittent TIME_OUT in some downstream log-aggregation based tests.
> Steps to reproduce:
> - Run an MR job:
> {code}
> hadoop jar hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep -Dmapreduce.job.queuename=root.default -m 10 -r 10 -mt 5000 -rt 5000
> {code}
> - Suppose the AM requests more containers, but as soon as they are allocated the AM realizes it does not need them. The containers' state changes are: ALLOCATED -> ACQUIRED -> RELEASED. Suppose these extra containers are allocated on a different node from the other 21 (AM + 10 mapper + 10 reducer) containers' node.
> - All the containers finish successfully and the app finishes successfully as well. The log aggregation status for the whole app seemingly gets stuck in the RUNNING state.
> - After a while the final log aggregation status for the app changes to TIME_OUT.
> Root cause:
> - As unused containers go through the state transitions in the RM's internal representation, {{RMAppImpl$AppRunningOnNodeTransition}}'s transition function is called. This calls {{RMAppLogAggregation$addReportIfNecessary}}, which forcefully adds the "NOT_START" LogAggregationStatus associated with this NodeId for the app, even though the app does not have any running container on it.
> - The node's LogAggregationStatus is never updated to "SUCCEEDED" by the NodeManager because it does not have any running container on it (note that the AM immediately released them after acquisition). The LogAggregationStatus remains NOT_START until the timeout is reached. After that point the RM aggregates the LogAggregationReports for all the nodes, and though all the containers have the SUCCEEDED state, one particular node has NOT_START, so the final log aggregation will be TIME_OUT.
> (I crawled the RM UI for the log aggregation statuses, and it was always NOT_START for this particular node.)
> This situation is highly unlikely, but it has an estimated ~0.8% failure rate based on a year's 1,500 runs on an unstressed cluster.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
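The root cause above suggests a guard: only record a NOT_START report for a node if the application actually launched a container there. The following is a hypothetical sketch with simplified types, not the actual RMAppImpl/RMAppLogAggregation code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the guard: skip nodes on which every container was
// released before launch, so they can never be stuck at NOT_START and force
// the whole app's final status to TIME_OUT.
public class LogAggregationSketch {
  public enum Status { NOT_START, RUNNING, SUCCEEDED, TIME_OUT }

  private final Map<String, Status> reports = new HashMap<>();
  private final Map<String, Integer> runningContainers = new HashMap<>();

  public void containerLaunched(String nodeId) {
    runningContainers.merge(nodeId, 1, Integer::sum);
  }

  public void addReportIfNecessary(String nodeId) {
    // Only track nodes that actually ran a container for this app.
    if (runningContainers.getOrDefault(nodeId, 0) > 0) {
      reports.putIfAbsent(nodeId, Status.NOT_START);
    }
  }

  public int trackedNodes() {
    return reports.size();
  }
}
```

With such a guard, the ALLOCATED -> ACQUIRED -> RELEASED node from the scenario above never enters the report map, so the final aggregation only considers nodes the NodeManagers will eventually mark SUCCEEDED.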
[jira] [Comment Edited] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985033#comment-16985033 ] Peter Bacsko edited comment on YARN-5106 at 11/29/19 1:51 PM: -- Thanks for picking up the patch [~adam.antal].
* Just a minor thing: strings like "drf" and "fair" are repeated quite often. Either:
## Extract them to constants somewhere, or
## instead of using {{.defaultQueueSchedulingPolicy("drf")}} or {{.schedulingPolicy("drf")}}, introduce methods like {{.drfDefaultSchedulingPolicy()}}, {{.drfSchedulingPolicy()}}, {{.fairSchedulingPolicy()}}, etc.
* {{.queueMaxAMShareDefault(-1.0f)}}: here the value -1.0f has a special meaning (the feature is disabled). Again, extract {{-1.0f}} to a constant or add a method like {{.disableQueueMaxAmShareDefault()}}.
* Placement rules like "specified", "reject", etc.: similarly to the scheduling policy, the set of accepted values is fixed, so apply what I recommended above regarding "drf" and "fair".
was (Author: pbacsko): Thanks for picking up the patch [~adam.antal]. * Just minor a thing: strings like "drf" and "fair" are often repeated quite often. Either: ## Extract them to constants somewhere ## instead of using {{.defaultQueueSchedulingPolicy("drf")}} or {{.schedulingPolicy("drf")}}, you can introduce methods like {{.drfDefaultSchedulingPolicy()}} and {{.drfSchedulingPolicy()}}, {{.fairSchedulingPolicy()}}, etc. * {{.queueMaxAMShareDefault(-1.0f)}}: here, the value -1.0f has a special meaning (feature is disabled). Again, extract {{-1.0f}} or add a method {{.disableQueueMaxAmShareDefault()}}. * Placement rules like "specified", "reject", etc. Similarly to scheduling policy, the set of accepted values are fixed. So do something which I recommended in #2.
> Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Adam Antal >Priority: Major > Labels: newbie++ > Attachments: YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, > YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, > YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, > YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, > YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, > YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, > YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, > YARN-5106.014.patch, YARN-5106.015.patch > > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner.
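For illustration, the review suggestions could translate into something like the following. This is only a sketch with invented names; the real allocation-file builder in the patch may be shaped differently.

```java
// Hypothetical sketch of the review suggestion: replace repeated string
// literals like "drf"/"fair" with constants and intention-revealing
// builder methods, and name the "-1.0f disables the feature" case.
class AllocationFileBuilderSketch {
    static final String DRF_POLICY = "drf";
    static final String FAIR_POLICY = "fair";
    // -1.0f has a special meaning: the queue max-AM-share check is disabled.
    static final float QUEUE_MAX_AM_SHARE_DISABLED = -1.0f;

    private String schedulingPolicy;
    private float queueMaxAMShareDefault;

    AllocationFileBuilderSketch drfSchedulingPolicy() {
        this.schedulingPolicy = DRF_POLICY;
        return this;
    }

    AllocationFileBuilderSketch fairSchedulingPolicy() {
        this.schedulingPolicy = FAIR_POLICY;
        return this;
    }

    AllocationFileBuilderSketch disableQueueMaxAmShareDefault() {
        this.queueMaxAMShareDefault = QUEUE_MAX_AM_SHARE_DISABLED;
        return this;
    }

    String schedulingPolicy() { return schedulingPolicy; }
    float queueMaxAMShareDefault() { return queueMaxAMShareDefault; }
}
```

Test code reads `.drfSchedulingPolicy().disableQueueMaxAmShareDefault()` instead of scattering `"drf"` and `-1.0f` literals, so the intent (and the disabled-feature case) is visible at the call site.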
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985033#comment-16985033 ] Peter Bacsko commented on YARN-5106: Thanks for picking up the patch [~adam.antal]. Just a minor thing: strings like "drf" and "fair" are repeated quite often. Either
# Extract them to constants somewhere, or
# instead of using {{.defaultQueueSchedulingPolicy("drf")}} or {{.schedulingPolicy("drf")}}, introduce methods like {{.drfDefaultSchedulingPolicy()}}, {{.drfSchedulingPolicy()}}, {{.fairSchedulingPolicy()}}, etc.
# {{.queueMaxAMShareDefault(-1.0f)}}: here the value -1.0f has a special meaning (the feature is disabled). Again, extract {{-1.0f}} to a constant or add a method like {{.disableQueueMaxAmShareDefault()}}.
# Placement rules like "specified", "reject", etc.: similarly to the scheduling policy, the set of accepted values is fixed, so do what I recommended in #2.
[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder
[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985027#comment-16985027 ] Szilard Nemeth commented on YARN-9052: -- OK, the unit test failure was intermittent. [~sunilg] Please review the latest patch!
> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Minor
> Attachments: YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, YARN-9052-testlogs003-justfailed.txt, YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, YARN-9052.004.withlogs.patch, YARN-9052.005.patch, YARN-9052.006.patch, YARN-9052.007.patch, YARN-9052.008.patch, YARN-9052.009.patch, YARN-9052.009.patch, YARN-9052.testlogs.002.patch, YARN-9052.testlogs.002.patch, YARN-9052.testlogs.003.patch, YARN-9052.testlogs.patch
>
> MockRM has 31 definitions of submitApp, most of them with more than an acceptable number of parameters, ranging from 2 up to 22, which makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow which RMApp will be produced for a test, as the calls often pass a lot of empty / null values as parameters.
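The telescoping-parameter problem the issue describes is typically solved with a fluent parameter object. The following is a minimal sketch under invented names, not the actual MockRM builder API from the patch:

```java
// Hypothetical sketch: a fluent parameter object with defaults, instead
// of 31 submitApp overloads taking up to 22 positional parameters.
class AppSubmissionSketch {
    // Defaults replace the long chains of null/empty arguments.
    private int memoryMb = 1024;
    private String queue = "default";
    private String name = "app";

    AppSubmissionSketch withMemory(int mb) { this.memoryMb = mb; return this; }
    AppSubmissionSketch withQueue(String q) { this.queue = q; return this; }
    AppSubmissionSketch withName(String n) { this.name = n; return this; }

    // In the real MockRM this would build and submit an RMApp; here we
    // just render the effective parameters so the result is inspectable.
    String submit() {
        return name + "@" + queue + ":" + memoryMb;
    }
}
```

A test then states only the parameters it cares about, e.g. `new AppSubmissionSketch().withQueue("root.a").submit()`, and every unspecified value is a readable default rather than a positional null.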
[jira] [Commented] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command
[ https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984999#comment-16984999 ] Ayush Saxena commented on YARN-9985: Found Failover mentioned in two places as well, along with transitionToObserver; removed it too, since it also wasn't supposed to be there as per YARN-3397.
> Unsupported "transitionToObserver" option displaying for rmadmin command
>
> Key: YARN-9985
> URL: https://issues.apache.org/jira/browse/YARN-9985
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM, yarn
> Affects Versions: 3.2.1
> Reporter: Souryakanta Dwivedy
> Assignee: Ayush Saxena
> Priority: Minor
> Attachments: YARN-9985-01.patch, YARN-9985-02.patch, image-2019-11-18-18-31-17-755.png, image-2019-11-18-18-35-54-688.png
>
> Check the options for the yarn rmadmin command: it displays the "-transitionToObserver" option, which is not supported by yarn rmadmin; this is wrong behavior. However, yarn rmadmin -help does not display any "-transitionToObserver" option.
>
> !image-2019-11-18-18-31-17-755.png!
>
> ==
> install/hadoop/resourcemanager/bin> ./yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is: > yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in > seconds] -client|server]] [-refreshNodesResources] > [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] > [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] > [-addToClusterNodeLabels > <"label1(exclusive=true),label2(exclusive=false),label3">] > [-removeFromClusterNodeLabels ] [-replaceLabelsOnNode > <"node1[:port]=label1,label2 node2[:port]=label1"> [-failOnUnknownNodes]] > [-directlyAccessNodeLabelStore] [-refreshClusterMaxPriority] > [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or > -updateNodeResource [NodeID] [ResourceTypes] ([OvercommitTimeout])] > *{color:#FF}[-transitionToActive [--forceactive] ]{color} > {color:#FF}[-transitionToStandby ]{color}* [-getServiceState > ] [-getAllServiceState] [-checkHealth ] [-help [cmd]] > -refreshQueues: Reload the queues' acls, states and scheduler specific > properties. > ResourceManager will reload the mapred-queues configuration file. > -refreshNodes [-g|graceful [timeout in seconds] -client|server]: Refresh the > hosts information at the ResourceManager. Here [-g|graceful [timeout in > seconds] -client|server] is optional, if we specify the timeout then > ResourceManager will wait for timeout before marking the NodeManager as > decommissioned. The -client|server indicates if the timeout tracking should > be handled by the client or the ResourceManager. The client-side tracking is > blocking, while the server-side tracking is not. Omitting the timeout, or a > timeout of -1, indicates an infinite timeout. Known Issue: the server-side > tracking will immediately decommission if an RM HA failover occurs. > -refreshNodesResources: Refresh resources of NodeManagers at the > ResourceManager. 
> -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings > -refreshUserToGroupsMappings: Refresh user-to-groups mappings > -refreshAdminAcls: Refresh acls for administration of ResourceManager > -refreshServiceAcl: Reload the service-level authorization policy file. > ResourceManager will reload the authorization policy file. > -getGroups [username]: Get the groups which given user belongs to. > -addToClusterNodeLabels > <"label1(exclusive=true),label2(exclusive=false),label3">: add to cluster > node labels. Default exclusivity is true > -removeFromClusterNodeLabels (label splitted by ","): > remove from cluster node labels > -replaceLabelsOnNode <"node1[:port]=label1,label2 > node2[:port]=label1,label2"> [-failOnUnknownNodes] : replace labels on nodes > (please note that we do not support specifying multiple labels on a single > host for now.) > [-failOnUnknownNodes] is optional, when we set this option, it will fail if > specified nodes are unknown. > -directlyAccessNodeLabelStore: This is DEPRECATED, will be removed in future > releases. Directly access node label store, with this option, all node label > related operations will not connect RM. Instead, they will access/modify > stored node labels directly. By default, it is false (access via RM). AND > PLEASE NOTE: if you configured yarn.node-labels.fs-store.root-dir to a local > directory (instead of NFS or HDFS), this option will only work when the > command run on the machine where RM is running. > -refreshClusterMaxPriority: Refresh cluster max priority >
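Conceptually, the fix amounts to keeping the unsupported HA options out of the usage text that yarn rmadmin prints. The following is a hypothetical sketch, not the actual RMAdminCLI code; the command names in the exclusion set are illustrative only:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: build the rmadmin usage string from a command map,
// excluding HA subcommands the RM does not support (names illustrative).
class RmAdminUsageSketch {
    static final Set<String> UNSUPPORTED =
        Set.of("-transitionToObserver", "-failover");

    static String usage(Map<String, String> commands) {
        return commands.keySet().stream()
            .filter(cmd -> !UNSUPPORTED.contains(cmd))     // drop unsupported options
            .map(cmd -> "[" + cmd + "]")
            .collect(Collectors.joining(" ", "yarn rmadmin ", ""));
    }
}
```

The bug described above is the opposite situation: the generic usage path printed every registered HA option, while -help used a separate, correctly filtered list, so the two outputs disagreed.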
[jira] [Updated] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command
[ https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated YARN-9985: --- Attachment: YARN-9985-02.patch
[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder
[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984991#comment-16984991 ] Hadoop QA commented on YARN-9052: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 88 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 29s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 36s{color} | {color:orange} root: The patch generated 42 new + 1829 unchanged - 58 fixed = 1871 total (was 1887) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 47s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 9s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 1s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}237m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9052 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987130/YARN-9052.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 213e8e3019a1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984948#comment-16984948 ] Hadoop QA commented on YARN-5106: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 1s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 29 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 38s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 278 unchanged - 52 fixed = 278 total (was 330) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 25s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 23s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}197m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-5106 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987127/YARN-5106.015.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c30178192070 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a2dadac | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results |
[jira] [Commented] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984905#comment-16984905 ] Hadoop QA commented on YARN-9923: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 21m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 38s{color} | {color:green} root generated 0 new + 1868 unchanged - 2 fixed = 1868 total (was 1870) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 55s{color} | {color:orange} root: The patch generated 2 new + 596 unchanged - 52 fixed = 598 total (was 648) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 18s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 52s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 36s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | |
[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984890#comment-16984890 ] Peter Bacsko commented on YARN-9925: [~prabhujoseph] could you create a follow-up Jira about changing {{IOException}} to {{YarnException}}? > CapacitySchedulerQueueManager allows unsupported Queue hierarchy > > > Key: YARN-9925 > URL: https://issues.apache.org/jira/browse/YARN-9925 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9925-001.patch, YARN-9925-002.patch, > YARN-9925-003.patch, YARN-9925-004.patch, YARN-9925-005.patch > > > CapacitySchedulerQueueManager allows an unsupported Queue hierarchy. When > creating a queue with the same name as an existing parent queue, it should > fail with the error below. > {code:java} > Caused by: java.io.IOException: A is moved > from:root.A to:root.B.A after refresh, which is not allowed. at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473) > ... 
70 more > {code} > In some cases, the error is not thrown while creating the queue but at > job submission: "Failed to submit application_1571677375269_0002 to YARN : > Application application_1571677375269_0002 submitted by user : systest to > non-leaf queue : B" > The scenarios below are allowed but should not be: > {code:java} > It allows root.A.A1.B when root.B.B1 already exists. > > 1. Add root.A > 2. Add root.A.A1 > 3. Add root.B > 4. Add root.B.B1 > 5. Allows Add of root.A.A1.B > It allows two root queues: > > 1. Add root.A > 2. Add root.B > 3. Add root.A.A1 > 4. Allows Add of root.A.A1.root > > {code} > The scenario below is handled properly: > {code:java} > It does not allow root.B.A when root.A.A1 already exists. > > 1. Add root.A > 2. Add root.B > 3. Add root.A.A1 > 4. Does not Allow Add of root.B.A > {code} > This error handling has to be consistent in all scenarios. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
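The invalid hierarchies listed above share one root cause: the CapacityScheduler requires queue short names to be unique across the whole tree, so adding a path whose leaf reuses any existing queue name (including "root" itself) must be rejected. A minimal, self-contained sketch of such a uniqueness check, assuming the existing queues are available as a flat set of dotted paths; the class and method names are illustrative, not the actual validateQueueHierarchy code:

```java
import java.util.*;

// Illustrative sketch only: checks whether the leaf name of a new queue path
// collides with any short name already used somewhere in the hierarchy.
public class QueuePathValidator {
    public static boolean collides(Set<String> existingPaths, String newPath) {
        String[] parts = newPath.split("\\.");
        String leaf = parts[parts.length - 1];
        for (String path : existingPaths) {
            for (String part : path.split("\\.")) {
                if (part.equals(leaf)) {
                    return true; // e.g. root.A.A1.B collides with root.B.B1
                }
            }
        }
        return false;
    }
}
```

The real validateQueueHierarchy additionally compares full paths, so that re-initializing an existing queue at its own path is not treated as a collision.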
[jira] [Commented] (YARN-9938) Validate Parent Queue for QueueMapping contains dynamic group as parent queue
[ https://issues.apache.org/jira/browse/YARN-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984888#comment-16984888 ] Peter Bacsko commented on YARN-9938: +1 (non-binding) [~snemeth] please review > Validate Parent Queue for QueueMapping contains dynamic group as parent queue > - > > Key: YARN-9938 > URL: https://issues.apache.org/jira/browse/YARN-9938 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9938.001.patch, YARN-9938.002.patch, > YARN-9938.003.patch, YARN-9938.004.patch, YARN-9938.005.patch, > YARN-9938.006.patch > > > Currently {{UserGroupMappingPlacementRule#validateParentQueue}} validates > the parent queue using the queue path. With dynamic groups using %primary_group > and %secondary_group in place (refer to YARN-9841 and YARN-9865), parent queue > validation should also happen for these two queue mappings, after > resolving the wildcard pattern to the corresponding groups at runtime.
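The point of the change above is that a parent pattern such as %primary_group cannot be validated statically; it first has to be resolved to a concrete group for the submitting user. A hedged, self-contained sketch of that resolution step follows; the method name and the Map stand-in for Hadoop's Groups lookup are illustrative, not the actual UserGroupMappingPlacementRule API:

```java
import java.util.*;

// Illustrative sketch: resolve a dynamic parent-queue wildcard to a concrete
// group name before validating it against the queue hierarchy.
public class MappingResolver {
    // groupsForUser is a stand-in for Hadoop's Groups service lookup.
    public static String resolveParent(String parentPattern, String user,
                                       Map<String, List<String>> groupsForUser) {
        List<String> groups = groupsForUser.getOrDefault(user, Collections.emptyList());
        if (parentPattern.equals("%primary_group")) {
            return groups.isEmpty() ? null : groups.get(0);
        }
        if (parentPattern.equals("%secondary_group")) {
            return groups.size() < 2 ? null : groups.get(1);
        }
        return parentPattern; // static parent queue: validate directly
    }
}
```

Returning null here models the case where the wildcard cannot be resolved for the user, which the real validation would reject.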
[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder
[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984816#comment-16984816 ] Szilard Nemeth commented on YARN-9052: -- Re-attaching the patch to see whether the test failure is intermittent. > Replace all MockRM submit method definitions with a builder > --- > > Key: YARN-9052 > URL: https://issues.apache.org/jira/browse/YARN-9052 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: > YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs003-justfailed.txt, > YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, > YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, > YARN-9052.004.withlogs.patch, YARN-9052.005.patch, YARN-9052.006.patch, > YARN-9052.007.patch, YARN-9052.008.patch, YARN-9052.009.patch, > YARN-9052.009.patch, YARN-9052.testlogs.002.patch, > YARN-9052.testlogs.002.patch, YARN-9052.testlogs.003.patch, > YARN-9052.testlogs.patch > > > MockRM has 31 definitions of submitApp, most of them having more than an > acceptable number of parameters, ranging from 2 to as many as 22, which > makes the code unreadable. > On top of the unreadability, it is very hard to follow what RmApp will be produced > for tests, as they often pass many empty / null values as parameters.
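The builder pattern proposed above replaces long positional parameter lists with named, defaulted fields. A minimal sketch of the idea, deliberately detached from the real MockRM API (the class and field names here are hypothetical):

```java
// Illustrative sketch: instead of a submitApp overload with 20+ positional
// parameters, callers set only the fields they care about; every omitted
// field keeps a readable default instead of a null placeholder.
public class AppSubmission {
    private final int memoryMb;
    private final String queue;
    private final String user;

    private AppSubmission(Builder b) {
        this.memoryMb = b.memoryMb;
        this.queue = b.queue;
        this.user = b.user;
    }

    public String describe() {
        return user + "->" + queue + ":" + memoryMb + "MB";
    }

    public static class Builder {
        private int memoryMb = 1024;      // sensible defaults replace null/empty args
        private String queue = "default";
        private String user = "nobody";

        public Builder memoryMb(int m) { this.memoryMb = m; return this; }
        public Builder queue(String q) { this.queue = q; return this; }
        public Builder user(String u) { this.user = u; return this; }
        public AppSubmission build() { return new AppSubmission(this); }
    }
}
```

A test would then read as `new AppSubmission.Builder().queue("root.test").memoryMb(2048).build()`, making it obvious which values the resulting app actually carries.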
[jira] [Updated] (YARN-9052) Replace all MockRM submit method definitions with a builder
[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9052: - Attachment: YARN-9052.009.patch > Replace all MockRM submit method definitions with a builder > --- > > Key: YARN-9052 > URL: https://issues.apache.org/jira/browse/YARN-9052 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: > YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs003-justfailed.txt, > YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, > YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, > YARN-9052.004.withlogs.patch, YARN-9052.005.patch, YARN-9052.006.patch, > YARN-9052.007.patch, YARN-9052.008.patch, YARN-9052.009.patch, > YARN-9052.009.patch, YARN-9052.testlogs.002.patch, > YARN-9052.testlogs.002.patch, YARN-9052.testlogs.003.patch, > YARN-9052.testlogs.patch > > > MockRM has 31 definitions of submitApp, most of them having more than an > acceptable number of parameters, ranging from 2 to as many as 22, which > makes the code unreadable. > On top of the unreadability, it is very hard to follow what RmApp will be produced > for tests, as they often pass many empty / null values as parameters.
[jira] [Updated] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-5106: - Attachment: YARN-5106.015.patch > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Adam Antal >Priority: Major > Labels: newbie++ > Attachments: YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, > YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, > YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, > YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, > YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, > YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, > YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, > YARN-5106.014.patch, YARN-5106.015.patch > > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner.
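The helper suggested above can be as small as a fluent class that emits the allocations XML, so tests stop hand-writing the file. A hypothetical sketch of the shape, not the API that the attached patches actually introduce:

```java
// Illustrative sketch: a tiny builder that produces a fair-scheduler
// allocations XML snippet for use in tests.
public class AllocationFileBuilder {
    private final StringBuilder queues = new StringBuilder();

    public AllocationFileBuilder addQueue(String name, String maxResources) {
        queues.append("<queue name=\"").append(name).append("\">")
              .append("<maxResources>").append(maxResources).append("</maxResources>")
              .append("</queue>");
        return this;
    }

    public String build() {
        return "<allocations>" + queues + "</allocations>";
    }
}
```

A test would write the returned string to the scheduler's allocation file path instead of assembling the XML inline.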
[jira] [Commented] (YARN-9990) Testcase fails with "Insufficient configured threads: required=16 < max=10"
[ https://issues.apache.org/jira/browse/YARN-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984810#comment-16984810 ] Prabhu Joseph commented on YARN-9990: - Thanks [~abmodi]. > Testcase fails with "Insufficient configured threads: required=16 < max=10" > --- > > Key: YARN-9990 > URL: https://issues.apache.org/jira/browse/YARN-9990 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9990-001.patch > > > Testcase fails with "Insufficient configured threads: required=16 < max=10". > Below testcases failing > 1. TestWebAppProxyServlet > 2. TestAmFilter > 3. TestApiServiceClient > 4. TestSecureApiServiceClient > {code} > [ERROR] org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet Time > elapsed: 0.396 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp1597249648]@5f341870{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@4c762604{s=0/1,p=0}] > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:156) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:130) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:182) > at > org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110) > at > org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:283) > at > org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81) > at > org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231) > at > 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at org.eclipse.jetty.server.Server.doStart(Server.java:385) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.start(TestWebAppProxyServlet.java:102) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > [INFO] Running org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] Tests run: 4, Failures: 0, 
Errors: 1, Skipped: 0, Time elapsed: 2.326 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] > testFindRedirectUrl(org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter) > Time elapsed: 0.306 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp485041780]@1ce92674{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@31f924f5{s=0/1,p=0}] > at >
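The failure quoted above comes from Jetty's ThreadPoolBudget: selectors and acceptors each lease threads from the server's pool, and startup aborts when the combined lease exceeds the pool's configured maximum (here, required=16 against max=10), so the fix is to configure the test servers with enough threads for the components in use. The budget arithmetic can be sketched in isolation; this is illustrative only, not Jetty's actual implementation:

```java
// Illustrative sketch of the thread-budget check: sum the per-component
// thread leases and compare against the pool's configured maximum.
public class ThreadBudget {
    public static boolean sufficient(int maxThreads, int... leases) {
        int required = 0;
        for (int lease : leases) {
            required += lease;
        }
        return required <= maxThreads;
    }
}
```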
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984811#comment-16984811 ] Adam Antal commented on YARN-5106: -- Fixed the typo causing the test failure, and the last checkstyle issues, in v15. Please review. > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Adam Antal >Priority: Major > Labels: newbie++ > Attachments: YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, > YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, > YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, > YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, > YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, > YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, > YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, > YARN-5106.014.patch > > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner.
[jira] [Updated] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9923: - Attachment: YARN-9923.008.patch > Introduce HealthReporter interface to support multiple health checker files > --- > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch, YARN-9923.002.patch, > YARN-9923.003.patch, YARN-9923.004.patch, YARN-9923.005.patch, > YARN-9923.006.patch, YARN-9923.007.patch, YARN-9923.008.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. 
> - NONE (default): preserving the current behaviour, throwing an exception during > container allocation and carrying on with the default retry procedure. > > A new interface called {{HealthChecker}} is introduced, which is used in the > {{NodeHealthCheckerService}}. Existing implementations like > {{LocalDirsHandlerService}} are modified to implement it, giving a clear > abstraction of the node's health. The {{DockerHealthChecker}} implements this > new interface.
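The abstraction described above composes naturally: each checker (local dirs, health script, Docker daemon) reports its health independently, and the node is healthy only when every registered checker is. A self-contained sketch of that shape; the interface and class names are illustrative, not the exact YARN-9923 API:

```java
import java.util.*;

// Illustrative sketch: a common reporter interface plus a composite that
// aggregates all registered checkers into one node-level health verdict.
interface HealthReporter {
    boolean isHealthy();
    String getHealthReport();
}

class CompositeHealthChecker implements HealthReporter {
    private final List<HealthReporter> reporters = new ArrayList<>();

    void register(HealthReporter r) { reporters.add(r); }

    public boolean isHealthy() {
        for (HealthReporter r : reporters) {
            if (!r.isHealthy()) return false;
        }
        return true;
    }

    public String getHealthReport() {
        StringBuilder sb = new StringBuilder();
        for (HealthReporter r : reporters) {
            if (!r.isHealthy()) sb.append(r.getHealthReport()).append("; ");
        }
        return sb.toString();
    }
}
```

With this shape, a Docker daemon checker is just one more {{HealthReporter}} registered alongside the existing disk and script checkers.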
[jira] [Updated] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9923: - Summary: Introduce HealthReporter interface to support multiple health checker files (was: Introduce HealthReporter interface and implement running Docker daemon checker) > Introduce HealthReporter interface to support multiple health checker files > --- > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch, YARN-9923.002.patch, > YARN-9923.003.patch, YARN-9923.004.patch, YARN-9923.005.patch, > YARN-9923.006.patch, YARN-9923.007.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. 
This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. > > A new interface called {{HealthChecker}} is introduced which is used in the > {{NodeHealthCheckerService}}. Currently existing implementations like > {{LocalDirsHandlerService}} are modified to implement this giving a clear > abstraction to the node's health. The {{DockerHealthChecker}} implements this > new interface.
[jira] [Commented] (YARN-9923) Introduce HealthReporter interface and implement running Docker daemon checker
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984802#comment-16984802 ] Adam Antal commented on YARN-9923: -- Resolved last test failure and checkstyle. Will not make final class out of {{NodeHealthScriptRunner}}, because it is mocked in {{TestNodeHealthCheckerService#testNodeHealthService()}}. Please review. > Introduce HealthReporter interface and implement running Docker daemon checker > -- > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch, YARN-9923.002.patch, > YARN-9923.003.patch, YARN-9923.004.patch, YARN-9923.005.patch, > YARN-9923.006.patch, YARN-9923.007.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. 
This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. > > A new interface called {{HealthChecker}} is introduced which is used in the > {{NodeHealthCheckerService}}. Currently existing implementations like > {{LocalDirsHandlerService}} are modified to implement this giving a clear > abstraction to the node's health. The {{DockerHealthChecker}} implements this > new interface.