[jira] [Created] (YARN-11129) Purge canceled RM DelegationTokenRenewal TimerTasks from the scheduler
Jonathan Turner Eagles created YARN-11129:
-

Summary: Purge canceled RM DelegationTokenRenewal TimerTasks from the scheduler
Key: YARN-11129
URL: https://issues.apache.org/jira/browse/YARN-11129
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Jonathan Turner Eagles
Attachments: Screen Shot 2022-04-27 at 4.20.04 PM.png

When yarn.resourcemanager.delegation-token.always-cancel=true, delegation tokens are canceled, but they cannot be garbage collected because their renewal TimerTasks are held in the scheduler until the original expiry. This Jira will evaluate purging canceled delegation tokens, possibly amortizing the cost by doing so only once a threshold is exceeded.

!Screen Shot 2022-04-27 at 4.20.04 PM.png!

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
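The amortized purge described above can be sketched with the JDK's `java.util.Timer`, whose `purge()` method removes all cancelled tasks from the timer's queue and returns how many it removed. This is a minimal sketch, assuming the renewal tasks sit on a plain `Timer`; the class `ThresholdPurgingTimer` and its threshold parameter are hypothetical and are not the RM's actual `DelegationTokenRenewer` code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical wrapper (not the RM's DelegationTokenRenewer): cancels
 * TimerTasks and calls Timer.purge() only after a threshold of
 * cancellations, amortizing the O(n) queue scan over many cancels.
 */
class ThresholdPurgingTimer {
  private final Timer timer;
  private final int purgeThreshold;
  private final AtomicInteger cancelledSincePurge = new AtomicInteger();

  ThresholdPurgingTimer(Timer timer, int purgeThreshold) {
    this.timer = timer;
    this.purgeThreshold = purgeThreshold;
  }

  void schedule(TimerTask task, long delayMs) {
    timer.schedule(task, delayMs);
  }

  /**
   * Cancels the task. A cancelled TimerTask stays in the Timer's queue (keeping
   * everything it references reachable) until it reaches the head of the queue
   * or purge() is called. Returns the number of tasks purged, or 0.
   */
  int cancel(TimerTask task) {
    if (!task.cancel()) {
      return 0; // already ran, or was already cancelled
    }
    if (cancelledSincePurge.incrementAndGet() >= purgeThreshold) {
      cancelledSincePurge.set(0);
      return timer.purge(); // drop every cancelled task from the queue
    }
    return 0;
  }
}
```

Calling `purge()` on every cancel would also work, but each call scans the whole queue; the threshold trades a bounded amount of retained garbage for fewer scans.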
[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class
[ https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Turner Eagles updated YARN-11116:
--
Issue Type: Improvement (was: Bug)

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-11116.001.perftest.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class
[ https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Turner Eagles updated YARN-11116:
--
Priority: Minor (was: Major)

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Minor
> Labels: pull-request-available
> Attachments: YARN-11116.001.perftest.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class
[ https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Turner Eagles updated YARN-11116:
--
Attachment: YARN-11116.001.perftest.patch

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Major
> Attachments: YARN-11116.001.perftest.patch
[jira] [Created] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class
Jonathan Turner Eagles created YARN-11116:
-

Summary: Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class
Key: YARN-11116
URL: https://issues.apache.org/jira/browse/YARN-11116
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Turner Eagles
Assignee: Jonathan Turner Eagles

I came across a stack trace with SimpleDateFormat in it, which led me to investigate current practices.

{noformat}
"IPC Server handler 29 on 8032" #797 daemon prio=5 os_prio=0 tid=0x7fb6527d nid=0x953b runnable [0x7fb5ba034000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.yarn.util.Times.formatISO8601(Times.java:95)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:810)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:396)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:224)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:529)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:500)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1069)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2135)
    at org.apache.hadoop.security.UserGroupInformation.doAsPrivileged(UserGroupInformation.java:2123)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2875)
{noformat}

DateTimeFormatter is thread-safe, so there is no need to wrap it in a ThreadLocal; a single instance can be reused safely across threads. In addition, the new java.time classes are slightly more performant.
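A minimal sketch of the migration described above: because `DateTimeFormatter` is immutable and thread-safe, one `static final` instance can replace a `ThreadLocal<SimpleDateFormat>`. The pattern `yyyy-MM-dd'T'HH:mm:ss.SSSZ`, the UTC zone, and the class name `Iso8601Times` are assumptions for illustration, not necessarily what `Times.formatISO8601` uses.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical stand-in for a Times-style utility. */
final class Iso8601Times {
  private Iso8601Times() {}

  // Immutable and thread-safe: safe to share as a single static instance.
  // Pattern and zone are assumptions; "Z" prints the offset as +HHMM.
  private static final DateTimeFormatter ISO8601 =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
          .withZone(ZoneOffset.UTC);

  /** Formats epoch milliseconds; callable concurrently without locking. */
  static String formatISO8601(long epochMillis) {
    return ISO8601.format(Instant.ofEpochMilli(epochMillis));
  }
}
```

For example, `Iso8601Times.formatISO8601(0L)` yields `1970-01-01T00:00:00.000+0000`. Unlike `SimpleDateFormat`, which mutates an internal `Calendar` while formatting, the shared formatter holds no per-call state.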
[jira] [Commented] (YARN-11096) Support node load based scheduling
[ https://issues.apache.org/jira/browse/YARN-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511486#comment-17511486 ]

Jonathan Turner Eagles commented on YARN-11096:
---

I don't have the expertise to review, but I see this is implemented in the Fair Scheduler and not the Capacity Scheduler. Is this a design limitation?

> Support node load based scheduling
> --
>
> Key: YARN-11096
> URL: https://issues.apache.org/jira/browse/YARN-11096
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Deegue
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> ResourceManager can schedule according to the node load reported by the
> NodeManager through heartbeats.
>
> We can set up a threshold and automatically skip nodes with high load when
> scheduling.
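The threshold idea in the description can be sketched as a filter over heartbeat-reported node loads. This is a hypothetical illustration, not the patch under review: `NodeLoad`, the load metric, and `maxLoad` are assumptions, and a real scheduler would apply such a check inside its own node-selection path.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical load-aware node filter; not the actual YARN-11096 patch. */
class LoadAwareNodeFilter {
  /** Assumed shape of a load report carried on a NodeManager heartbeat. */
  static final class NodeLoad {
    final String nodeId;
    final double load; // assumed metric, e.g. load average per core

    NodeLoad(String nodeId, double load) {
      this.nodeId = nodeId;
      this.load = load;
    }
  }

  private final double maxLoad;

  LoadAwareNodeFilter(double maxLoad) {
    this.maxLoad = maxLoad;
  }

  /** Returns ids of nodes whose reported load is at or below the threshold. */
  List<String> schedulableNodes(List<NodeLoad> reports) {
    return reports.stream()
        .filter(r -> r.load <= maxLoad) // skip overloaded nodes
        .map(r -> r.nodeId)
        .collect(Collectors.toList());
  }
}
```

Nodes above the threshold are simply skipped for this scheduling pass; they become candidates again once a later heartbeat reports a lower load.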
[jira] [Commented] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE
[ https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401400#comment-17401400 ]

Jonathan Turner Eagles commented on YARN-9744:
--

Cherry-picked this to branch-3.2 and branch-2.10 (where I also experienced this bug). Thanks for the fix, [~prabhujoseph], and for the review, [~abmodi]!

> RollingLevelDBTimelineStore.getEntityByTime fails with NPE
> --
>
> Key: YARN-9744
> URL: https://issues.apache.org/jira/browse/YARN-9744
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Fix For: 3.3.0, 2.10.2, 3.2.4
>
> Attachments: YARN-9744-001.patch
>
>
> RollingLevelDBTimelineStore.getEntityByTime fails with NPE.
> {code}
> 2019-08-07 12:58:55,990 WARN ipc.Server (Server.java:logException(2433)) - IPC Server handler 0 on 10200, call org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB.getContainers from 10.21.216.93:36392 Call#29446915 Retry#0
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntityByTime(RollingLevelDBTimelineStore.java:786)
>     at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntities(RollingLevelDBTimelineStore.java:614)
>     at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1045)
>     at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
>     at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:222)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:213)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
>     at org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> {code}
> This affects the REST API for getting entities:
> curl http://pjosephdocker:8188/ws/v1/timeline/TEZ_APPLICATION
[jira] [Updated] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE
[ https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Turner Eagles updated YARN-9744:
-
Fix Version/s: 3.2.4
               2.10.2

> RollingLevelDBTimelineStore.getEntityByTime fails with NPE
> --
>
> Key: YARN-9744
> URL: https://issues.apache.org/jira/browse/YARN-9744
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Fix For: 3.3.0, 2.10.2, 3.2.4
>
> Attachments: YARN-9744-001.patch
[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101026#comment-17101026 ]

Jonathan Turner Eagles commented on YARN-8959:
--

+1. Thanks, [~ahussein]. Looks great! Feel free to ping me on the follow-up Jira.

> TestContainerResizing fails randomly
>
>
> Key: YARN-8959
> URL: https://issues.apache.org/jira/browse/YARN-8959
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin Chundatt
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: YARN-8959-branch-2.10.002.patch,
> YARN-8959-branch-2.10.003.patch, YARN-8959-branch-2.10.004.patch,
> YARN-8959.001.patch, YARN-8959.002.patch, YARN-8959.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer
> {code}
> testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.348 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<3072>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:834)
>     at org.junit.Assert.assertEquals(Assert.java:645)
>     at org.junit.Assert.assertEquals(Assert.java:631)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted
> {code}
> testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.445 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<7168>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:834)
>     at org.junit.Assert.assertEquals(Assert.java:645)
>     at org.junit.Assert.assertEquals(Assert.java:631)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer
> {code}
> testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.321 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<2048>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:834)
>     at org.junit.Assert.assertEquals(Assert.java:645)
>     at org.junit.Assert.assertEquals(Assert.java:631)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> {code}
[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100138#comment-17100138 ]

Jonathan Turner Eagles commented on YARN-8959:
--

[~ahussein], thanks for taking up this issue. It's always nice to improve the stability of tests. I see you created a specialized waitFor condition to handle this scenario. Your description and analysis are very helpful in understanding this test failure.

I can see that the dispatcher will wait if it tries to take from the queue (a LinkedBlockingQueue in this case) while the queue is empty, possibly returning early. However, in addition to the waitForThreadToWait condition, there is also DrainDispatcher.await(). This wait condition handles events synchronously, and await() returns only when the event has 1) been taken from the queue and 2) been handled. This wait condition is the most widely used in the code base and seems sufficient to handle this case.

Could waitForThreadToWait() be switched to await()? I think it will be the easiest to understand, as there is already a large precedent and readers of the code will be familiar with it. It also leaves less specialized code to maintain in the code base. Perhaps I have missed something and await() is insufficient. Let me know.

> TestContainerResizing fails randomly
>
>
> Key: YARN-8959
> URL: https://issues.apache.org/jira/browse/YARN-8959
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin Chundatt
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: YARN-8959-branch-2.10.002.patch,
> YARN-8959-branch-2.10.003.patch, YARN-8959.001.patch, YARN-8959.002.patch
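The await() semantics described in the comment, where the wait returns only once every dispatched event has been both taken from the queue and handled, can be sketched with a single worker thread and a `drained` flag guarded by a lock. This is a simplified, hypothetical stand-in, not Hadoop's `DrainDispatcher`; the class name `MiniDrainDispatcher` is invented for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical, simplified drain-style dispatcher (not Hadoop's DrainDispatcher). */
class MiniDrainDispatcher<E> {
  private final LinkedBlockingQueue<E> queue = new LinkedBlockingQueue<>();
  private final Object lock = new Object();
  // True only when the queue is empty AND no event is being handled.
  private boolean drained = true;
  private final Thread worker;

  MiniDrainDispatcher(Consumer<E> handler) {
    worker = new Thread(() -> {
      try {
        while (true) {
          E event = queue.take(); // blocks while the queue is empty
          try {
            handler.accept(event);
          } finally {
            synchronized (lock) {
              // Mark drained only after handling, and only if nothing new arrived.
              if (queue.isEmpty()) {
                drained = true;
                lock.notifyAll();
              }
            }
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop() was called
      }
    });
    worker.setDaemon(true);
    worker.start();
  }

  void dispatch(E event) {
    synchronized (lock) {
      drained = false; // cleared before enqueueing so await() cannot miss the event
      queue.add(event);
    }
  }

  /** Blocks until every dispatched event has been taken AND handled. */
  void await() {
    synchronized (lock) {
      while (!drained) {
        try {
          lock.wait();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  void stop() {
    worker.interrupt();
  }
}
```

Because `drained` is cleared under the lock before an event is enqueued and set again only after the handler returns with an empty queue, `await()` cannot return while an event is still in flight, which is the property a blocked-in-`take()` check alone does not guarantee.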
[jira] [Commented] (YARN-10256) Refactor TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
[ https://issues.apache.org/jira/browse/YARN-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097420#comment-17097420 ]

Jonathan Turner Eagles commented on YARN-10256:
---

+1. I'll commit this today.

> Refactor
> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
> ---
>
> Key: YARN-10256
> URL: https://issues.apache.org/jira/browse/YARN-10256
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: refactoring, unit-test
> Attachments: YARN-10256.001.patch
>
>
> In 3.x,
> {{TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic}}
> has redundant assertions. Since the UT throws the timeout exception,
> {{GenericTestUtils.waitFor()}} returning normally guarantees that the predicate
> was met; otherwise, the UT would fail with a timeout exception.
> The redundant loop makes the unit test harder to understand and may
> increase the possibility of failure in case the container terminates.
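The reasoning above relies on the waitFor contract: it either returns with the predicate satisfied or throws a `TimeoutException`. A minimal, hypothetical re-implementation of that contract (not Hadoop's `GenericTestUtils` source; the name `WaitUtil` is invented) shows why a further assertion loop afterwards adds nothing.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper with the same contract as a waitFor utility. */
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls check every checkEveryMillis until it returns true. Returns normally
   * only if the predicate was observed true; otherwise throws TimeoutException
   * once waitForMillis has elapsed.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

Once `waitFor(...)` returns, the condition is already known to have held, so the test can move on instead of re-polling; re-checking later only adds a window in which a terminating container can change the observed state.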
[jira] [Commented] (YARN-10255) revisit fix to intermittent TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
[ https://issues.apache.org/jira/browse/YARN-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096903#comment-17096903 ]

Jonathan Turner Eagles commented on YARN-10255:
---

OK. So this Jira is really two things: backporting YARN-7372 to branch-2.10, and cleaning up testContainerUpdateExecTypeGuaranteedToOpportunistic. After reading the description and your comment, that is clearer now. Usually the way this is done is to file separate issues, since two things are being accomplished. This makes reviewers happy :) and makes life much easier for branch managers (people maintaining release lines, either community or internal), so they know better which commits to pull into their line. I would expect to see something like this:
- Backport YARN-7372 to branch-2.10
- Refactor TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic

> revisit fix to intermittent
> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
> --
>
> Key: YARN-10255
> URL: https://issues.apache.org/jira/browse/YARN-10255
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: unit-test
> Attachments: YARN-10255-branch-2.10.001.patch, YARN-10255.001.patch
>
>
> Creating this Jira to fix an intermittent failure in branch-2.10. Also, the fix
> in YARN-7372 has some redundant assertions that could be removed.
> UT failure in branch-2.10:
> {noformat}
> testContainerUpdateExecTypeGuaranteedToOpportunistic:
> message='expected:OPPORTUNISTIC but >
[jira] [Commented] (YARN-10255) revisit fix to intermittent TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
[ https://issues.apache.org/jira/browse/YARN-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096872#comment-17096872 ]

Jonathan Turner Eagles commented on YARN-10255:
---

I see a 4k patch, which is pretty substantial, but I don't see an analysis that shows the need for these changes. Apart from the refactoring (lambdas), is the effective change just the change to the delay on the scheduler? If not, please help explain the changes along with the analysis.

> revisit fix to intermittent
> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
> --
>
> Key: YARN-10255
> URL: https://issues.apache.org/jira/browse/YARN-10255
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: unit-test
> Attachments: YARN-10255-branch-2.10.001.patch, YARN-10255.001.patch
[jira] [Resolved] (YARN-10052) TestApplicationMasterService is flakey
[ https://issues.apache.org/jira/browse/YARN-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Turner Eagles resolved YARN-10052. --- Resolution: Invalid > TestApplicationMasterService is flakey > -- > > Key: YARN-10052 > URL: https://issues.apache.org/jira/browse/YARN-10052 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > > Sometimes the state is allowed to progress to KILLED from KILLING too quickly, > causing the test to fail. > {code} > --- > Test set: > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService > --- > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 56.817 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService > testApplicationMaxTimeout(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService) > Time elapsed: 3.593 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testApplicationMaxTimeout(TestApplicationMasterService.java:204) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10052) TestApplicationMasterService is flakey
Jonathan Turner Eagles created YARN-10052:
Summary: TestApplicationMasterService is flakey
Key: YARN-10052
URL: https://issues.apache.org/jira/browse/YARN-10052
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Turner Eagles
Assignee: Jonathan Turner Eagles

Sometimes the state is allowed to progress to KILLED from KILLING too quickly, causing the test to fail.
{code}
---
Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
---
Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 56.817 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
testApplicationMaxTimeout(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)  Time elapsed: 3.593 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testApplicationMaxTimeout(TestApplicationMasterService.java:204)
{code}
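The description points to a timing race. The test appears to check the application state at a single instant, so if the transition from KILLING to KILLED happens sooner than expected, the assertion fails. The issue was resolved as Invalid, so nothing below describes a committed fix. This is a minimal, hypothetical Java sketch of the common remedy: poll until the state reaches any acceptable value within a timeout, instead of asserting one transient state. `StateWaitSketch`, `AppState`, and `waitForAnyState` are illustrative names, not YARN APIs. Hadoop's own tests typically use a polling helper such as `GenericTestUtils.waitFor` for the same purpose.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: poll for an acceptable state instead of asserting a transient one.
public class StateWaitSketch {

  // Hypothetical stand-in for the application states involved in the race.
  public enum AppState { RUNNING, KILLING, KILLED }

  /**
   * Polls {@code current} every {@code intervalMs} ms until it reports a state in
   * {@code accepted} or {@code timeoutMs} ms elapse; returns the last observed state.
   */
  public static AppState waitForAnyState(Supplier<AppState> current,
      Set<AppState> accepted, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    AppState seen = current.get();
    while (!accepted.contains(seen) && System.nanoTime() < deadline) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
        break;
      }
      seen = current.get();
    }
    return seen;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicReference<AppState> state = new AtomicReference<>(AppState.RUNNING);
    // Simulated RM thread: it may pass through KILLING before the test thread looks.
    Thread rm = new Thread(() -> {
      state.set(AppState.KILLING);
      state.set(AppState.KILLED);
    });
    rm.start();
    Set<AppState> ok = EnumSet.of(AppState.KILLING, AppState.KILLED);
    AppState observed = waitForAnyState(state::get, ok, 10, 5_000);
    rm.join();
    // Accepting either state avoids failing when KILLING has already been passed.
    if (!ok.contains(observed)) {
      throw new AssertionError("unexpected state: " + observed);
    }
    System.out.println("observed " + observed);
  }
}
```

Accepting both KILLING and KILLED, or waiting only for the terminal state, gives up a stricter check in exchange for determinism. Whether that is acceptable depends on what the test is meant to verify.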
[jira] [Comment Edited] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo
[ https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967878#comment-16967878 ] Jonathan Turner Eagles edited comment on YARN-9949 at 11/5/19 9:18 PM: --- Helping to get branch-3.2 compiling again to unblock other committers. I've reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It seems like there is more work to be done for the back-port, so this seems the cleanest way until it's resolved. Thanks for understanding the reason for the revert. was (Author: jeagles): Helping to get branch-3.2 compiling again to unblock other commits. I've reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It seems like there is more work to be done for the back-port so this seems the cleaned way until it's resolved. Thanks for understanding the reason for the revert. > Add missing queue configs for root queue in > RMWebService#CapacitySchedulerInfo > --- > > Key: YARN-9949 > URL: https://issues.apache.org/jira/browse/YARN-9949 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-9949-001.patch, YARN-9949-002.patch > > > YARN-9937 added the missing queue configs below but did not add them for the root > queue. > 1. Maximum Allocation > 2. Queue ACLs > 3. Queue Priority > 4. Application Lifetime > 5. Ordering Policy.
[jira] [Commented] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo
[ https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967878#comment-16967878 ] Jonathan Turner Eagles commented on YARN-9949: -- Helping to get branch-3.2 compiling again to unblock other commits. I've reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It seems like there is more work to be done for the back-port, so this seems the cleanest way until it's resolved. Thanks for understanding the reason for the revert. > Add missing queue configs for root queue in > RMWebService#CapacitySchedulerInfo > --- > > Key: YARN-9949 > URL: https://issues.apache.org/jira/browse/YARN-9949 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-9949-001.patch, YARN-9949-002.patch > > > YARN-9937 added the missing queue configs below but did not add them for the root > queue. > 1. Maximum Allocation > 2. Queue ACLs > 3. Queue Priority > 4. Application Lifetime > 5. Ordering Policy.