[jira] [Created] (YARN-11129) Purge canceled RM DelegationTokenRenewal TimerTasks from the scheduler

2022-05-06 Thread Jonathan Turner Eagles (Jira)
Jonathan Turner Eagles created YARN-11129:
-

 Summary: Purge canceled RM DelegationTokenRenewal TimerTasks from 
the scheduler
 Key: YARN-11129
 URL: https://issues.apache.org/jira/browse/YARN-11129
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Turner Eagles
 Attachments: Screen Shot 2022-04-27 at 4.20.04 PM.png

When yarn.resourcemanager.delegation-token.always-cancel=true, delegation tokens 
are canceled, but they cannot be garbage collected because they are held 
until the original expiry. This Jira will evaluate purging canceled delegation 
tokens, possibly amortizing the cost by purging only once a threshold is exceeded.
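
As a rough illustration of the amortized purge idea described above, here is a minimal sketch. It assumes the renewal tasks are scheduled on a plain java.util.Timer; the class ThresholdPurger, its threshold parameter, and the task bodies are hypothetical names for illustration only and are not taken from the RM code.

```java
import java.util.Timer;
import java.util.TimerTask;

public class PurgeSketch {
    /** Hypothetical helper: cancels tasks and purges the timer only after enough cancellations. */
    static final class ThresholdPurger {
        private final Timer timer;
        private final int threshold;
        private int canceledSincePurge = 0;

        ThresholdPurger(Timer timer, int threshold) {
            this.timer = timer;
            this.threshold = threshold;
        }

        /** Cancels the task; returns how many tasks this call purged (0 while below the threshold). */
        synchronized int cancel(TimerTask task) {
            if (task.cancel()) {
                // A canceled task stays in the timer's queue until its scheduled time or a purge.
                canceledSincePurge++;
            }
            if (canceledSincePurge > threshold) {
                canceledSincePurge = 0;
                // Timer.purge() removes all canceled tasks from the queue and returns the count.
                return timer.purge();
            }
            return 0;
        }
    }

    public static void main(String[] args) {
        Timer timer = new Timer("renewal-sketch", true);
        ThresholdPurger purger = new ThresholdPurger(timer, 2);
        int purged = 0;
        for (int i = 0; i < 3; i++) {
            TimerTask task = new TimerTask() {
                @Override
                public void run() {
                    // Placeholder for a renewal action.
                }
            };
            timer.schedule(task, 24L * 60 * 60 * 1000); // far in the future
            purged += purger.cancel(task);
        }
        System.out.println("purged=" + purged);
        timer.cancel();
    }
}
```

Purging walks the whole task queue, so doing it on every cancel could be costly on a busy timer; counting cancellations and purging in batches is one way to spread that cost.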

!Screen Shot 2022-04-27 at 4.20.04 PM.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class

2022-04-27 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles updated YARN-11116:
--
Issue Type: Improvement  (was: Bug)

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter 
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-6.001.perftest.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I came across a stack trace with SimpleDateFormat in it, which led me to 
> investigate current practices.
>  
> {noformat}
> "IPC Server handler 29 on 8032" #797 daemon prio=5 os_prio=0 tid=0x7fb6527d nid=0x953b runnable [0x7fb5ba034000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.yarn.util.Times.formatISO8601(Times.java:95)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:810)
>     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:396)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:224)
>     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:529)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:500)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1069)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:936)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2135)
>     at org.apache.hadoop.security.UserGroupInformation.doAsPrivileged(UserGroupInformation.java:2123)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2875)
> {noformat}
>  
> DateTimeFormatter is thread-safe, so there is no need to wrap instances in a 
> ThreadLocal; they can be reused safely across threads. In addition, the new 
> classes are slightly more performant.






[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class

2022-04-27 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles updated YARN-11116:
--
Priority: Minor  (was: Major)

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter 
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Minor
>  Labels: pull-request-available
> Attachments: YARN-6.001.perftest.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class

2022-04-27 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles updated YARN-11116:
--
Attachment: YARN-6.001.perftest.patch

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter 
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: YARN-6.001.perftest.patch
>
>






[jira] [Created] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class

2022-04-27 Thread Jonathan Turner Eagles (Jira)
Jonathan Turner Eagles created YARN-11116:
-

 Summary: Migrate Times util from SimpleDateFormat to thread-safe 
DateTimeFormatter class
 Key: YARN-11116
 URL: https://issues.apache.org/jira/browse/YARN-11116
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Turner Eagles
Assignee: Jonathan Turner Eagles


I came across a stack trace with SimpleDateFormat in it, which led me to 
investigate current practices.

 

{noformat}
"IPC Server handler 29 on 8032" #797 daemon prio=5 os_prio=0 tid=0x7fb6527d nid=0x953b runnable [0x7fb5ba034000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.yarn.util.Times.formatISO8601(Times.java:95)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:810)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:396)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:224)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:529)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:500)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1069)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2135)
    at org.apache.hadoop.security.UserGroupInformation.doAsPrivileged(UserGroupInformation.java:2123)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2875)
{noformat}

 

DateTimeFormatter is thread-safe, so there is no need to wrap instances in a 
ThreadLocal; they can be reused safely across threads. In addition, the new classes 
are slightly more performant.
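
To make the thread-safety point concrete, here is a minimal sketch of a single shared formatter with no ThreadLocal wrapper. The class name TimesSketch, the pattern string, and the fixed UTC zone are assumptions chosen so the example is deterministic; they are not taken from the actual Times utility or the patch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TimesSketch {
    // One immutable formatter shared by all threads; no ThreadLocal is needed.
    // Pattern and zone are illustrative assumptions.
    private static final DateTimeFormatter ISO8601 =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT)
            .withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds using the shared formatter. */
    static String formatISO8601(long epochMillis) {
        return ISO8601.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(formatISO8601(System.currentTimeMillis()));
    }
}
```

With SimpleDateFormat, the equivalent static field would have to be guarded by a ThreadLocal or a lock, because a SimpleDateFormat instance keeps mutable state while formatting.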






[jira] [Commented] (YARN-11096) Support node load based scheduling

2022-03-23 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511486#comment-17511486
 ] 

Jonathan Turner Eagles commented on YARN-11096:
---

I don't have the expertise to review, but I see this is implemented in Fair 
Scheduler and not Capacity Scheduler. Is this a design limitation?

> Support node load based scheduling
> --
>
> Key: YARN-11096
> URL: https://issues.apache.org/jira/browse/YARN-11096
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Deegue
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ResourceManager can schedule according to the node load reported by 
> NodeManager through heartbeats.
>  
> We can set a threshold and automatically skip nodes with high load when 
> scheduling.






[jira] [Commented] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE

2021-08-18 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401400#comment-17401400
 ] 

Jonathan Turner Eagles commented on YARN-9744:
--

Cherry-picked this to branch-3.2 and branch-2.10 (where I also experienced this 
bug). Thanks for this fix [~prabhujoseph] and review [~abmodi]!

> RollingLevelDBTimelineStore.getEntityByTime fails with NPE
> --
>
> Key: YARN-9744
> URL: https://issues.apache.org/jira/browse/YARN-9744
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 2.10.2, 3.2.4
>
> Attachments: YARN-9744-001.patch
>
>
> RollingLevelDBTimelineStore.getEntityByTime fails with NPE.
> {code}
> 2019-08-07 12:58:55,990 WARN  ipc.Server (Server.java:logException(2433)) - 
> IPC Server handler 0 on 10200, call 
> org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB.getContainers from 
> 10.21.216.93:36392 Call#29446915 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntityByTime(RollingLevelDBTimelineStore.java:786)
> at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntities(RollingLevelDBTimelineStore.java:614)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1045)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:222)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:213)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
> at 
> org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> {code}
> This affects the REST API for getting entities.
> curl http://pjosephdocker:8188/ws/v1/timeline/TEZ_APPLICATION 






[jira] [Updated] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE

2021-08-18 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles updated YARN-9744:
-
Fix Version/s: 3.2.4
   2.10.2

> RollingLevelDBTimelineStore.getEntityByTime fails with NPE
> --
>
> Key: YARN-9744
> URL: https://issues.apache.org/jira/browse/YARN-9744
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 2.10.2, 3.2.4
>
> Attachments: YARN-9744-001.patch
>
>






[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly

2020-05-06 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101026#comment-17101026
 ] 

Jonathan Turner Eagles commented on YARN-8959:
--

+1. Thanks, [~ahussein]. Looks great! Feel free to ping me on the follow-up 
jira.

> TestContainerResizing fails randomly
> 
>
> Key: YARN-8959
> URL: https://issues.apache.org/jira/browse/YARN-8959
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-8959-branch-2.10.002.patch, 
> YARN-8959-branch-2.10.003.patch, YARN-8959-branch-2.10.004.patch, 
> YARN-8959.001.patch, YARN-8959.002.patch, YARN-8959.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer
> {code}
> testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)
>   Time elapsed: 0.348 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<3072>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted
> {code}
> testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)
>   Time elapsed: 0.445 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<7168>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer
> {code}
> testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)
>   Time elapsed: 0.321 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<2048>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> {code}






[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly

2020-05-05 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100138#comment-17100138
 ] 

Jonathan Turner Eagles commented on YARN-8959:
--

[~ahussein], thanks for taking up this issue. It's always nice to improve the 
stability of tests. I see you created a specialized waitFor condition to 
handle this scenario. Your description and analysis are very helpful in 
understanding this test failure. I can see that the dispatcher will wait if it 
tries to take and the queue (a LinkedBlockingQueue in this case) is empty, 
possibly returning early. However, in addition to the waitForThreadToWait 
condition, there is also DrainDispatcher.await(). This wait condition handles 
events synchronously: await() returns only when the event has 1) been taken 
from the queue and 2) been handled. It is the most popular wait condition in 
the code base and seems sufficient to handle this case.

Could waitForThreadToWait() be switched to await()? I think that will be easiest 
to understand, as there is already a large precedent and readers of the code 
will be familiar with it. It will also mean less specialized code to maintain in 
the code base.

Perhaps I have missed something and await() is insufficient. Let me know.
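
For readers unfamiliar with the await() style of waiting mentioned in this comment, the following is a simplified stand-in, not Hadoop's DrainDispatcher. The class MiniDispatcher and its pending counter are hypothetical; the sketch only illustrates the property that await() returns once every dispatched event has been both taken from the queue and handled.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainSketch {
    /** Hypothetical single-threaded dispatcher with a drain-style await(). */
    static final class MiniDispatcher {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        // Events that were dispatched but whose handling has not finished yet.
        private final AtomicInteger pending = new AtomicInteger();
        private final Thread worker;

        MiniDispatcher() {
            worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable event = queue.take(); // blocks while the queue is empty
                        try {
                            event.run();
                        } finally {
                            // Decrement only after handling, so an empty queue alone
                            // is never mistaken for "all events handled".
                            pending.decrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    // Interrupted by stop(): exit the loop.
                }
            }, "mini-dispatcher");
            worker.setDaemon(true);
            worker.start();
        }

        void dispatch(Runnable event) {
            pending.incrementAndGet(); // count before the event becomes visible to the worker
            queue.add(event);
        }

        /** Returns once every dispatched event has been taken and handled. */
        void await() {
            while (pending.get() != 0) {
                Thread.yield();
            }
        }

        void stop() {
            worker.interrupt();
        }
    }

    public static void main(String[] args) {
        MiniDispatcher dispatcher = new MiniDispatcher();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            dispatcher.dispatch(handled::incrementAndGet);
        }
        dispatcher.await();
        System.out.println("handled=" + handled.get());
        dispatcher.stop();
    }
}
```

A check that only observes whether the worker thread is blocked on take() cannot distinguish "never received the event" from "finished handling it", which is why counting completed handling gives a stronger guarantee in this sketch.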

> TestContainerResizing fails randomly
> 
>
> Key: YARN-8959
> URL: https://issues.apache.org/jira/browse/YARN-8959
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-8959-branch-2.10.002.patch, 
> YARN-8959-branch-2.10.003.patch, YARN-8959.001.patch, YARN-8959.002.patch
>
>

[jira] [Commented] (YARN-10256) Refactor TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic

2020-05-01 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097420#comment-17097420
 ] 

Jonathan Turner Eagles commented on YARN-10256:
---

+1. I'll commit this today.

> Refactor 
> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
> ---
>
> Key: YARN-10256
> URL: https://issues.apache.org/jira/browse/YARN-10256
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: refactoring, unit-test
> Attachments: YARN-10256.001.patch
>
>
> In 3.x, 
> {{TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic}}
>  has redundant assertions. {{GenericTestUtils.waitFor()}} throws a timeout 
> exception when the predicate is not met, so a normal return guarantees that 
> the predicate was satisfied.
> The redundant loop makes the unit test harder to understand and may 
> increase the possibility of failure in case the container terminates.
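
The reasoning in the description above can be sketched with a small stand-in for a polling wait helper. This is not the Hadoop implementation; the class WaitForSketch and its parameter names are assumptions, and the sketch only shows why a normal return already implies the predicate held, which makes a second assertion loop redundant.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class WaitForSketch {
    /**
     * Polls the check until it returns true. Returns normally only if the
     * check succeeded; otherwise throws TimeoutException after the deadline.
     */
    static void waitFor(Supplier<Boolean> check, long checkEveryMillis, long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!check.get()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Timed out waiting for condition");
            }
            Thread.sleep(checkEveryMillis);
        }
        // Reaching this point means the last call to check.get() returned true.
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Succeeds once roughly 50 ms have elapsed; no extra assertion is needed afterwards.
        waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 2_000);
        System.out.println("condition met");
    }
}
```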






[jira] [Commented] (YARN-10255) revisit fix to intermittent TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic

2020-04-30 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096903#comment-17096903
 ] 

Jonathan Turner Eagles commented on YARN-10255:
---

Ok. So this jira is really a backport of YARN-7372 to branch-2.10 plus a cleanup 
of testContainerUpdateExecTypeGuaranteedToOpportunistic. After reading the 
description and your comment, that is clearer now. Usually the way this is 
done is to file separate issues, as there are two things being accomplished. 
This makes reviewers happy :) and makes life much easier for branch managers 
(people maintaining release lines, either community or internal), since they 
know better which commits to pull into their line.

I would expect to see something like this:
- Backport YARN-7372 to branch-2.10
- Refactor 
TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic

> revisit fix to intermittent 
> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
> --
>
> Key: YARN-10255
> URL: https://issues.apache.org/jira/browse/YARN-10255
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: unit-test
> Attachments: YARN-10255-branch-2.10.001.patch, YARN-10255.001.patch
>
>
> Creating this Jira to fix an intermittent failure in branch-2.10. Also, the fix 
> in YARN-7372 has some redundant assertions that could be removed.
> UT failure in branch-2.10:
>  {noformat}
> testContainerUpdateExecTypeGuaranteedToOpportunistic:
>   message='expected:OPPORTUNISTIC but 
> 

[jira] [Commented] (YARN-10255) revisit fix to intermittent TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic

2020-04-30 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096872#comment-17096872
 ] 

Jonathan Turner Eagles commented on YARN-10255:
---

I see a 4k patch, which is pretty substantial, but don't see an analysis that 
shows the need for these changes. Is the effective change, besides the 
refactoring (lambda), just to change the delay on the scheduler? If not, please 
help explain the changes along with the analysis.


[jira] [Resolved] (YARN-10052) TestApplicationMasterService is flakey

2019-12-20 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles resolved YARN-10052.
---
Resolution: Invalid

> TestApplicationMasterService is flakey
> --
>
> Key: YARN-10052
> URL: https://issues.apache.org/jira/browse/YARN-10052
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>
> Sometimes the state is allowed to progress to KILLED from KILLING too quickly 
> causing the test to fail.
> {code}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
> ---
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 56.817 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
> testApplicationMaxTimeout(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)
>   Time elapsed: 3.593 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testApplicationMaxTimeout(TestApplicationMasterService.java:204)
> {code}






[jira] [Created] (YARN-10052) TestApplicationMasterService is flakey

2019-12-20 Thread Jonathan Turner Eagles (Jira)
Jonathan Turner Eagles created YARN-10052:
-

 Summary: TestApplicationMasterService is flakey
 Key: YARN-10052
 URL: https://issues.apache.org/jira/browse/YARN-10052
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Turner Eagles
Assignee: Jonathan Turner Eagles


Sometimes the state is allowed to progress to KILLED from KILLING too quickly 
causing the test to fail.
{code}
---
Test set: 
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
---
Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 56.817 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
testApplicationMaxTimeout(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)
  Time elapsed: 3.593 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testApplicationMaxTimeout(TestApplicationMasterService.java:204)
{code}






[jira] [Comment Edited] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo

2019-11-05 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967878#comment-16967878
 ] 

Jonathan Turner Eagles edited comment on YARN-9949 at 11/5/19 9:18 PM:
---

Helping to get branch-3.2 compiling again to unblock other committers. I've 
reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It 
seems like there is more work to be done for the back-port so this seems the 
cleanest way until it's resolved. Thanks for understanding the reason for the 
revert.


was (Author: jeagles):
Helping to get branch-3.2 compiling again to unblock other commits. I've 
reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It 
seems like there is more work to be done for the back-port so this seems the 
cleaned way until it's resolved. Thanks for understanding the reason for the 
revert.

> Add missing queue configs for root queue in 
> RMWebService#CapacitySchedulerInfo 
> ---
>
> Key: YARN-9949
> URL: https://issues.apache.org/jira/browse/YARN-9949
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9949-001.patch, YARN-9949-002.patch
>
>
> YARN-9937 added the missing queue configs below but did not add them for the 
> root queue.
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime
> 5. Ordering Policy.






[jira] [Commented] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo

2019-11-05 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967878#comment-16967878
 ] 

Jonathan Turner Eagles commented on YARN-9949:
--

Helping to get branch-3.2 compiling again to unblock other committers. I've 
reverted commit 11c763c22055fea367b19b338a3d8067f9386ba4 in branch-3.2. It 
seems like there is more work to be done for the back-port so this seems the 
cleanest way until it's resolved. Thanks for understanding the reason for the 
revert.



