[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957644#comment-16957644
 ] 

Bibin Chundatt commented on YARN-2442:
--

Thank you [~cyrusjackson25] for working on the patch.

Currently RMInfo holds a reference to RMContext, which could lead to a 
memory leak on switchover. Instead we could use the ResourceManager object 
directly.
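
As an illustration of that suggestion, a minimal sketch (the {{RMInfoMXBean}} interface and constructor shape are assumptions here, not necessarily what the attached patch does):

{code:java}
// Sketch only: keep a ResourceManager reference rather than caching RMContext,
// and resolve the HA state lazily on each JMX read so a switchover is
// reflected and no stale context is retained by the MBean.
public class RMInfo implements RMInfoMXBean {   // RMInfoMXBean is assumed
  private final ResourceManager rm;

  RMInfo(ResourceManager rm) {
    this.rm = rm;
  }

  @Override
  public String getHAState() {
    return rm.getRMContext().getHAServiceState().toString();
  }
}
{code}

Going through {{ResourceManager#getRMContext()}} on every read avoids pinning the context that was active when the bean was registered.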



> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957644#comment-16957644
 ] 

Bibin Chundatt edited comment on YARN-2442 at 10/23/19 8:16 AM:


Thank you [~cyrusjackson25] for working on the patch.

# Currently RMInfo holds a reference to RMContext, which could lead to a 
memory leak on switchover. Instead we could use the ResourceManager instance 
directly.
# Fix the checkstyle issues.
# The findbugs issue seems to be already fixed.




was (Author: bibinchundatt):
Thank you  [~cyrusjackson25] for working on the patch

Currently RMInfo is holding the reference of RMContext which could lead to 
memory leak on switch over. Instead we could use ResourceManager object 
directly.



> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9624) Use switch case for ProtoUtils#convertFromProtoFormat containerState

2019-10-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9624:

Attachment: YARN-9624.002.patch

> Use switch case for ProtoUtils#convertFromProtoFormat containerState
> 
>
> Key: YARN-9624
> URL: https://issues.apache.org/jira/browse/YARN-9624
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: performance
> Attachments: YARN-9624.001.patch, YARN-9624.002.patch
>
>
> On a large cluster with 100K+ containers, calling 
> {{ContainerState.valueOf(e.name().replace(CONTAINER_STATE_PREFIX, ""))}} on 
> every heartbeat is too costly. Update it to use a switch case.
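
For illustration, a minimal sketch of such a switch-based conversion (the enum constants {{C_NEW}}, {{C_RUNNING}}, {{C_COMPLETE}} and the exact method shape are assumptions, not the attached patch):

{code:java}
// Sketch only: map the proto enum directly instead of building a String on
// every call; keep the old valueOf path as a fallback for unknown states.
public static ContainerState convertFromProtoFormat(ContainerStateProto e) {
  switch (e) {
  case C_NEW:
    return ContainerState.NEW;
  case C_RUNNING:
    return ContainerState.RUNNING;
  case C_COMPLETE:
    return ContainerState.COMPLETE;
  default:
    return ContainerState.valueOf(
        e.name().replace(CONTAINER_STATE_PREFIX, ""));
  }
}
{code}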



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9624) Use switch case for ProtoUtils#convertFromProtoFormat containerState

2019-10-23 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957680#comment-16957680
 ] 

Bilwa S T commented on YARN-9624:
-

Hi [~bibinchundatt],
Updated the patch. Please review.

> Use switch case for ProtoUtils#convertFromProtoFormat containerState
> 
>
> Key: YARN-9624
> URL: https://issues.apache.org/jira/browse/YARN-9624
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: performance
> Attachments: YARN-9624.001.patch, YARN-9624.002.patch
>
>
> On a large cluster with 100K+ containers, calling 
> {{ContainerState.valueOf(e.name().replace(CONTAINER_STATE_PREFIX, ""))}} on 
> every heartbeat is too costly. Update it to use a switch case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957693#comment-16957693
 ] 

Zhankun Tang commented on YARN-9921:


[~Prabhu Joseph], [~sunilg], if there are no more comments, I'll commit it soon.

> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When YARN Service AM tries to relaunch a container on failure, we encounter 
> the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid, 
> with everything the same except numAllocations, as expected. Still, the below 
> equals check in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957707#comment-16957707
 ] 

Prabhu Joseph commented on YARN-9921:
-

[~tangzhankun] The patch looks good. +1 

Thanks [~tarunparimi] for the patch.

> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When YARN Service AM tries to relaunch a container on failure, we encounter 
> the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid, 
> with everything the same except numAllocations, as expected. Still, the below 
> equals check in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-10-23 Thread Vinod Kumar Vavilapalli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957719#comment-16957719
 ] 

Vinod Kumar Vavilapalli commented on YARN-9772:
---

This "parent queue and left queue sharing the same name" is at best a bug. It 
wasn't ever an intended feature. Isn't that right or was there an explicit JIRA 
where this was done?

Assuming no, and if we want to add this feature, we should do it deliberately 
making sure that all the other code-paths handle it correctly. Till then, I 
vote for disallowing such queues. If some operators accidentally used this 'bug 
as a feature', we should reserve the right to break it.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when there is 
> more than one parent queue (say at a middle level) with the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  * e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overrides "root.a.b" with the new Queue object 
> in the map because of the same name. After parsing all the queues, the map 
> count should be 7, but it is 6. Any reference to queue "root.a.b" in the code 
> path is actually the "root.a.d.b" object. Since 
> {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, we 
> will need to understand the implications in detail. For example, 
> {{CapacityScheduler#getQueue}} is used in many places and in turn uses 
> {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]
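
To make the collision concrete, a small runnable sketch (plain strings stand in for the CSQueue objects held by the real map):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a map keyed by the short queue name lets a later queue
// with the same short name silently replace an earlier one.
public class DuplicateQueueNameDemo {
  public static void main(String[] args) {
    Map<String, String> queues = new HashMap<>();  // short name -> full path
    queues.put("root", "root");
    queues.put("a", "root.a");
    queues.put("b", "root.a.b");
    queues.put("c", "root.a.b.c");
    queues.put("d", "root.a.d");
    queues.put("b", "root.a.d.b");                 // overwrites root.a.b
    queues.put("e", "root.e");
    System.out.println(queues.size());             // 6, not 7
    System.out.println(queues.get("b"));           // root.a.d.b
  }
}
{code}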



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: YARN-9925-003.patch

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch, 
> YARN-9925-003.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.
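
One possible shape of a short-name uniqueness check that would make all of the above fail consistently (purely illustrative, not necessarily what the attached patches do):

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: reject a queue whose short name is already used at another
// path, so e.g. adding root.A.A1.B fails once root.B exists.
public class QueueNameUniquenessCheck {
  private final Map<String, String> shortNameToPath = new HashMap<>();

  public void validate(String queuePath) throws IOException {
    String shortName = queuePath.substring(queuePath.lastIndexOf('.') + 1);
    String existing = shortNameToPath.get(shortName);
    if (existing != null && !existing.equals(queuePath)) {
      throw new IOException("Queue " + queuePath
          + " has the same short name as existing queue " + existing);
    }
    shortNameToPath.put(shortName, queuePath);
  }
}
{code}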



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957724#comment-16957724
 ] 

Prabhu Joseph commented on YARN-9925:
-

Thanks [~maniraj...@gmail.com] for the update. Yes, addressing YARN-9772 will 
fix this issue.

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch, 
> YARN-9925-003.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9624) Use switch case for ProtoUtils#convertFromProtoFormat containerState

2019-10-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957739#comment-16957739
 ] 

Hadoop QA commented on YARN-9624:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 
22 unchanged - 0 fixed = 23 total (was 22) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9624 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983817/YARN-9624.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 09828b94d971 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a901405 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25032/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25032/testReport/ |
| Max. process+thread count | 314 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreC

[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: (was: YARN-9925-003.patch)

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: YARN-9925-003.patch

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch, 
> YARN-9925-003.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-10-23 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957776#comment-16957776
 ] 

Tarun Parimi commented on YARN-9772:


Operators having several hundreds of queues might have accidentally configured 
them this way, since there is no current documentation which says to do otherwise.

Detailing it in the documentation and printing the complete queue paths which 
violate the rule will help those few people change their queue configs 
properly.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when there is 
> more than one parent queue (say at a middle level) with the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  * e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overrides "root.a.b" with the new Queue object 
> in the map because of the same name. After parsing all the queues, the map 
> count should be 7, but it is 6. Any reference to queue "root.a.b" in the code 
> path is actually the "root.a.d.b" object. Since 
> {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, we 
> will need to understand the implications in detail. For example, 
> {{CapacityScheduler#getQueue}} is used in many places and in turn uses 
> {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson updated YARN-2442:

Attachment: YARN-2442.004.patch

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.004.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9693) When AMRMProxyService is enabled RMCommunicator will register with failure

2019-10-23 Thread panlijie (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957834#comment-16957834
 ] 

panlijie commented on YARN-9693:


We see the same error when we submit Spark on YARN RBF, as below:

Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Invalid AMRMToken from appattempt_1571831510550_0004_02

 

> When AMRMProxyService is enabled RMCommunicator will register with failure
> --
>
> Key: YARN-9693
> URL: https://issues.apache.org/jira/browse/YARN-9693
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> When we enable the AMRMProxy service, the RMCommunicator registration fails 
> with the error below:
> {code:java}
> 2019-07-23 17:12:44,794 INFO [TaskHeartbeatHandler PingChecker] 
> org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler 
> thread interrupted
> 2019-07-23 17:12:44,794 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid 
> AMRMToken from appattempt_1563872237585_0001_02
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:186)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:123)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:986)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1300)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1768)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1698)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> Invalid AMRMToken from appattempt_1563872237585_0001_02
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy93.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:170)
>   ... 14 more
> Caused

[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957838#comment-16957838
 ] 

Hadoop QA commented on YARN-9925:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 39 unchanged - 1 fixed = 39 total (was 40) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 42s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestAbsoluteResourceConfiguration
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimitsByPartition
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueState |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resource

[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: (was: YARN-9925-003.patch)

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-23 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957845#comment-16957845
 ] 

Tao Yang edited comment on YARN-7621 at 10/23/19 12:48 PM:
---

Hi, [~cane]. Sorry for the late reply.

It makes perfect sense to me to support duplicate queue names; as [~wilfreds] 
mentioned, there's more work to do for that. I'm afraid I have no time to 
work on this recently, so please feel free to take over this issue if you want. 
Thanks.


was (Author: tao yang):
Hi, [~cane]. Sorry for the late reply.

It's make perfect sense for me to support duplicate queue names, as [~wilfreds] 
mentioned, there's more work to do for that.  I'm afraid of having no time to 
work on this recently, please feel free to take over this issue, Thanks.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue name. 
> There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it is hard to switch between FairScheduler and CapacityScheduler. I propose to 
> support submitting apps with a queue path for CapacityScheduler to make the 
> interface clearer and the scheduler switch smoother.
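
A tiny sketch of the difference on the client side (illustrative only; the queue strings are hypothetical):

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Illustrative only: how the queue field is filled today for each scheduler.
public class QueueFieldExample {
  public static void main(String[] args) {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setQueue("b");          // CapacityScheduler today: leaf queue name only
    ctx.setQueue("root.a.b");   // FairScheduler: full path; this JIRA proposes
                                // that CapacityScheduler accept this form too
  }
}
{code}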



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-23 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957845#comment-16957845
 ] 

Tao Yang commented on YARN-7621:


Hi, [~cane]. Sorry for the late reply.

It makes perfect sense to me to support duplicate queue names; as [~wilfreds] 
mentioned, there's more work to do for that. I'm afraid I have no time to 
work on this recently, so please feel free to take over this issue. Thanks.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue name. 
> There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it is hard to switch between FairScheduler and CapacityScheduler. I propose to 
> support submitting apps with a queue path for CapacityScheduler to make the 
> interface clearer and the scheduler switch smoother.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-10-23 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957845#comment-16957845
 ] 

Tao Yang edited comment on YARN-7621 at 10/23/19 12:51 PM:
---

Hi, [~cane]. Sorry for the late reply.

It makes perfect sense to me to support duplicate queue names; as [~wilfreds] 
mentioned, there's more work to do for that. I'm afraid I have no time to 
work on this recently, so please feel free to take over this issue if you want. 
Thanks.


was (Author: tao yang):
Hi, [~cane]. Sorry for the late reply.

It's make perfect sense for me to support duplicate queue names, as [~wilfreds] 
mentioned, there's more work to do for that.  I'm afraid of having no time to 
work on this recently, please feel free to take over this issue if you want, 
Thanks.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue name. 
> There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it is hard to switch between FairScheduler and CapacityScheduler. I propose to 
> support submitting apps with a queue path for CapacityScheduler to make the 
> interface clearer and the scheduler switch smoother.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9830) Improve ContainerAllocationExpirer it blocks scheduling

2019-10-23 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957882#comment-16957882
 ] 

Sunil G commented on YARN-9830:
---

Thanks [~bibinchundatt].

I think this change seems fine to me. It gets us into some fine-grained locking, 
and I think it's much better for performance.

[~cheersyang] [~jhung] [~rohithsharmaks], could you please help take a look and 
share your thoughts?

> Improve ContainerAllocationExpirer it blocks scheduling
> ---
>
> Key: YARN-9830
> URL: https://issues.apache.org/jira/browse/YARN-9830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bibin Chundatt
>Priority: Critical
>  Labels: perfomance
> Attachments: YARN-9830.001.patch
>
>
> {quote}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor.register(AbstractLivelinessMonitor.java:106)
> - waiting to lock <0x7fa348749550> (a 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:601)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:592)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> - locked <0x7fc8852f8200> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> {quote}
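
As an illustration of the fine-grained direction mentioned above, a minimal sketch (not the attached patch; the real class is {{AbstractLivelinessMonitor}}):

{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: tracking monitored objects in a ConcurrentHashMap
// instead of a synchronized register()/unregister() removes the single monitor
// lock that the scheduler threads in the stack trace above are blocked on.
public class FineGrainedLivelinessMonitor<O> {
  private final ConcurrentHashMap<O, Long> running = new ConcurrentHashMap<>();

  public void register(O ob, long monitorStartTime) {
    running.put(ob, monitorStartTime);
  }

  public void unregister(O ob) {
    running.remove(ob);
  }
}
{code}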



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9925:

Attachment: YARN-9925-003.patch

> CapacitySchedulerQueueManager allows unsupported Queue hierarchy
> 
>
> Key: YARN-9925
> URL: https://issues.apache.org/jira/browse/YARN-9925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9925-001.patch, YARN-9925-002.patch, 
> YARN-9925-003.patch
>
>
> CapacitySchedulerQueueManager allows an unsupported queue hierarchy. When 
> creating a queue with the same name as an existing parent queue, it has to 
> fail with the error below.
> {code:java}
> Caused by: java.io.IOException: A is moved from:root.A to:root.B.A after 
> refresh, which is not allowed.Caused by: java.io.IOException: A is moved 
> from:root.A to:root.B.A after refresh, which is not allowed. at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:335)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:180)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:762)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:473)
>  ... 70 more 
> {code}
> In some cases, the error is not thrown while creating the queue but at job 
> submission: "Failed to submit application_1571677375269_0002 to YARN : 
> Application application_1571677375269_0002 submitted by user : systest to 
> non-leaf queue : B"
> The below scenarios are allowed but should not be:
> {code:java}
> It allows root.A.A1.B when root.B.B1 already exists.
>
> 1. Add root.A
> 2. Add root.A.A1
> 3. Add root.B
> 4. Add root.B.B1
> 5. Allows Add of root.A.A1.B 
> It allows two root queues:
>
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Allows Add of root.A.A1.root
>
> {code}
> Below scenario is handled properly:
> {code:java}
> It does not allow root.B.A when root.A.A1 already exists.
>  
> 1. Add root.A
> 2. Add root.B
> 3. Add root.A.A1
> 4. Does not Allow Add of root.B.A
> {code}
> This error handling has to be consistent in all scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957920#comment-16957920
 ] 

Hadoop QA commented on YARN-2442:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 46 unchanged - 0 fixed = 47 total (was 46) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 
25s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-2442 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983835/YARN-2442.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 861ccc40d3d0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a901405 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25035/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25035/testReport/ |
| Max. process+thread count | 864 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-res

[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-10-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957934#comment-16957934
 ] 

Prabhu Joseph commented on YARN-9772:
-

Full Queue Path Support is required to allow queues to share the same name. 
This will help customers on FS migrate easily (YARN-7621).

Full Queue Path Support needs a lot of effort, and it should not affect the 
existing behavior, such as the way queue names are specified in YARN 
applications and in Capacity Scheduler configs (Queues, Queue Placement 
mapping).

Until that is in place, I suggest an immediate fix that disallows queues 
sharing the same name, to solve the existing inconsistent behavior 
(YARN-9772, YARN-9925, YARN-9766). If you are okay with it, 
[^YARN-9925-003.patch] disallows such queues and can be used.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when there is 
> more than one parent queue (say at a middle level) with the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  * e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overrides "root.a.b" with a new Queue object in 
> the map because of the same short name. After parsing all the queues, the map 
> count should be 7, but it is 6. Any reference to queue "root.a.b" in the code 
> path actually resolves to the "root.a.d.b" object. Since 
> {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, we 
> will need to understand the implications in detail. For example, 
> {{CapacityScheduler#getQueue}} is used in many places, and it in turn 
> uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]
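
The overwrite described above is easy to reproduce with a plain map keyed by 
short queue names; below is a tiny standalone illustration (not the actual 
CapacityScheduler code) using the example hierarchy:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Keying queues by short name silently drops one of the two "b" queues.
public class ShortNameKeyOverwrite {
  public static void main(String[] args) {
    Map<String, String> queues = new HashMap<>();  // short name -> full path
    String[] paths = {"root", "root.a", "root.a.b", "root.a.b.c",
        "root.a.d", "root.a.d.b", "root.e"};
    for (String path : paths) {
      queues.put(path.substring(path.lastIndexOf('.') + 1), path);
    }
    // 7 queues were parsed, but only 6 entries survive, and "b" now
    // resolves to root.a.d.b instead of root.a.b.
    System.out.println(queues.size() + " entries, b -> " + queues.get("b"));
  }
}
{code}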



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-10-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957934#comment-16957934
 ] 

Prabhu Joseph edited comment on YARN-9772 at 10/23/19 2:53 PM:
---

Full Queue Path Support is required to allow queues to share the same name. 
This will help customers on FS migrate easily (YARN-7621).

Full Queue Path Support needs a lot of effort, and it should not affect the 
existing behavior, such as the way queue names are specified in YARN 
applications and in Capacity Scheduler configs (Queues, Queue Placement 
mapping).

Until that is in place, I suggest an immediate fix that disallows queues 
sharing the same name, to solve the existing inconsistent behavior 
(YARN-9772, YARN-9925, YARN-9766). If you are okay with it, the 
[patch|https://issues.apache.org/jira/secure/attachment/12983845/YARN-9925-003.patch]
 disallows such queues and can be used.


was (Author: prabhu joseph):
Full Queue Path Support is required to allow queues to share the same name. 
This will help customers on FS migrate easily (YARN-7621).

Full Queue Path Support needs a lot of effort, and it should not affect the 
existing behavior, such as the way queue names are specified in YARN 
applications and in Capacity Scheduler configs (Queues, Queue Placement 
mapping).

Until that is in place, I suggest an immediate fix that disallows queues 
sharing the same name, to solve the existing inconsistent behavior 
(YARN-9772, YARN-9925, YARN-9766). If you are okay with it, 
[^YARN-9925-003.patch] disallows such queues and can be used.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when there is 
> more than one parent queue (say at a middle level) with the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  * e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overrides "root.a.b" with a new Queue object in 
> the map because of the same short name. After parsing all the queues, the map 
> count should be 7, but it is 6. Any reference to queue "root.a.b" in the code 
> path actually resolves to the "root.a.d.b" object. Since 
> {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, we 
> will need to understand the implications in detail. For example, 
> {{CapacityScheduler#getQueue}} is used in many places, and it in turn 
> uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9933) RMWebServices get-node-labels response type is not same

2019-10-23 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-9933:
---

 Summary: RMWebServices get-node-labels response type is not same
 Key: YARN-9933
 URL: https://issues.apache.org/jira/browse/YARN-9933
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


RMWebServices get-node-labels response type is not consistent. It returns an 
object when one node label is present and an array when multiple node labels 
are present.

*With One Node Label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
 
 *With Multiple Node label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": [
{"name":"x","exclusivity":"true"}, 
{"name":"y","exclusivity":"true"}
  ]
}
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9897) Add an Aarch64 CI for YARN

2019-10-23 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957955#comment-16957955
 ] 

Eric Yang commented on YARN-9897:
-

[~Kevin_Zheng] Thank you for the patch.  This looks good to me.  Will commit 
HADOOP-16614 if no objections.

> Add an Aarch64 CI for YARN
> --
>
> Key: YARN-9897
> URL: https://issues.apache.org/jira/browse/YARN-9897
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Zhenyu Zheng
>Priority: Major
> Attachments: hadoop_build.log
>
>
> YARN is the resource manager of Hadoop, and a large number of other software 
> projects also use YARN for resource management. The capability of running 
> YARN on platforms with a different architecture, and of managing hardware 
> resources of a different architecture, could be very important and useful.
> The Aarch64 (ARM) architecture is currently the dominant architecture in 
> small devices like phones, IoT devices, security cameras, drones, etc. With 
> increasing computing capability and increasing connection speeds such as 5G, 
> there could be great possibilities and opportunities for world-changing 
> innovations and new markets if we can manage and make use of those devices 
> as well.
> Currently, all YARN CIs are based on the x86 architecture. We have been 
> performing tests on Aarch64 and proposing possible solutions for problems we 
> have met, like:
> https://issues.apache.org/jira/browse/HADOOP-16614
> We have run all YARN tests, and it turns out there are only a few problems; 
> we can provide possible solutions for discussion.
> We want to propose adding an Aarch64 CI for YARN to promote support for YARN 
> on Aarch64 platforms. We are willing to provide machines to the current CI 
> system and manpower to manage the CI and fix problems that occur.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-10-23 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958032#comment-16958032
 ] 

Manikandan R commented on YARN-9768:


Sorry for the delay. Attached .004.patch.

[~inigoiri] Addressed all of your comments.

[~bibinchundatt] Introduced {{DelegationTokenRenewerPoolTracker}} runnable 
class to process all futures in a separate thread.
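
For reference, a minimal, self-contained sketch of the timeout-and-retry idea 
(hypothetical names and values, not the actual patch): each renew call runs on 
an executor, every attempt is bounded by a timeout, and a stuck call is 
cancelled and re-submitted a limited number of times.
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: bound each renew attempt by a timeout and retry a limited
// number of times so a bad NN/Router cannot block the renewer thread forever.
public class RenewWithTimeoutSketch {
  private static final int MAX_ATTEMPTS = 3;
  private static final long TIMEOUT_MS = 60_000;

  private final ExecutorService renewerPool = Executors.newFixedThreadPool(5);

  long renewWithRetry(Callable<Long> renewCall) throws Exception {
    TimeoutException lastTimeout = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      Future<Long> future = renewerPool.submit(renewCall);
      try {
        // The renew call returns the new expiration time when it succeeds.
        return future.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true);  // interrupt the stuck renew and try again
        lastTimeout = e;
      }
    }
    throw lastTimeout;
  }
}
{code}
The pool-plus-tracker bookkeeping mentioned above handles this across all 
pending renewals, but the timeout, cancel and bounded retry are the essential 
parts.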

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-10-23 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.004.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9925) CapacitySchedulerQueueManager allows unsupported Queue hierarchy

2019-10-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958033#comment-16958033
 ] 

Hadoop QA commented on YARN-9925:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
38s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9925 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983845/YARN-9925-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 45437e419b5c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a901405 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25036/testReport/ |
| Max. process+thread count | 822 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25036/console |
| Powered by | Apache Y

[jira] [Commented] (YARN-9624) Use switch case for ProtoUtils#convertFromProtoFormat containerState

2019-10-23 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958057#comment-16958057
 ] 

Ayush Saxena commented on YARN-9624:


For the test, instead of having if-else checks and explicitly passing one 
element, it can be done in one line using streams, something like this:
{code:java}
  @Test
  public void testConvertFromOrToProtoFormat() {
// Check if utility has all enum values
Stream.of(ContainerState.values())
.forEach(a -> ProtoUtils.convertToProtoFormat(a));
Stream.of(ContainerSubState.values())
.forEach(a -> ProtoUtils.convertToProtoFormat(a));
Stream.of(ContainerSubStateProto.values())
.forEach(a -> ProtoUtils.convertFromProtoFormat(a));
Stream.of(ContainerStateProto.values())
.forEach(a -> ProtoUtils.convertFromProtoFormat(a));
  }
{code}
Give a check if it sounds better.

> Use switch case for ProtoUtils#convertFromProtoFormat containerState
> 
>
> Key: YARN-9624
> URL: https://issues.apache.org/jira/browse/YARN-9624
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: performance
> Attachments: YARN-9624.001.patch, YARN-9624.002.patch
>
>
> On large cluster with 100K+ containers on every heartbeat 
> {{ContainerState.valueOf(e.name().replace(CONTAINER_STATE_PREFIX, ""))}} will 
> be too costly. Update with switch case.
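
For illustration, a minimal sketch of the switch-based conversion (assuming 
the current ContainerState/ContainerStateProto values; not necessarily the 
exact patch), which avoids building an intermediate String per container on 
every heartbeat:
{code:java}
// Sketch for ProtoUtils: map the proto enum straight to the record enum.
public static ContainerState convertFromProtoFormat(ContainerStateProto e) {
  switch (e) {
  case C_NEW:
    return ContainerState.NEW;
  case C_RUNNING:
    return ContainerState.RUNNING;
  case C_COMPLETE:
    return ContainerState.COMPLETE;
  default:
    throw new IllegalArgumentException("Unknown container state: " + e);
  }
}
{code}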



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9933) RMWebServices get-node-labels response type is not same

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9933:

Component/s: webapp

> RMWebServices get-node-labels response type is not same
> ---
>
> Key: YARN-9933
> URL: https://issues.apache.org/jira/browse/YARN-9933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> RMWebServices get-node-labels response type is not consistent. It returns an 
> object when one node label is present and an array when multiple node labels 
> are present.
> *With One Node Label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
>  
>  *With Multiple Node label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": [
> {"name":"x","exclusivity":"true"}, 
> {"name":"y","exclusivity":"true"}
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9933) RMWebServices get-node-labels response type is not same

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9933:

Labels: incompatibleChange  (was: )

> RMWebServices get-node-labels response type is not same
> ---
>
> Key: YARN-9933
> URL: https://issues.apache.org/jira/browse/YARN-9933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: incompatibleChange
>
> RMWebServices get-node-labels response type is not consistent. It returns an 
> object when one node label is present and an array when multiple node labels 
> are present.
> *With One Node Label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
>  
>  *With Multiple Node label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": [
> {"name":"x","exclusivity":"true"}, 
> {"name":"y","exclusivity":"true"}
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9933) RMWebServices get-node-labels json response is missing root element

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9933:

Summary: RMWebServices get-node-labels json response is missing root 
element  (was: RMWebServices get-node-labels response type is not same)

> RMWebServices get-node-labels json response is missing root element
> ---
>
> Key: YARN-9933
> URL: https://issues.apache.org/jira/browse/YARN-9933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: incompatibleChange
>
> RMWebServices get-node-labels response type is not consistent. It returns an 
> object when one node label is present and an array when multiple node labels 
> are present.
> *With One Node Label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
>  
>  *With Multiple Node label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": [
> {"name":"x","exclusivity":"true"}, 
> {"name":"y","exclusivity":"true"}
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9933) RMWebServices get-node-labels json response is missing root element

2019-10-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9933:

Description: 
RMWebServices get-node-labels json response is missing a root element. It does 
not have the root element *nodeLabelsInfo*.

*With no Node Label:*
{code:java}
{}
{code}
 

*With One Node Label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
 
 *With Multiple Node label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": [
{"name":"x","exclusivity":"true"}, 
{"name":"y","exclusivity":"true"}
  ]
}
{code}
 

  was:
RMWebServices get-node-labels response type is not same. It returns object if 
one node label is present and an array in case of multiple node label is 
present.

*With One Node Label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
 
 *With Multiple Node label:*
{code:java}
[yarn@yarndocker-1 centos]$ curl 
http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
{"nodeLabelInfo": [
{"name":"x","exclusivity":"true"}, 
{"name":"y","exclusivity":"true"}
  ]
}
{code}
 


> RMWebServices get-node-labels json response is missing root element
> ---
>
> Key: YARN-9933
> URL: https://issues.apache.org/jira/browse/YARN-9933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: incompatibleChange
>
> RMWebServices get-node-labels json response is missing a root element. It does 
> not have the root element *nodeLabelsInfo*.
> *With no Node Label:*
> {code:java}
> {}
> {code}
>  
> *With One Node Label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": {"name":"x","exclusivity":"true"} } {code}
>  
>  *With Multiple Node label:*
> {code:java}
> [yarn@yarndocker-1 centos]$ curl 
> http://yarndocker-1:8088/ws/v1/cluster/get-node-labels
> {"nodeLabelInfo": [
> {"name":"x","exclusivity":"true"}, 
> {"name":"y","exclusivity":"true"}
>   ]
> }
> {code}
>  
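
As an illustration of one possible direction (hypothetical class and field 
names, not the actual RM DAO), a JAXB-style wrapper with a fixed root element 
is one way to give the empty, single-label and multi-label responses the same 
outer shape:
{code:java}
import java.util.ArrayList;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Illustrative only: every response is nested under a "nodeLabelsInfo" root.
@XmlRootElement(name = "nodeLabelsInfo")
@XmlAccessorType(XmlAccessType.FIELD)
public class NodeLabelsInfoSketch {

  @XmlElement(name = "nodeLabelInfo")
  private ArrayList<NodeLabelInfoSketch> nodeLabelInfo = new ArrayList<>();

  public ArrayList<NodeLabelInfoSketch> getNodeLabelInfo() {
    return nodeLabelInfo;
  }

  @XmlAccessorType(XmlAccessType.FIELD)
  public static class NodeLabelInfoSketch {
    @XmlElement private String name;
    @XmlElement private boolean exclusivity;
  }
}
{code}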



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-10-23 Thread Íñigo Goiri (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958153#comment-16958153
 ] 

Íñigo Goiri commented on YARN-9768:
---

Sorry for the staged review... a few more comments:
* Avoid the superfluous changes in TestDelegationTokenRenewer (L169, L627, L630).
* Use setClass in TestDelegationTokenRenewer#1550.
* Why is DEFAULT_RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_MAX_ATTEMPTS = +10?
* Define futures in DelegationTokenRenewer as Map.
* Should we be more careful or define better the casting to 
AbstractDelegationTokenRenewerAppEvent?
* Is the TimeoutException  code path tested?
* Let's avoid the DelegationTokenRenewer 1009-1019 changes; we can do that cleanup 
in a separate JIRA if needed.
* Add documentation for the attempt part in DelegationTokenRenewer.
* Avoid TestDelegationTokenRenewer L623 and L630.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958465#comment-16958465
 ] 

Zhankun Tang commented on YARN-9921:


[~prabhujoseph], Thanks for the review.

[~tarunparimi], Thanks for the patch. Committed to trunk and branch-3.1.

> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When YARN Service AM tries to relaunch a container on failure, we encounter 
> the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid 
> with everything same except numAllocations as expected. But still the below 
> equals check in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9921:
---
Fix Version/s: 3.1.4
   3.3.0

> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.3.0, 3.1.4
>
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When YARN Service AM tries to relaunch a container on failure, we encounter 
> the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid 
> with everything same except numAllocations as expected. But still the below 
> equals check in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958468#comment-16958468
 ] 

Hudson commented on YARN-9921:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17565 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17565/])
YARN-9921. Issue in PlacementConstraint when YARN Service AM retries (ztang: 
rev fd84ca5161d171f7e754b9b06623c6118e048066)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SchedulingRequestPBImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java


> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.3.0, 3.1.4
>
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When YARN Service AM tries to relaunch a container on failure, we encounter 
> the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid 
> with everything same except numAllocations as expected. But still the below 
> equals check in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson updated YARN-2442:

Attachment: (was: YARN-2442.004.patch)

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-23 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson updated YARN-2442:

Attachment: YARN-2442.004.patch

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.004.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org