[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197793#comment-16197793
 ] 

Hudson commented on YARN-7252:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13057/])
YARN-7252. Removing queue then failing over results in exception (jhung: rev 
1d36b53ab6d9bb1d9144101e424c24371343c5bf)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java


> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Critical
> Fix For: YARN-5734
>
> Attachments: YARN-7252-YARN-5734.001.patch, 
> YARN-7252-YARN-5734.002.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.ap

[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183536#comment-16183536
 ] 

Wangda Tan commented on YARN-7252:
--

+1, thanks [~jhung]!

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Critical
> Attachments: YARN-7252-YARN-5734.001.patch, 
> YARN-7252-YARN-5734.002.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181813#comment-16181813
 ] 

Hadoop QA commented on YARN-7252:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-5734 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
48s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} YARN-5734 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7252 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12889169/YARN-7252-YARN-5734.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1988b9fa172a 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-5734 / 9cac5d3 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/17650/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-serve

[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181728#comment-16181728
 ] 

Jonathan Hung commented on YARN-7252:
-

[~leftnoteasy] thanks for taking a look. Attached 002 patch.
bq. Add isConfigurationMutable to CSContext can avoid type conversion.
done
bq. Could you add a test to make sure that scheduler still validate queue 
hierarchy when transition from init->active? (since we don't valid queue 
hierarchy when fail over, we should check it when RM start.
I don't think validateQueueHierarchy is called on RM startup, it's only called 
on scheduler refresh. (i.e. it is only called in 
CapacityScheduler#reinitializeQueues, but not in 
CapacityScheduler#initializeQueues). It's just to validate differences between 
the old and new queue hierarchy, and on startup there is no old queue 
hierarchy. Let me know if I'm missing something.

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Critical
> Attachments: YARN-7252-YARN-5734.001.patch, 
> YARN-7252-YARN-5734.002.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181576#comment-16181576
 ] 

Wangda Tan commented on YARN-7252:
--

[~jhung],

Thanks for the explanation, overall the fix looks good, few comments: 

1) Add isConfigurationMutable to CSContext can avoid type conversion.
2) Could you add a test to make sure that scheduler still validate queue 
hierarchy when transition from init->active? (since we don't valid queue 
hierarchy when fail over, we should check it when RM start.

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Critical
> Attachments: YARN-7252-YARN-5734.001.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181570#comment-16181570
 ] 

Hadoop QA commented on YARN-7252:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-5734 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
11s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} YARN-5734 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} YARN-5734 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 16s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueState |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7252 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12889134/YARN-7252-YARN-5734.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 43070c6ea174 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-5734 / 9cac5d3 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/17643/artifact/patchprocess/diff-checks

[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181512#comment-16181512
 ] 

Jonathan Hung commented on YARN-7252:
-

Hi [~leftnoteasy], it will be added to the logs. The issue is only on RM HA - 
on active RM1, we need two operations to delete a queue, 1. STOP 2. delete 
queue. For the standby RM2, it will still have the queue in memory. Then when 
RM2 is put into active, it tries to refresh from the old conf containing the 
queue (in RUNNING state) to the new conf without the queue, which is why the 
exception is thrown.

The patch just bypasses the logic which throws this exception 
(CapacitySchedulerQueueManager#validateQueueHierarchy). On failover (when using 
the mutable conf provider) we don't want to validate, since it was already 
validated on RM1.

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Critical
> Attachments: YARN-7252-YARN-5734.001.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181441#comment-16181441
 ] 

Wangda Tan commented on YARN-7252:
--

[~jhung], I'm trying to understand the problem: when root.a deleted by admin, 
will be this added to edit log? If yes, we should not check STOPPED state for a 
delete queue. 

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7252-YARN-5734.001.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-26 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181394#comment-16181394
 ] 

Jonathan Hung commented on YARN-7252:
-

001 patch to fix this issue. Ended up going a different approach, the issue 
only happens when failing over. In this case validation should not be done. So 
it just checks if it is still in standby (i.e. we are still doing refresh* 
prior to transitioning to active) and also we are using mutable configuration 
provider.

[~leftnoteasy]/[~xgong] can you take a look at this? Thanks!

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7252-YARN-5734.001.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7252) Removing queue then failing over results in exception

2017-09-25 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180072#comment-16180072
 ] 

Jonathan Hung commented on YARN-7252:
-

One approach I see is adding a reinitialize wrapper in CapacityScheduler with a 
flag to tell it whether to check queue hierarchy or not. Assumption being, on 
failover, you don't need to check the queue hierarchy since the previous RM did 
it.

Another less intrusive approach would be to replay the logs not read by the 
standby-now-active RM. This seems wasteful though, and there's a lot more logic 
required for this change.

> Removing queue then failing over results in exception
> -
>
> Key: YARN-7252
> URL: https://issues.apache.org/jira/browse/YARN-7252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 
> is active. First, put root.a into STOPPED state, then remove it. Then put rm1 
> in standby and rm2 in active. Here's the exception: {noformat}Operation 
> failed: Error on refreshAll during transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation 
> failed
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>   ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted 
> from the new capacity scheduler configuration, but the queue is not yet in 
> stopped state. Current State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>   ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity 
> scheduler configuration, but the queue is not yet in stopped state. Current 
> State : RUNNING
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>   ... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and 
> sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org