[ https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung updated YARN-7252:
--------------------------------
    Attachment: YARN-7252-YARN-5734.002.patch

> Removing queue then failing over results in exception
> -----------------------------------------------------
>
>                 Key: YARN-7252
>                 URL: https://issues.apache.org/jira/browse/YARN-7252
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Critical
>         Attachments: YARN-7252-YARN-5734.001.patch, YARN-7252-YARN-5734.002.patch
>
>
> Scenario: rm1 and rm2, with a starting configuration containing root.default
> and root.a; rm1 is active. First, put root.a into the STOPPED state, then
> remove it. Then transition rm1 to standby and rm2 to active. Here's the
> exception:
> {noformat}Operation failed: Error on refreshAll during transition to Active
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>       at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>       at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation failed
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
>       ... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
>       ... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
>       ... 13 more{noformat}
> It seems rm2 does not know that root.a was STOPPED, so when it reloads the
> configuration, finds root.a missing, and treats it as deleted, it throws an
> exception.
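A minimal Java sketch of the failure mode described above, assuming (as the report suggests) that rm2's in-memory view still has root.a as RUNNING when it loads the final configuration, which no longer contains root.a. `QueueFailoverSketch`, `ResourceManagerModel`, `tryReinitialize`, and `runScenario` are hypothetical names, not Hadoop APIs; `reinitialize` only mimics the rule stated in the exception message from `CapacitySchedulerQueueManager.validateQueueHierarchy`, not its actual implementation.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the YARN-7252 scenario. None of these
// types are Hadoop APIs; they only mimic the rule quoted in the exception.
public class QueueFailoverSketch {

    enum QueueState { RUNNING, STOPPED }

    /** One ResourceManager's in-memory view of queue name -> state. */
    static final class ResourceManagerModel {
        private final Map<String, QueueState> queues = new LinkedHashMap<>();

        ResourceManagerModel(Map<String, QueueState> initialConf) {
            queues.putAll(initialConf);
        }

        /**
         * Applies a new configuration, rejecting it if any queue missing from
         * the new configuration is not STOPPED in this RM's current view.
         */
        void reinitialize(Map<String, QueueState> newConf) throws IOException {
            for (Map.Entry<String, QueueState> e : queues.entrySet()) {
                boolean deleted = !newConf.containsKey(e.getKey());
                if (deleted && e.getValue() != QueueState.STOPPED) {
                    throw new IOException(e.getKey()
                        + " is deleted from the new capacity scheduler configuration,"
                        + " but the queue is not yet in stopped state. Current State : "
                        + e.getValue());
                }
            }
            queues.clear();
            queues.putAll(newConf);
        }
    }

    /** Returns "ok" on success, otherwise the rejection message. */
    static String tryReinitialize(ResourceManagerModel rm, Map<String, QueueState> conf) {
        try {
            rm.reinitialize(conf);
            return "ok";
        } catch (IOException ex) {
            return ex.getMessage();
        }
    }

    /** Replays the reported steps; returns {rm1 stop, rm1 remove, rm2 active}. */
    static String[] runScenario() {
        Map<String, QueueState> initial = new LinkedHashMap<>();
        initial.put("root.default", QueueState.RUNNING);
        initial.put("root.a", QueueState.RUNNING);

        Map<String, QueueState> stopped = new LinkedHashMap<>(initial);
        stopped.put("root.a", QueueState.STOPPED);

        Map<String, QueueState> removed = new LinkedHashMap<>(stopped);
        removed.remove("root.a");

        // Assumption: both RMs start from the initial configuration.
        ResourceManagerModel rm1 = new ResourceManagerModel(initial);
        ResourceManagerModel rm2 = new ResourceManagerModel(initial);

        // rm1 (active) applies the STOPPED change first, then the removal.
        String rm1Stop = tryReinitialize(rm1, stopped);
        String rm1Remove = tryReinitialize(rm1, removed);

        // Assumption: rm2 only applies the final configuration on failover,
        // so its in-memory root.a is still RUNNING.
        String rm2Active = tryReinitialize(rm2, removed);
        return new String[] {rm1Stop, rm1Remove, rm2Active};
    }

    public static void main(String[] args) {
        String[] labels = {"rm1 stop", "rm1 remove", "rm2 active"};
        String[] results = runScenario();
        for (int i = 0; i < results.length; i++) {
            System.out.println(labels[i] + ": " + results[i]);
        }
    }
}
```

In this model rm1 accepts both changes because it sees root.a as STOPPED before the removal, while rm2 skips the intermediate step and rejects the final configuration with the same message as the innermost `Caused by` in the trace.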



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

