[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
-----------------------------------
    Description: 
Cases that can cause failure

# Capacity scheduler xml is wrongly configured in switch
# Refresh ACL failure due to configuration
# Refresh User group failure due to configuration


Capacity failure condition have given logs below
Continuously both RM will try to be active

{code}
2015-07-07 19:18:25,655 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, 
numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
queues.
java.io.IOException: Failed to re-init queues
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
children of queue root for label=node2
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
        ... 8 more
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  
DESCRIPTION=Exception refresh queues.   PERMISSIONS=
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  
DESCRIPTION=Exception transitioning to active   PERMISSIONS=
2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
transitioning to Active mode
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        ... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: 
Failed to re-init queues
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
        ... 5 more

{code}


{code}

dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active

{code}

  was:
Node label configuration for capacity scheduler not correct and restart both RM 
.Both RM will become active state


{code}
2015-07-07 19:18:25,655 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, 
numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
queues.
java.io.IOException: Failed to re-init queues
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
children of queue root for label=node2
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
        ... 8 more
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  
DESCRIPTION=Exception refresh queues.   PERMISSIONS=
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  
DESCRIPTION=Exception transitioning to active   PERMISSIONS=
2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
transitioning to Active mode
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        ... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: 
Failed to re-init queues
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
        ... 5 more

{code}


{code}

dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active

{code}

    Component/s: resourcemanager
        Summary: Both RM in active state when Admin#transitionToActive failure 
from refeshAll()  (was: Both RM in active when CapacityScheduler#reinitialize 
failure)

Updated description since the cases can happen in many cases . Please do 
correct me if i am wrong

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: yarn-site.xml
>
>
> Cases that can cause failure
> # Capacity scheduler xml is wrongly configured in switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Capacity failure condition have given logs below
> Continuously both RM will try to be active
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
>         ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActive    TARGET=RMHAProtocolService      
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: 
> Failed to re-init queues
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         ... 5 more
> {code}
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to