[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623489#comment-14623489 ]

Wangda Tan commented on YARN-3894:
----------------------------------

Hi [~bibinchundatt],
Are you using the latest trunk? In the latest trunk, the node-label capacity
checking in the capacity scheduler does not depend on node label manager
initialization, so a misconfigured node-label capacity should make CS fail.
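
A minimal sketch of why the source of the label set matters, assuming a simplified rule in which each label's child capacities must sum to 1.0 and the parent's own label capacity is ignored. The class {{LabelSourceSketch}} and its methods are hypothetical illustrations, not CapacityScheduler or ParentQueue APIs. If the labels to check come from a label manager that is still empty at service init, the loop never runs and the misconfiguration passes; if they are taken from the queue configuration itself, the same misconfiguration is caught.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the actual CapacityScheduler/ParentQueue code.
class LabelSourceSketch {

  // Collects every label that appears in the children's configured label capacities.
  static Set<String> labelsFromConfig(List<Map<String, Float>> children) {
    Set<String> labels = new TreeSet<>();
    for (Map<String, Float> child : children) {
      labels.addAll(child.keySet());
    }
    return labels;
  }

  // Returns the first label whose child capacities do not sum to 1.0, or null if all pass.
  // Simplification: the parent's own capacity for the label is ignored.
  static String firstBadLabel(Set<String> labels, List<Map<String, Float>> children) {
    for (String label : labels) {
      float sum = 0f;
      for (Map<String, Float> child : children) {
        sum += child.getOrDefault(label, 0f);
      }
      if (Math.abs(1.0f - sum) > 1e-6f) {
        return label;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // A single child queue that gets only 50% of label "node2",
    // mirroring the node2 case in the issue description.
    List<Map<String, Float>> children = List.of(Map.of("node2", 0.5f));

    // Labels known to a not-yet-populated label manager at service init.
    Set<String> fromManager = Set.of();

    System.out.println("manager-driven check: " + firstBadLabel(fromManager, children));
    System.out.println("config-driven check: "
        + firstBadLabel(labelsFromConfig(children), children));
  }
}
```

Running {{main}} prints {{manager-driven check: null}} followed by {{config-driven check: node2}}: only the configuration-driven variant flags the bad label.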

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3894
>                 URL: https://issues.apache.org/jira/browse/YARN-3894
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: capacity-scheduler.xml
>
>
> Currently in the Capacity Scheduler, when the capacity configuration is wrong,
> the RM shuts down, but not in the case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}:
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
>     throws IOException {   
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     labelManager.reinitializeQueueLabels(getQueueToLabels());
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     LOG.info("Initialized root queue " + root);
>     initializeQueueMappings();
>     setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level capacity
> mismatch check happens in {{parseQueue}}. So during initialization, the labels
> are still empty when {{parseQueue}} runs.
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler.
> # Add one or two node labels using rmadmin.
> # Configure capacity-scheduler.xml with node labels, but with an invalid capacity configuration for an already-added label.
> # Restart both RMs.
> # Check whether the node label list is populated during the capacity scheduler's service init.
> *Expected*
> RM should not start.
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
>         ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActive    TARGET=RMHAProtocolService      
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: 
> Failed to re-init queues
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         ... 5 more
> {code}
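
The exception above reports that the children of {{root}} sum to 0.5 for label {{node2}}. A minimal sketch of that arithmetic, assuming a simplified rule (not the verbatim ParentQueue logic): when the parent has a positive capacity for a label, its children must sum to 1.0 for that label, and when the parent has none, the children must have none. {{ChildLabelCapacityCheck}}, {{check}}, and the {{PRECISION}} tolerance are hypothetical names, and the parent capacity of 1.0 used in {{main}} is an assumed value.

```java
// Hypothetical sketch of a per-label child capacity check.
class ChildLabelCapacityCheck {
  // Hypothetical tolerance for float rounding when summing capacities.
  static final float PRECISION = 0.0005f;

  // Throws if the children's capacities for `label` are inconsistent with the parent's.
  static void check(String queueName, String label, float parentCapacity,
      float[] childCapacities) {
    float sum = 0f;
    for (float c : childCapacities) {
      sum += c;
    }
    boolean bad = (parentCapacity > 0 && Math.abs(1.0f - sum) > PRECISION)
        || (parentCapacity == 0 && sum > 0);
    if (bad) {
      throw new IllegalArgumentException("Illegal capacity of " + sum
          + " for children of queue " + queueName + " for label=" + label);
    }
  }

  public static void main(String[] args) {
    try {
      // Assumed: root has full capacity for node2, but its only child gets 0.5.
      check("root", "node2", 1.0f, new float[] {0.5f});
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With these inputs, {{main}} prints the same message as the logged cause: {{Illegal capacity of 0.5 for children of queue root for label=node2}}.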



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
