[ 
https://issues.apache.org/jira/browse/YARN-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferenc Erdelyi updated YARN-11924:
----------------------------------
    Description: 
If a "yarn resourcemanager -format-state-store" command is issued while one of 
the RMs is starting and is in the INIT state (because of YARN-11551), there is 
a window during which the /confstore/CONF_STORE path does not exist. The 
getZkData method then returns a null value, which causes the RM to fail. To 
prevent this, add an existence check and a retry mechanism before giving up.
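
The check-and-retry idea described above could look roughly like the following. 
This is a minimal sketch, not the actual patch: the ZkClient interface, the 
getZkDataWithRetry helper, and the maxAttempts and sleepMillis parameters are 
hypothetical stand-ins for illustration, and a fixed sleep between attempts is 
an assumption.

```java
final class ZkReadRetrySketch {

  /** Hypothetical stand-in for a ZooKeeper client; not the real Hadoop API. */
  interface ZkClient {
    boolean exists(String path) throws Exception;
    byte[] getData(String path) throws Exception;
  }

  /**
   * Reads the data stored at path, first waiting for the znode to appear.
   * Checks for the path up to maxAttempts times, sleeping sleepMillis
   * between checks. Returns null if the path is still missing afterwards.
   */
  static byte[] getZkDataWithRetry(ZkClient zk, String path,
      int maxAttempts, long sleepMillis) throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (zk.exists(path)) {
        return zk.getData(path);
      }
      // Do not sleep after the final failed check; give up immediately.
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMillis);
      }
    }
    return null;
  }
}
```

A caller would still need to handle the null result after the attempts run out, 
and the znode can in principle disappear between the exists check and the read, 
so the sketch narrows the window rather than closing it.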

 
{code:java}
FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:875)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1293)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:334)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1580)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:738)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:312)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:403)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 7 more
Caused by: java.lang.IllegalStateException: Queue configuration missing child queue names for root
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateParent(CapacitySchedulerQueueManager.java:741)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:255)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(CapacitySchedulerQueueManager.java:177)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:729)
    ... 10 more
{code}

  was:Should a "yarn resourcemanager -format-state-store" command be issued 
while one of the RM is starting and in the INIT state (because of YARN-11551), 
there is a time period when the /confstore/CONF_STORE path does not exist, 
hence the getZkData method returns a null value, causing the RM to fail. To 
prevent this, add a check and re-try mechanism before giving up.


> Add zkManager.exists(path) check to ZKConfigurationStore:getZkData() and 
> retry mechanism
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-11924
>                 URL: https://issues.apache.org/jira/browse/YARN-11924
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ferenc Erdelyi
>            Assignee: Ferenc Erdelyi
>            Priority: Major
>              Labels: pull-request-available


