[ 
https://issues.apache.org/jira/browse/YARN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782274#comment-13782274
 ] 

Jian He commented on YARN-1255:
-------------------------------

The RM might be killed while it is saving the app data (after the app file is 
created but before the data is written into it). When the RM recovers, it loads 
an empty file and hits a NullPointerException. I reproduced this locally and 
see the same exception stack.
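
To illustrate the failure window described above, here is a minimal sketch of one common way to avoid it: write the data to a temporary file first, then rename it into place, and treat a zero-length file as "not stored" when loading. This is not the actual FileSystemRMStateStore code or the fix adopted for this issue. It uses java.nio.file on a local filesystem rather than the Hadoop FileSystem API, and the class and method names (AppStateFiles, storeAtomically, loadIfComplete) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class AppStateFiles {

    // Write to a sibling ".tmp" file, then rename it onto the target.
    // If the process dies before the rename, the target file is never
    // left behind in a created-but-empty state.
    static void storeAtomically(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the
        // filesystem cannot perform the rename atomically.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Return the stored bytes, or empty if the file is missing or has
    // zero length (the state an interrupted write would leave behind).
    static Optional<byte[]> loadIfComplete(Path file) throws IOException {
        if (!Files.exists(file) || Files.size(file) == 0) {
            return Optional.empty();
        }
        return Optional.of(Files.readAllBytes(file));
    }
}
```

With this shape, the recovery path can skip (and log) an incomplete entry instead of passing null data further along.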

> RM fails to start up with Failed to load/recover state error in a HA setup
> --------------------------------------------------------------------------
>
>                 Key: YARN-1255
>                 URL: https://issues.apache.org/jira/browse/YARN-1255
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Arpit Gupta
>
> {code}
> 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
> capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, 
> vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, 
> numContainers=0
> 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
> numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
> 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
> numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
> 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, 
> vCores:32>>
> 2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:register(157)) - Registering class 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
> 2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:register(157)) - Registering class 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
> for class 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
> 2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
> (RMNMInfo.java:<init>(63)) - Registered RMNMInfo MBean
> 2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
> (HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
> 2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
> (UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
> user rm/hostname@realm using keytab file 
> /etc/security/keytabs/rm.service.keytab
> 2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
> (RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
> for container-tokens
> 2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
> (AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
> amrm-tokens
> 2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
> (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
> nm-tokens
> 2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
> (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
> node: application_1380531989689_0002
> 2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
> (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
> node: application_1380531989689_0003
> 2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
> (RMDelegationTokenSecretManager.java:recover(181)) - recovering 
> RMDelegationTokenSecretManager.
> 2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(329)) - Recovering 2 applications
> 2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
> 2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager 
> (StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
