[ 
https://issues.apache.org/jira/browse/YARN-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204104#comment-14204104
 ] 

Jian He commented on YARN-2834:
-------------------------------

Talked with Zhijie, it should be fine to just ignore the token-renew exception 
on recovery and let app continue. App itself could decide whether it choose to 
fail or not when the token eventually expires.  Even in the regular case, RM 
doesn't fail the app if token renew fails, why do we need to fail the app if 
token-renew fails on recovery. 

> Resource manager crashed with Null Pointer Exception
> ----------------------------------------------------
>
>                 Key: YARN-2834
>                 URL: https://issues.apache.org/jira/browse/YARN-2834
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Jian He
>
> Resource manager failed after restart. 
> {noformat}
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initializeQueues(467)) - Initialized root queue root: 
> numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initializeQueueMappings(436)) - Initialized queue 
> mappings, override: false
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(305)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<<memory:256, vCores:1>>, maximumAllocation=<<memory:2048, 
> vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
> 2014-11-09 04:12:53,015 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in 
> state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1041)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1005)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:821)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:843)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:826)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:701)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to