[ 
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088468#comment-16088468
 ] 

Rohith Sharma K S commented on YARN-6827:
-----------------------------------------

Attaching the failure trace below. It shows that applications are recovered 
before the ATS services are started. 

{noformat}
2017-07-15 10:19:35,200 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to 
active state
2017-07-15 10:19:35,245 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery started
2017-07-15 10:19:35,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded RM 
state version info 1.4
2017-07-15 10:19:35,431 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Unknown 
child node with name: HIERARCHIES
2017-07-15 10:19:35,452 INFO 
org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager:
 recovering RMDelegationTokenSecretManager.
2017-07-15 10:19:35,455 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 16 
applications
2017-07-15 10:19:35,518 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Priority '0' is acceptable in queue : default for application: 
application_1499929227397_0001
2017-07-15 10:19:35,578 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
 Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.appFinished(TimelineServiceV1Publisher.java:156)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalTransition.transition(RMAppImpl.java:1472)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1073)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1062)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:887)
        at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:383)
        at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:590)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1372)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:749)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:317)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:143)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:472)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

{noformat}

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS 
> during RM restart.
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-6827
>                 URL: https://issues.apache.org/jira/browse/YARN-6827
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>
> While recovering applications, the following NPE is thrown.
> {noformat}
> 2017-07-13 14:08:12,476 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
>  Error when publishing entity 
> [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because, during RM service creation, the active services are created 
> first and the ATS services are created later. As a result, the active services 
> are started first and the ATS services are started at a later point in time. 
> This gives the active services enough time to recover the applications, which 
> try to publish into ATS during recovery. Since the ATS services have not 
> started yet, an NPE is thrown. 
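
The ordering problem described above can be illustrated with a small, self-contained Java sketch. It is not the ResourceManager code and not the patch for this issue: StartOrderSketch, SketchPublisher and recover are hypothetical names made up for illustration. The sketch only assumes what the description states, namely that the publisher's client is unset until the publisher is started, and that recovery publishes each application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StartOrderSketch {

    /** Hypothetical stand-in for a timeline publisher; not the real YARN class. */
    static class SketchPublisher {
        // Assumption: the underlying client stays null until start() is called.
        private List<String> client;

        void start() {
            client = new ArrayList<>();
        }

        void publish(String entity) {
            // Throws NullPointerException if start() has not run yet.
            client.add(entity);
        }

        int publishedCount() {
            return client == null ? 0 : client.size();
        }
    }

    /** Publishes every app and returns how many publishes failed with an NPE. */
    static int recover(SketchPublisher publisher, List<String> apps) {
        int failures = 0;
        for (String app : apps) {
            try {
                publisher.publish(app);
            } catch (NullPointerException e) {
                // Count the failure and keep recovering the remaining apps.
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> apps = Arrays.asList("app_0001", "app_0002");

        // Ordering from the description: recovery runs before the publisher starts.
        SketchPublisher startedLate = new SketchPublisher();
        int failedWhenRecoveringFirst = recover(startedLate, apps);
        startedLate.start();

        // Alternative ordering: start the publisher first, then recover.
        SketchPublisher startedEarly = new SketchPublisher();
        startedEarly.start();
        int failedWhenStartingFirst = recover(startedEarly, apps);

        System.out.println("failures when recovering first: " + failedWhenRecoveringFirst);
        System.out.println("failures when starting first: " + failedWhenStartingFirst);
    }
}
```

With two applications, the first ordering reports two failed publishes and the second reports none. Starting the publisher before recovery is only one possible way to avoid the failure in this toy model; whether the actual fix reorders the services or guards the publish call is not stated in this message.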



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
