[
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088468#comment-16088468
]
Rohith Sharma K S commented on YARN-6827:
-----------------------------------------
The failure trace is attached below. It shows that applications are recovered
before the ATS services are started.
{noformat}
2017-07-15 10:19:35,200 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to
active state
2017-07-15 10:19:35,245 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery started
2017-07-15 10:19:35,253 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded RM
state version info 1.4
2017-07-15 10:19:35,431 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Unknown
child node with name: HIERARCHIES
2017-07-15 10:19:35,452 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager:
recovering RMDelegationTokenSecretManager.
2017-07-15 10:19:35,455 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 16
applications
2017-07-15 10:19:35,518 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Priority '0' is acceptable in queue : default for application:
application_1499929227397_0001
2017-07-15 10:19:35,578 ERROR
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
java.lang.NullPointerException
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
at
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
at
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.appFinished(TimelineServiceV1Publisher.java:156)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalTransition.transition(RMAppImpl.java:1472)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1073)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1062)
at
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:887)
at
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:383)
at
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:590)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1372)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:749)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:317)
at
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:143)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:472)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
{noformat}
> [ATS1/1.5] NPE exception while publishing recovering applications into ATS
> during RM restart.
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-6827
> URL: https://issues.apache.org/jira/browse/YARN-6827
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
>
> While recovering applications, an NPE is thrown, as shown below.
> {noformat}
> 2017-07-13 14:08:12,476 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
> Error when publishing entity
> [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
> at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because, during RM service creation, the active services are created
> first and the ATS services are created later. As a result, the active services
> are started first and the ATS services are started at a later point in time.
> This gives the active services enough time to recover the applications, and
> recovery tries to publish them to ATS. Since the ATS services are not started
> yet, the publish call throws an NPE.
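
The ordering problem described above can be reproduced in isolation with a small model. This is only a sketch and does not use Hadoop code: `Service`, `Publisher`, `ActiveServices`, and `startAll` are hypothetical stand-ins, and the assumption that the publisher's client field is only initialized in `start()` is made for illustration, not taken from `TimelineClientImpl`.

```java
import java.util.ArrayList;
import java.util.List;

class StartOrderSketch {
    /** Minimal stand-in for a service with a start() lifecycle hook. */
    interface Service {
        void start();
    }

    /** Hypothetical publisher: its client exists only after start() has run. */
    static class Publisher implements Service {
        private List<String> client; // stays null until start()

        @Override
        public void start() {
            client = new ArrayList<>();
        }

        void publish(String entity) {
            client.add(entity); // NullPointerException if start() has not run yet
        }
    }

    /** Hypothetical active services: recovery runs as part of start(). */
    static class ActiveServices implements Service {
        private final Publisher publisher;
        final List<String> failedEntities = new ArrayList<>();

        ActiveServices(Publisher publisher) {
            this.publisher = publisher;
        }

        @Override
        public void start() {
            String appId = "application_1499929227397_0001";
            try {
                publisher.publish(appId); // recovery publishes the recovered app
            } catch (NullPointerException e) {
                failedEntities.add(appId); // record the failure instead of aborting
            }
        }
    }

    /** Starts both services in the chosen order and returns the failed entities. */
    static List<String> startAll(boolean publisherFirst) {
        Publisher publisher = new Publisher();
        ActiveServices active = new ActiveServices(publisher);
        List<Service> order = new ArrayList<>();
        if (publisherFirst) {
            order.add(publisher);
            order.add(active);
        } else {
            order.add(active);
            order.add(publisher);
        }
        for (Service service : order) {
            service.start();
        }
        return active.failedEntities;
    }

    public static void main(String[] args) {
        System.out.println("active services first: " + startAll(false));
        System.out.println("publisher first:       " + startAll(true));
    }
}
```

In this model, starting the active services first records a failed publish for the recovered application, while starting the publisher first records none. Whether reordering is the right fix for the RM itself is a separate question that the sketch does not answer.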