[
https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508855#comment-14508855
]
gu-chi commented on YARN-3536:
------------------------------
2015-04-21 04:22:33,923 | INFO | main-EventThread | Recovering app:
application_1429597538411_0001 with 2 attempts and final state = FINISHED |
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:700)
2015-04-21 04:22:33,923 | INFO | main-EventThread | Recovering attempt:
appattempt_1429597538411_0001_000001 with final state: FAILED |
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:734)
2015-04-21 04:22:33,924 | INFO | main-EventThread | Recovering attempt:
appattempt_1429597538411_0001_000002 with final state: null |
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:734)
2015-04-21 04:22:33,924 | INFO | main-EventThread | Create AMRMToken for
ApplicationAttempt: appattempt_1429597538411_0001_000002 |
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.createAndGetAMRMToken(AMRMTokenSecretManager.java:195)
2015-04-21 04:22:33,924 | INFO | main-EventThread | Creating password for
appattempt_1429597538411_0001_000002 |
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.createPassword(AMRMTokenSecretManager.java:307)
2015-04-21 04:22:33,924 | INFO | main-EventThread |
appattempt_1429597538411_0001_000001 State change from NEW to FAILED |
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:704)
2015-04-21 04:22:33,925 | INFO | main-EventThread | Registering app attempt :
appattempt_1429597538411_0001_000002 |
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerAppAttempt(ApplicationMasterService.java:656)
2015-04-21 04:22:33,925 | ERROR | main-EventThread | Failed to load/recover
state |
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:533)
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:607)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:941)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:97)
> ZK exception occur when updating AppAttempt status, then NPE thrown when RM
> do recover
> --------------------------------------------------------------------------------------
>
> Key: YARN-3536
> URL: https://issues.apache.org/jira/browse/YARN-3536
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler, resourcemanager
> Affects Versions: 2.4.1
> Reporter: gu-chi
>
> In this scenario the Application status is FAILED/FINISHED but the AppAttempt
> status is null. This causes an NPE when the RM recovers with
> yarn.resourcemanager.work-preserving-recovery.enabled set to true.
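
To illustrate the failure shape described above, here is a minimal sketch in plain Java. It is not the Hadoop code: SketchScheduler, RecoveredAttempt and FinalState are hypothetical stand-ins. It also assumes (not confirmed by the log) that the NPE comes from the scheduler looking up an application that was never registered, because the application itself was recovered in a final state. The guarded variant shows one possible defensive check, not necessarily the fix that YARN adopted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop classes.
enum FinalState { FINISHED, FAILED, KILLED }

final class RecoveredAttempt {
    final String attemptId;
    final String appId;
    final FinalState finalState; // null models an attempt whose final state was never stored

    RecoveredAttempt(String attemptId, String appId, FinalState finalState) {
        this.attemptId = attemptId;
        this.appId = appId;
        this.finalState = finalState;
    }
}

final class SketchScheduler {
    // Assumption: only applications that are still live are registered here,
    // so an application recovered as FINISHED/FAILED has no entry.
    private final Map<String, String> queueByApp = new HashMap<>();

    void addApplication(String appId, String queueName) {
        queueByApp.put(appId, queueName);
    }

    // Unguarded lookup: dereferences the map result directly, so an
    // unregistered application produces a NullPointerException.
    int addAttemptUnguarded(RecoveredAttempt attempt) {
        String queueName = queueByApp.get(attempt.appId);
        return queueName.length();
    }

    // Guarded lookup: returns false instead of throwing when the
    // application is unknown to the scheduler.
    boolean addAttemptGuarded(RecoveredAttempt attempt) {
        String queueName = queueByApp.get(attempt.appId);
        if (queueName == null) {
            return false; // caller can log and skip this attempt
        }
        return true;
    }
}
```

With an attempt such as new RecoveredAttempt("appattempt_1429597538411_0001_000002", "application_1429597538411_0001", null) and no prior addApplication call, addAttemptUnguarded throws a NullPointerException while addAttemptGuarded returns false.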