[
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069443#comment-15069443
]
Rohith Sharma K S commented on YARN-4497:
-----------------------------------------
I am wondering when this can happen: attempt1 is stored, attempt2 is not
stored, and attempt3 is stored? One way is to manually delete the attempt2
node from ZooKeeper.
> RM might fail to restart when recovering apps whose attempts are missing
> ------------------------------------------------------------------------
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jun Gong
> Assignee: Jun Gong
>
> Found the following problem while discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, those attempts will be
> missing from RMStateStore. For example, when storing attempt1, attempt2 and
> attempt3, RM successfully stores attempt1 and attempt3 but fails to store
> attempt2. When RM restarts, *RMAppImpl#recover* recovers the attempts one
> by one, so in this case it recovers attempt1, then attempt2. When
> recovering attempt2, we call
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks
> up its ApplicationAttemptStateData. That lookup finds nothing, so an error
> occurs at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).
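
To make the scenario in the description concrete, here is a minimal,
self-contained sketch. It is not the RM code: the class and method names are
hypothetical, a plain Map stands in for RMStateStore, and the "skip missing
attempts" variant is only one possible way to tolerate the gap, shown as an
assumption rather than the actual fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of attempt recovery; none of these names are YARN APIs.
final class MissingAttemptSketch {

    // Recovers attempts 1..attemptCount one by one and fails on the first
    // attempt that has no stored state, mirroring the failure described above.
    static List<Integer> recoverStrict(Map<Integer, String> stored, int attemptCount) {
        List<Integer> recovered = new ArrayList<>();
        for (int id = 1; id <= attemptCount; id++) {
            String attemptState = stored.get(id);
            if (attemptState == null) {
                throw new IllegalStateException("no stored state for attempt" + id);
            }
            recovered.add(id);
        }
        return recovered;
    }

    // Assumed alternative: skip attempts whose state is missing instead of
    // aborting the whole recovery.
    static List<Integer> recoverSkippingMissing(Map<Integer, String> stored, int attemptCount) {
        List<Integer> recovered = new ArrayList<>();
        for (int id = 1; id <= attemptCount; id++) {
            if (stored.get(id) != null) {
                recovered.add(id);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        // attempt1 and attempt3 were stored; attempt2 is missing.
        Map<Integer, String> stored = new HashMap<>();
        stored.put(1, "attempt1-state");
        stored.put(3, "attempt3-state");

        try {
            recoverStrict(stored, 3);
        } catch (IllegalStateException e) {
            System.out.println("strict recovery failed: " + e.getMessage());
        }
        System.out.println("tolerant recovery: " + recoverSkippingMissing(stored, 3));
    }
}
```

The strict variant stops at attempt2 even though attempt3 is present, which
is the restart failure reported here. The tolerant variant trades that
failure for silently dropping the missing attempt.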