[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039206#comment-14039206 ]
Junping Du commented on YARN-1341:
----------------------------------
Thanks [~jlowe] for the detailed explanation here! I totally agree that we should
deal with this case by case, and I appreciate your analysis of the cases above. I
think there are still other cases we should double-check, and some of them may
suffer more from inconsistency.
- Application state - If we fail to store an application state update (e.g. the
transition from init to finished), the application will have the wrong state after recovery.
- NodeManagerMetrics - The NM metrics will be inconsistent if they are only partially updated.
(We don't have a JIRA yet to store/recover these, do we?)
As for the side effects of bringing the NM down, as in the deletionServices case, I think we
can simply clean up those directories (as we intend to do in the node
decommission case).
About the stale tag on NMStateStore: I don't mean to put it in the NMStateStore itself, but I
haven't thought clearly about where it should go. Maybe we can persist it on local disk
directly, or send it to the RM and retrieve it during NM registration?
> Recover NMTokens upon nodemanager restart
> -----------------------------------------
>
> Key: YARN-1341
> URL: https://issues.apache.org/jira/browse/YARN-1341
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch,
> YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch
>
>