Junping Du commented on YARN-1341:

Thanks [~jlowe] for the detailed explanation here! I totally agree that we should 
deal with this case by case, and I appreciate your analysis of the cases above. I 
think there are still other cases we should double-check; some of them may 
suffer more from inconsistency.
- Application state: if we fail to store an application state update (e.g. from 
init to finish), the application will be in the wrong state after recovery.
- NodeManagerMetrics: the NM metrics will get messed up if they are only partially 
updated. (We don't have a JIRA yet to store/recover these, do we?)
On the side effect of bringing the NM down, as in the DeletionService case, I think 
we can just clean up these directories (as we want to do in the node 
decommission case).
About the stale tag on NMStateStore: I don't mean to put it in the NMStateStore, but 
I haven't thought clearly about where it should go. Maybe we can persist it on local 
disk directly, or send it to the RM and retrieve it during NM registration?

> Recover NMTokens upon nodemanager restart
> -----------------------------------------
>                 Key: YARN-1341
>                 URL: https://issues.apache.org/jira/browse/YARN-1341
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
> YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch

This message was sent by Atlassian JIRA
