[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933639#comment-13933639 ]

Jian He commented on YARN-1815:
-------------------------------

Thanks, Karthik, for the patch.
For now, it should be fine to move an unmanaged AM (UMA) to the Failed state, 
since a UMA does not save its final state and RM restart doesn't support UMAs. 
The core change looks good.

Test case: we need a more thorough test that verifies a UMA is moved to the 
Failed state after the RM restarts, using two MockRMs like the tests in 
TestRMRestart. The bigger problem is that if the unmanaged application is not 
added back to completedApps in RMAppManager after RM restart via the 
FinalTransition, it will never be removed from the state store. We remove 
applications from the state store when completedApps in RMAppManager grows 
beyond the max-app-limit.
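
To illustrate the concern above, here is a minimal, self-contained sketch of 
the eviction behavior. It is not the actual RMAppManager code: the class 
CompletedAppsSketch and its methods (storeApp, finishApp, isInStateStore) are 
hypothetical names, and the state store is modeled as a plain in-memory set. 
It only assumes what is described above, namely that an app leaves the state 
store once it has been added to completedApps and that list grows past the 
limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the completed-app bookkeeping; not the Hadoop API.
class CompletedAppsSketch {
    private final int maxCompletedApps;                 // plays the role of max-app-limit
    private final Deque<String> completedApps = new ArrayDeque<>();
    private final Set<String> stateStore = new HashSet<>();

    CompletedAppsSketch(int maxCompletedApps) {
        this.maxCompletedApps = maxCompletedApps;
    }

    // An app is persisted on submission (or is already present after a restart).
    void storeApp(String appId) {
        stateStore.add(appId);
    }

    // The role the FinalTransition plays: record the app as completed, then
    // evict the oldest completed apps from the store once over the limit.
    void finishApp(String appId) {
        completedApps.addLast(appId);
        while (completedApps.size() > maxCompletedApps) {
            String evicted = completedApps.removeFirst();
            stateStore.remove(evicted);
        }
    }

    boolean isInStateStore(String appId) {
        return stateStore.contains(appId);
    }
}
```

In this model, an app that is stored but never passed to finishApp stays in 
the store no matter how many other apps complete, which is the leak described 
above for a recovered UMA that skips the FinalTransition.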

> RM should recover only Managed AMs
> ----------------------------------
>
>                 Key: YARN-1815
>                 URL: https://issues.apache.org/jira/browse/YARN-1815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM should not recover unmanaged AMs until YARN-1823 is fixed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
