[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114338#comment-14114338
 ] 

Jian He commented on YARN-2459:
-------------------------------

Mayank, thanks for working on the issue. The current change saves the initial 
state, but it doesn't store the final state and diagnostics of the app, and the 
RM will retry this app if the final state is not saved. I think we should do 
the following, the same way the NEW_SAVING state handles it:
{code}
.addTransition(RMAppState.NEW, RMAppState.FINAL_SAVING,
       RMAppEventType.APP_REJECTED,
       new FinalSavingTransition(new AppRejectedTransition(),
           RMAppState.FAILED))
{code}
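
To illustrate why routing the rejection through FINAL_SAVING matters, here is a minimal, self-contained toy model. It does not use the real YARN classes: RejectedAppSketch, Store, App, reject and rejectWithoutSaving are hypothetical names, and the save is synchronous here, whereas the real RM only moves on once the store has reported the save. It is a sketch under those assumptions, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class RejectedAppSketch {
    enum State { NEW, FINAL_SAVING, FAILED }

    /** Hypothetical stand-in for the RM state store; remove() fails on unknown apps. */
    static class Store {
        private final Map<String, String> saved = new HashMap<>();

        void save(String appId, State finalState, String diagnostics) {
            saved.put(appId, finalState + ": " + diagnostics);
        }

        void remove(String appId) {
            if (saved.remove(appId) == null) {
                throw new IllegalStateException("app not in store: " + appId);
            }
        }

        boolean contains(String appId) {
            return saved.containsKey(appId);
        }
    }

    /** Hypothetical stand-in for an RM application. */
    static class App {
        final String id;
        State state = State.NEW;

        App(String id) {
            this.id = id;
        }

        // NEW -> FINAL_SAVING -> FAILED: the final state and diagnostics are
        // written to the store before the app is marked FAILED.
        void reject(Store store, String diagnostics) {
            state = State.FINAL_SAVING;
            store.save(id, State.FAILED, diagnostics);
            state = State.FAILED;
        }

        // NEW -> FAILED directly: nothing is written to the store.
        void rejectWithoutSaving() {
            state = State.FAILED;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();

        App saved = new App("app_1");
        saved.reject(store, "rejected at submission");
        store.remove(saved.id); // succeeds, the app was saved

        App unsaved = new App("app_2");
        unsaved.rejectWithoutSaving();
        try {
            store.remove(unsaved.id); // the app was never saved
        } catch (IllegalStateException e) {
            System.out.println("removal failed: " + e.getMessage());
        }
    }
}
```

In this toy model, the app that skipped the save is the one whose later removal throws, which mirrors the failure mode described in the issue.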

> RM crashes if App gets rejected for any reason and HA is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-2459
>                 URL: https://issues.apache.org/jira/browse/YARN-2459
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: YARN-2459-1.patch
>
>
> This happens when RM HA is enabled and the ZooKeeper store is used as the RM 
> state store.
> If an app gets rejected for any reason and goes directly from NEW to FAILED, 
> the final transition adds it to the RMApps and completed apps in-memory 
> structures, but it never makes it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from 
> memory and from the store. It then tries to delete this app from the store 
> and fails, which causes the RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)
