Sunil G commented on YARN-4401:

Hi [~templedf]
I am not sure about the use case here. However, if such a case does occur,
the logs should have enough information to identify the app-id.
We can then use the command below to clear such apps if necessary, rather
than forcibly clearing them from rmcontext.
Usage: yarn resourcemanager [-format-state-store]
                            [-remove-application-from-state-store <appId>]

> A failed app recovery should not prevent the RM from starting
> -------------------------------------------------------------
>                 Key: YARN-4401
>                 URL: https://issues.apache.org/jira/browse/YARN-4401
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4401.001.patch
> There are many different reasons why an app recovery could fail with an 
> exception, and any such failure aborts the RM start.  Presumably, the 
> reason the RM is trying to do a recovery is that it's the standby trying 
> to fill in for the active.  Failing to come up defeats the purpose of the 
> HA configuration.  Instead of preventing the RM from starting, a failed 
> app recovery should log an error and skip the application.
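
The log-and-skip behaviour proposed in the description could look roughly
like the sketch below. It is only an illustration of the idea, not the
attached patch and not actual ResourceManager code: RecoverySketch,
recoverAll and the recoverOne callback are hypothetical names, and app IDs
are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names exist in Hadoop YARN.
class RecoverySketch {

    /**
     * Attempts to recover each application in turn. A failure in one
     * application is logged and that application is skipped, so the
     * remaining applications are still recovered and startup can proceed.
     *
     * @return the IDs of the applications that could not be recovered
     */
    static List<String> recoverAll(List<String> appIds,
                                   Consumer<String> recoverOne) {
        List<String> skipped = new ArrayList<>();
        for (String appId : appIds) {
            try {
                recoverOne.accept(appId);
            } catch (RuntimeException e) {
                // Log and continue instead of aborting the whole recovery.
                System.err.println("Failed to recover " + appId
                        + ", skipping: " + e);
                skipped.add(appId);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Made-up IDs; the second one simulates a corrupt stored state.
        List<String> ids = Arrays.asList("app_1", "app_2", "app_3");
        List<String> skipped = recoverAll(ids, id -> {
            if (id.equals("app_2")) {
                throw new IllegalStateException("corrupt state for " + id);
            }
        });
        System.out.println("Skipped: " + skipped);
    }
}
```

Returning the skipped IDs keeps them available for later cleanup, for
example with the -remove-application-from-state-store option mentioned in
the comment above.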

This message was sent by Atlassian JIRA
