Chang Li updated YARN-4334:
    Attachment: YARN-4334.wip.patch

Uploading a prototype patch that periodically writes a heartbeat to the LeveldbRMStateStore; 
on RM recovery, it checks whether the state store has expired.
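The heartbeat-and-expiry idea can be sketched roughly as follows. This is a hypothetical illustration, not the code in YARN-4334.wip.patch: the class name `StateStoreExpiryCheck`, the key `RMHeartbeatTimestamp`, the in-memory `HashMap` standing in for LevelDB, and the choice to treat a missing heartbeat as "not expired" are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the actual LeveldbRMStateStore change.
public class StateStoreExpiryCheck {
    // Hypothetical key under which the RM records its last heartbeat time.
    static final String HEARTBEAT_KEY = "RMHeartbeatTimestamp";

    // In-memory stand-in for the state store; a real store would persist to LevelDB.
    private final Map<String, Long> store = new HashMap<>();

    // Called periodically while the RM is running to record that it is alive.
    void heartbeat(long nowMillis) {
        store.put(HEARTBEAT_KEY, nowMillis);
    }

    // Called at RM startup: true means the store is "too old" and recovery should be skipped.
    boolean isExpired(long nowMillis, long maxAgeMillis) {
        Long last = store.get(HEARTBEAT_KEY);
        if (last == null) {
            // Assumption: a store with no recorded heartbeat is recovered as before.
            return false;
        }
        return nowMillis - last > maxAgeMillis;
    }

    public static void main(String[] args) {
        StateStoreExpiryCheck s = new StateStoreExpiryCheck();
        s.heartbeat(1_000L);
        System.out.println(s.isExpired(5_000L, 10_000L));  // false: last heartbeat 4 s ago
        System.out.println(s.isExpired(20_000L, 10_000L)); // true: last heartbeat 19 s ago
    }
}
```

The maximum age would presumably come from an RM configuration property so the behavior stays optional, as the issue requests.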

> Ability to avoid ResourceManager recovery if state store is "too old"
> ---------------------------------------------------------------------
>                 Key: YARN-4334
>                 URL: https://issues.apache.org/jira/browse/YARN-4334
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Jason Lowe
>            Assignee: Chang Li
>         Attachments: YARN-4334.wip.patch
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.

This message was sent by Atlassian JIRA
