[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

Bikas Saha (JIRA) Mon, 12 May 2014 17:59:25 -0700

    [ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995829#comment-13995829 ]


Bikas Saha commented on YARN-556:
---------------------------------

bq. After the configurable wait-time, the RM starts accepting RPCs from both 
new AMs and already existing AMs.
This is not needed. The AM can be allowed to re-sync after state is recovered 
from the store. Allocations to the AM may not occur until the threshold 
elapses. In fact, we want to re-sync the AMs as soon as possible so that they 
don't give up on the RM.
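
To make the distinction concrete, here is a rough sketch of that behaviour as a toy model. It is not YARN code: the class RecoveringRM, its methods, and the wait_secs parameter are all hypothetical names chosen for illustration. Re-sync is accepted as soon as state is recovered, while allocations stay empty until the wait threshold elapses.

```python
class RecoveringRM:
    """Toy model of a restarted RM. Hypothetical, not the YARN API."""

    def __init__(self, recovered_apps, wait_secs, now=0.0):
        # Applications restored from the state store at startup.
        self.known_apps = set(recovered_apps)
        # Assumed threshold: no new allocations before this time.
        self.allocations_open_at = now + wait_secs
        self.resynced = set()

    def resync(self, app_id, now):
        # Accepted immediately after recovery, regardless of the threshold,
        # so that the AM does not give up on the RM.
        if app_id not in self.known_apps:
            raise KeyError("unknown application: %s" % app_id)
        self.resynced.add(app_id)
        return True

    def allocate(self, app_id, asks, now):
        if app_id not in self.resynced:
            raise RuntimeError("AM must re-sync before allocating")
        if now < self.allocations_open_at:
            # The call is acknowledged, but nothing is handed out yet.
            return []
        # Toy behaviour: grant every ask once the threshold has elapsed.
        return list(asks)


rm = RecoveringRM({"app_1"}, wait_secs=30, now=0)
rm.resync("app_1", now=1)                        # accepted right away
early = rm.allocate("app_1", ["c1"], now=5)      # before the threshold
late = rm.allocate("app_1", ["c1"], now=30)      # threshold has elapsed
```

In this sketch the AM keeps a live connection to the RM throughout the wait period; only the scheduling decision is deferred.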

bq. Existing AMs are expected to resync with the RM, which essentially 
translates to register followed by an allocate call
We should keep the option open to use a new API called resync that does exactly 
that. It may help to make this operation "atomic".
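
One way to picture what "atomic" could mean here is the sketch below. It is only an illustration under assumptions: ToyRM and its register, allocate, and resync methods are hypothetical and do not correspond to any existing YARN API. The combined call validates its input before changing any state, so either both the registration and the outstanding asks take effect or neither does.

```python
import threading


class ToyRM:
    """Hypothetical RM model contrasting two calls with one combined call."""

    def __init__(self):
        self._lock = threading.Lock()
        self.registered = set()
        self.pending_asks = {}

    def register(self, app_id):
        with self._lock:
            self.registered.add(app_id)

    def allocate(self, app_id, asks):
        with self._lock:
            if app_id not in self.registered:
                raise RuntimeError("AM is not registered")
            self.pending_asks[app_id] = list(asks)

    def resync(self, app_id, asks):
        # Assumed combined operation: register plus allocate in one step.
        with self._lock:
            # Validate first; list() raises TypeError on a non-iterable,
            # and no state has been modified at that point.
            asks = list(asks)
            self.registered.add(app_id)
            self.pending_asks[app_id] = asks


rm = ToyRM()
rm.resync("app_1", ["c1", "c2"])   # registration and asks land together
```

With separate register and allocate calls, the RM could observe an AM that has registered but whose asks have not arrived yet; the combined call removes that intermediate state in this model.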





> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
