[
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995829#comment-13995829
]
Bikas Saha commented on YARN-556:
---------------------------------
bq. After the configurable wait-time, the RM starts accepting RPCs from both
new AMs and already existing AMs.
This is not needed. The AM can be allowed to re-sync as soon as state is
recovered from the store. Allocations to the AM can be held back until the
threshold elapses. In fact, we want to re-sync the AMs ASAP so that they don't
give up on the RM.
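To illustrate the distinction, here is a rough sketch of separating the two
gates. Everything in it (RecoveryGate, resync, mayAllocate, the millisecond
parameters) is a hypothetical name made up for illustration, not existing RM
code: re-sync is accepted right after recovery, while allocation waits for the
threshold.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model, not YARN code: re-sync is open at once, allocation is gated. */
class RecoveryGate {
    private final long recoveredAtMillis;      // when state was recovered from the store
    private final long allocationDelayMillis;  // the configurable wait-time
    private final Set<String> resyncedAms = new HashSet<>();

    RecoveryGate(long recoveredAtMillis, long allocationDelayMillis) {
        this.recoveredAtMillis = recoveredAtMillis;
        this.allocationDelayMillis = allocationDelayMillis;
    }

    /** Re-sync is accepted as soon as state is recovered, regardless of the threshold. */
    boolean resync(String amId) {
        resyncedAms.add(amId);
        return true;
    }

    /** Allocation needs a re-synced AM and an elapsed threshold. */
    boolean mayAllocate(String amId, long nowMillis) {
        return resyncedAms.contains(amId)
                && nowMillis - recoveredAtMillis >= allocationDelayMillis;
    }
}
```

With this split an AM hears back from the RM immediately, so it has no reason
to give up, and only the container hand-out is delayed.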
bq. Existing AMs are expected to resync with the RM, which essentially
translates to register followed by an allocate call
We should keep the option open to use a new API called resync that does exactly
that. It may help to make this operation "atomic".
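As a sketch of what "atomic" could mean here, assuming a single call that
performs both steps under one lock. The class and method names below
(ResyncingScheduler, resync, asksFor) and the use of plain strings for asks are
hypothetical and are not part of the current AM-RM protocol.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not the YARN protocol: one resync call replaces register + allocate. */
class ResyncingScheduler {
    private final Map<String, List<String>> pendingAsks = new HashMap<>();

    synchronized void register(String amId) {
        pendingAsks.putIfAbsent(amId, new ArrayList<>());
    }

    synchronized void allocate(String amId, List<String> asks) {
        List<String> current = pendingAsks.get(amId);
        if (current == null) {
            throw new IllegalStateException("AM not registered: " + amId);
        }
        // Replace the recorded asks with what the AM reports now.
        current.clear();
        current.addAll(asks);
    }

    /** Both steps run under one lock, so no caller sees the AM registered but without asks. */
    synchronized void resync(String amId, List<String> asks) {
        register(amId);
        allocate(amId, asks);
    }

    /** Returns a copy of the recorded asks, or null if the AM is unknown. */
    synchronized List<String> asksFor(String amId) {
        List<String> asks = pendingAsks.get(amId);
        return asks == null ? null : new ArrayList<>(asks);
    }
}
```

With separate register and allocate calls there is a window in which the RM
knows the AM but not its outstanding requests. A single call closes that window.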
> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf,
> WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical
> information. This umbrella jira will track changes needed to recover the
> running state of the cluster so that work can be preserved across RM restarts.