[
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462034#comment-13462034
]
Vinod Kumar Vavilapalli commented on YARN-128:
----------------------------------------------
Pasting notes from Bikas inline for easier discussion.
h4.Basic Idea:
The key idea is that the state of the cluster is its current state, so there is
no need to save all container info.
On startup, the RM turns a recovery flag on and informs the scheduler via an API.
Re-create running AM info from persisted state. Running AMs will heartbeat to
the RM and be asked to re-sync.
Restart AMs that have been lost. What about AMs that completed during the
restart? Re-running them should be a no-op.
Ask running and restarted AMs to re-send all pending container requests so that
the pending-request state can be re-created.
The RM accepts new AM registrations and their requests.
No scheduling pass is performed while the recovery flag is on.
The RM waits for nodes to heartbeat and report their container info.
The RM passes the container info to the scheduler so that the scheduler can
re-create the current allocation state.
After a recovery time threshold, reset the recovery flag and start the
scheduling pass. Normal operation from then on.
Schedulers could save their state and recover previous allocation information
from that saved state.
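The recovery window described above can be sketched as a simple time-based gate; this is a minimal illustration under assumed semantics, and the class and method names are invented here, not actual YARN APIs.

```java
// Hypothetical sketch of the RM recovery window: the flag is set at startup
// and scheduling stays paused until the recovery time threshold passes.
public class RecoveryGate {
    private final long startTimeMs;        // RM startup time
    private final long recoveryWindowMs;   // configured recovery threshold
    private volatile boolean recovering = true;

    public RecoveryGate(long startTimeMs, long recoveryWindowMs) {
        this.startTimeMs = startTimeMs;
        this.recoveryWindowMs = recoveryWindowMs;
    }

    /** Scheduling passes are skipped while the recovery flag is on. */
    public boolean schedulingAllowed(long nowMs) {
        if (recovering && nowMs - startTimeMs >= recoveryWindowMs) {
            recovering = false;   // threshold passed: resume normal operation
        }
        return !recovering;
    }
}
```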
h4.What info comes in node heartbeats:
Handle sequence number mismatch during recovery. On a heartbeat from a node,
send a ReRegister command instead of Reboot. The NodeManager should continue
running containers during this time.
The RM sends commands back to clean up containers/applications. Can orphans be
left behind on nodes after an RM restart? Will the NM be able to auto-clean
containers?
The ApplicationAttemptId can be obtained from Container objects to map
resources back to the SchedulingApp.
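The ReRegister-versus-Reboot choice on a sequence mismatch can be sketched as follows; the enum values and class name are illustrative stand-ins, not the actual ResourceTrackerService code.

```java
// Possible node-heartbeat actions (names assumed for illustration).
enum NodeAction { NORMAL, RESYNC, REBOOT }

// Sketch of the decision logic: after an RM restart, a response-id mismatch
// should trigger re-registration (keeping containers alive) rather than a
// reboot that would kill them.
public class HeartbeatPolicy {
    public NodeAction onHeartbeat(int expectedResponseId,
                                  int receivedResponseId,
                                  boolean rmRecovering) {
        if (expectedResponseId == receivedResponseId) {
            return NodeAction.NORMAL;
        }
        // Sequence mismatch: ReRegister during recovery, Reboot otherwise.
        return rmRecovering ? NodeAction.RESYNC : NodeAction.REBOOT;
    }
}
```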
h4.How to pause scheduling pass:
The scheduling pass is triggered on NODE_UPDATE events that happen on node
heartbeat, so it is easy to pause under the recovery flag.
YarnScheduler.allocate() is the API that needs to be changed.
How to handle container-release messages that were lost while the RM was down?
Will AMs get a delivery failure and keep resending indefinitely?
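Pausing the pass at the NODE_UPDATE handler could look like the sketch below; this is an assumed simplification of the event path, with invented names.

```java
// Sketch of gating the scheduling pass on NODE_UPDATE under the recovery
// flag (hypothetical class; the real path goes through the scheduler's
// event handler).
public class SchedulerEventGate {
    private volatile boolean recovering = true;
    private int schedulingPasses = 0;

    public void handleNodeUpdate(String nodeId) {
        if (recovering) {
            return;  // under recovery: absorb node state, run no scheduling
        }
        schedulingPasses++;  // normal path: scheduling pass for this node
    }

    public void finishRecovery() { recovering = false; }
    public int passes() { return schedulingPasses; }
}
```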
h4.How to re-create scheduler allocation state:
On node re-register, the RM passes container info to the scheduler so that the
scheduler can re-create the current allocation state.
Use CSQueue.recoverContainer() to recover previous allocations from the
currently running containers.
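Rebuilding allocation state from the containers reported by re-registering nodes amounts to re-accumulating per-attempt usage; the sketch below uses simplified stand-in types (strings and memory in MB) rather than YARN's Container and Resource classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of re-creating scheduler allocation state from running containers
// reported in node re-registrations (illustrative, not CSQueue itself).
public class AllocationRecovery {
    // appAttemptId -> total memory (MB) recovered as already allocated
    private final Map<String, Integer> allocatedMb = new HashMap<>();

    /** Called once per running container reported by a re-registering node. */
    public void recoverContainer(String appAttemptId, int containerMemMb) {
        allocatedMb.merge(appAttemptId, containerMemMb, Integer::sum);
    }

    public int allocated(String appAttemptId) {
        return allocatedMb.getOrDefault(appAttemptId, 0);
    }
}
```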
h4.How to re-synchronize pending requests with AM's:
A new AM-RM API is needed for AMs to resend their asks to the RM.
Keep accumulating asks from AMs as currently happens when allocate() is
called.
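For re-sent asks to be safe, one natural design is for an ask to replace (rather than add to) the outstanding count for its key, making a full resend after RM restart idempotent; the sketch below assumes that replace semantics and uses invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an ask table where a re-sent request replaces the outstanding
// count for its (priority, resourceName) key, so an AM resending its whole
// pending state after an RM restart does not double-count.
public class AskTable {
    // key: priority + "/" + resourceName (e.g. "1/*"), value: #containers
    private final Map<String, Integer> pending = new HashMap<>();

    public void updateAsk(int priority, String resourceName, int numContainers) {
        pending.put(priority + "/" + resourceName, numContainers);  // replace
    }

    public int pendingContainers(int priority, String resourceName) {
        return pending.getOrDefault(priority + "/" + resourceName, 0);
    }
}
```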
h4.How to persist AM state:
Store AM info in a persistent ZK node that uses version numbers to prevent
out-of-order updates from other RMs. One ZK node per AM under a master RM ZK
node. AM submission creates the ZK node; start and restart update it;
completion deletes it.
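The version-number guard is the same compare-and-set that ZooKeeper's setData() provides (it takes an expected version and fails on mismatch); the in-memory sketch below imitates that behavior to show how a stale RM's update would be rejected. The class is hypothetical, not the real store.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of a version-checked AM store, mimicking ZooKeeper's
// conditional setData(path, data, expectedVersion) so that out-of-order
// updates from a stale RM are rejected.
public class VersionedAmStore {
    private static class Node { String data; int version; }
    private final Map<String, Node> nodes = new HashMap<>();

    /** AM submission creates the node at version 0. */
    public int create(String amId, String data) {
        Node n = new Node();
        n.data = data;
        n.version = 0;
        nodes.put(amId, n);
        return 0;
    }

    /** Returns the new version, or -1 if expectedVersion is stale. */
    public int setData(String amId, String data, int expectedVersion) {
        Node n = nodes.get(amId);
        if (n == null || n.version != expectedVersion) {
            return -1;  // out-of-order update: reject
        }
        n.data = data;
        return ++n.version;
    }

    /** AM completion clears the node. */
    public void delete(String amId) { nodes.remove(amId); }
}
```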
h4.Metrics:
What needs to be done to maintain consistency across restarts? A new app
attempt would count as a new attempt, but what about recovered running apps?
h4.Security:
What information about keys and tokens should be persisted across restart so
that existing secure containers continue to run with the new RM, alongside new
containers? The ZK nodes themselves should be secure.
> Resurrect RM Restart
> ---------------------
>
> Key: YARN-128
> URL: https://issues.apache.org/jira/browse/YARN-128
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.