[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

Anubhav Dhoot (JIRA) Mon, 12 May 2014 11:56:29 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995328#comment-13995328
 ]


Anubhav Dhoot commented on YARN-556:
------------------------------------

bq. clustertimestamp is added to containerId so that containerIds after RM 
restart do not clash with containerIds before (as the containerId counter resets 
to zero in memory). 

The problem is that a containerId is currently composed of ApplicationAttemptId + 
int. The int part comes from an in-memory containerIdCounter in 
AppSchedulingInfo, which is reset after an RM restart. Without any changes, the 
containerIds for containers allocated after the restart would clash with 
existing containerIds. 
The prototype proposal is to make it ApplicationAttemptId + uniqueid + int, 
where the uniqueid can be a timestamp set by the RM. I feel containerId should be 
an opaque string that YARN app developers don't take a dependency on. Also, if 
we used protobuf serialization/deserialization rules everywhere, we could deal 
with compatibility changes between different YARN code versions. 
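
To make the clash concrete, here is a rough sketch of the two schemes. It is 
a toy model in Python, not the prototype patch: the Allocator class, the 
epoch field (standing in for the uniqueid), the attempt string, the timestamp 
values, and the counter starting at 1 are all assumptions made for 
illustration only. 

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ContainerId:
    app_attempt_id: str  # stands in for ApplicationAttemptId
    epoch: int           # hypothetical "uniqueid", e.g. a timestamp set by the RM
    seq: int             # value taken from the in-memory counter


class Allocator:
    """Toy stand-in for the in-memory containerIdCounter (not the YARN class)."""

    def __init__(self, app_attempt_id, epoch=0):
        self.app_attempt_id = app_attempt_id
        self.epoch = epoch
        self._counter = count(1)  # lives in memory only, so a restart resets it

    def new_container_id(self):
        return ContainerId(self.app_attempt_id, self.epoch, next(self._counter))


attempt = "appattempt_1_0001_000001"  # made-up attempt id

# Current scheme: ApplicationAttemptId + int. Building a fresh Allocator
# simulates an RM restart, and the first id it hands out repeats an old one.
before = Allocator(attempt).new_container_id()
after = Allocator(attempt).new_container_id()
clash = before == after

# Proposed scheme: ApplicationAttemptId + uniqueid + int. Each RM incarnation
# carries its own uniqueid, so the reset counter no longer produces a clash.
before2 = Allocator(attempt, epoch=1399900000).new_container_id()
after2 = Allocator(attempt, epoch=1399920000).new_container_id()
no_clash = before2 != after2
```

The sketch only holds if the uniqueid really differs between RM incarnations; 
two restarts that produced the same value would bring the clash back. 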

> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
