[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995328#comment-13995328 ]
Anubhav Dhoot commented on YARN-556:
------------------------------------

bq. clustertimestamp is added to containerId so that containerId after RM restart do not clash with containerId before (as the containerId counter resets to zero in memory).

The problem is that the containerId is currently composed of ApplicationAttemptId + int. The int part comes from an in-memory containerIdCounter in AppSchedulingInfo, which is reset after an RM restart. Without any changes, the containerIds of containers allocated after the restart would clash with existing containerIds. The prototype proposal is to make it ApplicationAttemptId + uniqueid + int, where the uniqueid can be a timestamp set by the RM.

I feel containerId should be an opaque string that YARN app developers don't take a dependency on. Also, if we used protobuf serialization/deserialization rules everywhere, we could deal with compatibility changes between different YARN code versions.

> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical
> information. This umbrella jira will track changes needed to recover the
> running state of the cluster so that work can be preserved across RM restarts.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
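
The containerId clash described in the comment above can be illustrated with a small, self-contained sketch. This is not YARN code: the class `ContainerIdSketch`, the nested `Allocator`, the string layout of the ids, and the timestamp values are all hypothetical stand-ins, chosen only to show why an in-memory counter that resets on restart produces duplicate ids, and how adding a per-RM-start unique id could avoid that.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerIdSketch {

    // Hypothetical stand-in for the in-memory counter described in the comment.
    // A new Allocator models a freshly started RM: its counter begins at zero.
    static final class Allocator {
        private final String appAttemptId;
        private final boolean includeUniqueId;
        private final long rmStartTimestamp; // assumed unique per RM start
        private final AtomicInteger counter = new AtomicInteger(0);

        Allocator(String appAttemptId, boolean includeUniqueId, long rmStartTimestamp) {
            this.appAttemptId = appAttemptId;
            this.includeUniqueId = includeUniqueId;
            this.rmStartTimestamp = rmStartTimestamp;
        }

        // Builds an id as attempt + int, or attempt + uniqueid + int.
        String next() {
            int seq = counter.incrementAndGet();
            return includeUniqueId
                    ? appAttemptId + "_" + rmStartTimestamp + "_" + seq
                    : appAttemptId + "_" + seq;
        }
    }

    // Returns true if the first id issued after a simulated restart
    // duplicates an id that was issued before the restart.
    static boolean clashesAfterRestart(boolean includeUniqueId) {
        String attempt = "appattempt_0001_000001"; // illustrative value only
        Set<String> issuedBeforeRestart = new HashSet<>();

        Allocator before = new Allocator(attempt, includeUniqueId, 1000L);
        issuedBeforeRestart.add(before.next());
        issuedBeforeRestart.add(before.next());

        // Simulated RM restart: counter state is lost, timestamp is later.
        Allocator after = new Allocator(attempt, includeUniqueId, 2000L);
        return issuedBeforeRestart.contains(after.next());
    }

    public static void main(String[] args) {
        System.out.println("attempt + int clashes: " + clashesAfterRestart(false));
        System.out.println("attempt + uniqueid + int clashes: " + clashesAfterRestart(true));
    }
}
```

The sketch assumes the unique id really does differ between RM starts; if two starts could share a timestamp, the same clash would reappear.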