[
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995328#comment-13995328
]
Anubhav Dhoot commented on YARN-556:
------------------------------------
bq. clustertimestamp is added to containerId so that containerId after RM
restart do not clash with containerId before (as the containerId counter resets
to zero in memory).
The problem is that the containerId is currently composed of
ApplicationAttemptId + int. The int part comes from an in-memory
containerIdCounter in AppSchedulingInfo, which is reset after an RM restart.
Without any changes, the containerIds for containers allocated after a restart
would clash with existing containerIds.
The prototype proposal is to make it ApplicationAttemptId + uniqueid + int,
where the uniqueid can be a timestamp set by the RM. I feel containerId should
be an opaque string that YARN app developers don't take a dependency on. Also,
if we used protobuf serialization/deserialization rules everywhere, we could
deal with compatibility changes between different YARN code versions.
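To make the clash concrete, here is a minimal sketch of the two schemes. It is
not the actual YARN code: the class ContainerIdSketch, the Counter stand-in for
containerIdCounter, the underscore-joined string layout, and the sample
timestamps are all assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in YARN.
final class ContainerIdSketch {

    /** Stand-in for the in-memory containerIdCounter. A new instance models an RM restart. */
    static final class Counter {
        private final AtomicInteger next = new AtomicInteger(0);

        int increment() {
            return next.incrementAndGet();
        }
    }

    /** Current scheme: ApplicationAttemptId + int (string layout is assumed). */
    static String currentId(String appAttemptId, int counter) {
        return appAttemptId + "_" + counter;
    }

    /** Proposed scheme: ApplicationAttemptId + uniqueid + int, with an RM-set timestamp as uniqueid. */
    static String proposedId(String appAttemptId, long rmTimestamp, int counter) {
        return appAttemptId + "_" + rmTimestamp + "_" + counter;
    }

    public static void main(String[] args) {
        String attempt = "appattempt_0001_000001"; // made-up attempt id

        Counter beforeRestart = new Counter();
        Counter afterRestart = new Counter();      // counter starts from zero again

        // Same attempt, same counter value: the ids are identical.
        String oldId = currentId(attempt, beforeRestart.increment());
        String newId = currentId(attempt, afterRestart.increment());
        System.out.println("current scheme clashes: " + oldId.equals(newId));

        // Different RM timestamps (sample values) keep the ids apart.
        String oldUnique = proposedId(attempt, 1000L, new Counter().increment());
        String newUnique = proposedId(attempt, 2000L, new Counter().increment());
        System.out.println("proposed scheme clashes: " + oldUnique.equals(newUnique));
    }
}
```

The sketch only holds if the RM picks a different uniqueid on every restart; a
timestamp is one way to get that, as suggested above.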
> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf,
> WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical
> information. This umbrella jira will track changes needed to recover the
> running state of the cluster so that work can be preserved across RM restarts.
--
This message was sent by Atlassian JIRA
(v6.2#6252)