[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074635#comment-14074635 ]
Anubhav Dhoot commented on YARN-2229:
-------------------------------------
We cannot simply add a field that old code does not know about: old code would
silently drop the unknown field and keep working with a wrong id. And because
of the way we construct containerIds, we do need to add the new field (details
in YARN-2052).
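A minimal sketch of that silent failure. Everything in it is hypothetical: the
serialized id is modelled as a map of field names, a 64-bit id is split into
the existing field {{id}} (low 32 bits) and a new field {{idHigh}} (upper 32
bits), and the old reader simply skips fields it does not know. Two ids that
differ only in the upper bits then read back as the same id on old code, with
no error.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a serialized ContainerId: field name -> value.
// Readers skip fields they do not know, which is why the failure is silent.
class SilentTruncationDemo {

    // New writer: low 32 bits in the existing "id" field, upper 32 bits
    // in a new, hypothetical "idHigh" field.
    static Map<String, Long> writeNew(long containerId) {
        Map<String, Long> fields = new HashMap<>();
        fields.put("id", containerId & 0xFFFFFFFFL);
        fields.put("idHigh", containerId >>> 32);
        return fields;
    }

    // Old reader: only knows "id" and ignores "idHigh" without any error.
    static long readOld(Map<String, Long> fields) {
        return fields.get("id");
    }

    // New reader: reassembles both halves.
    static long readNew(Map<String, Long> fields) {
        long high = fields.getOrDefault("idHigh", 0L);
        return (high << 32) | fields.get("id");
    }

    public static void main(String[] args) {
        long a = (1L << 32) | 42; // same low bits, different high bits
        long b = (2L << 32) | 42;
        System.out.println(readOld(writeNew(a)) == readOld(writeNew(b))); // true: old code sees a collision
        System.out.println(readNew(writeNew(a)) == readNew(writeNew(b))); // false
    }
}
```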
The only way I see this working without a cluster shutdown is to support
deserializing both the older and the newer format. When serializing, we can
choose whether to emit the new field based on a condition (a flag, or the
version number of the daemon).
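The dual-format approach could look roughly like the sketch below. It reuses
the same hypothetical {{id}}/{{idHigh}} split, and the {{emitNewField}} switch
stands in for the flag or daemon-version condition. The reader treats a missing
{{idHigh}} as the old format; the writer only emits {{idHigh}} when the switch
is on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec that reads both formats and writes the new one
// only when enabled.
class DualFormatCodec {
    // Stand-in for "flag or version number of the daemon": off during the
    // first rolling upgrade, on once every reader understands "idHigh".
    private final boolean emitNewField;

    DualFormatCodec(boolean emitNewField) {
        this.emitNewField = emitNewField;
    }

    Map<String, Long> serialize(long containerId) {
        // Refuse rather than truncate: the old format cannot carry high bits.
        if (!emitNewField && (containerId >>> 32) != 0) {
            throw new IllegalStateException("id needs the new format, which is disabled: " + containerId);
        }
        Map<String, Long> fields = new HashMap<>();
        fields.put("id", containerId & 0xFFFFFFFFL);
        if (emitNewField) {
            fields.put("idHigh", containerId >>> 32);
        }
        return fields;
    }

    // Accepts both formats: a missing "idHigh" means the old format.
    static long deserialize(Map<String, Long> fields) {
        long high = fields.getOrDefault("idHigh", 0L);
        return (high << 32) | fields.get("id");
    }
}
```

Throwing in the writer is a choice made for this sketch; the point is that
while the switch is off, an id that needs the new field must never be written
out silently truncated.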
So the first rolling upgrade will leave the condition off but ensure that all
code can deserialize the new field if it is present. In the next rolling
upgrade we can turn the condition on and start serializing the new field.
The RM can check that all NMs have been upgraded to a version that supports
deserializing the new field before allowing the flag to be turned on. That
covers the case where someone does not follow the two-step approach above.
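The RM-side gate could be sketched like this. NM versions are plain integers
and {{minNmVersion}} is a hypothetical first NM version that can read the new
field. Refusing too-old NMs once the new format is on is an extra assumption of
this sketch, so that an old NM joining later cannot reopen the problem.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical RM-side gate for turning on the new containerId field.
class NewIdFormatGate {
    private final int minNmVersion; // first NM version that reads the new field (hypothetical)
    private final Map<String, Integer> nmVersions = new HashMap<>();
    private boolean newFormatEnabled = false;

    NewIdFormatGate(int minNmVersion) {
        this.minNmVersion = minNmVersion;
    }

    // Records (or updates) an NM's version. Assumption of this sketch:
    // once the new format is on, NMs that cannot read it are refused.
    boolean registerNodeManager(String nodeId, int version) {
        if (newFormatEnabled && version < minNmVersion) {
            return false;
        }
        nmVersions.put(nodeId, version);
        return true;
    }

    // True only if every known NM can deserialize the new field.
    boolean canEnableNewFormat() {
        for (int v : nmVersions.values()) {
            if (v < minNmVersion) {
                return false;
            }
        }
        return true;
    }

    void enableNewFormat() {
        if (!canEnableNewFormat()) {
            throw new IllegalStateException("some NodeManagers cannot read the new field yet");
        }
        newFormatEnabled = true;
    }

    boolean isNewFormatEnabled() {
        return newFormatEnabled;
    }
}
```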
Any problems with this approach?
> ContainerId can overflow with RM restart
> ----------------------------------------
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Tsuyoshi OZAWA
> Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.10.patch,
> YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch,
> YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch,
> YARN-2229.8.patch, YARN-2229.9.patch
>
>
> In YARN-2052, we changed the containerId format: the upper 10 bits hold the
> epoch and the lower 22 bits hold the sequence number of Ids. This preserves
> the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}},
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow
> after the RM restarts 1024 times.
> To avoid the problem, it's better to make containerId a long. In this JIRA we
> need to define the new containerId format while preserving backward
> compatibility.
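For reference, the YARN-2052 layout in the description packs the epoch into
the upper 10 bits and the sequence number into the lower 22 bits of an int,
giving 2^10 = 1024 epochs (0 to 1023) and 2^22 = 4,194,304 sequence numbers per
epoch. The sketch below is illustrative, not the actual ContainerId code: the
class and method names are hypothetical. It packs and unpacks that layout,
shows that epoch 1024 would alias epoch 0 if cut to 10 bits, and, as one
hypothetical long-based layout, keeps the 22-bit sequence and widens the epoch
to 42 bits.

```java
// Hypothetical helper illustrating the 10-bit epoch / 22-bit sequence layout.
class ContainerIdBits {
    static final int SEQ_BITS = 22;                     // lower 22 bits: sequence number
    static final int EPOCH_BITS = 10;                   // upper 10 bits: epoch
    static final int SEQ_MASK = (1 << SEQ_BITS) - 1;    // 0x3FFFFF
    static final int MAX_EPOCH = (1 << EPOCH_BITS) - 1; // 1023

    static int pack(int epoch, int seq) {
        if (epoch < 0 || epoch > MAX_EPOCH) {
            throw new IllegalArgumentException("epoch overflow: " + epoch);
        }
        if (seq < 0 || seq > SEQ_MASK) {
            throw new IllegalArgumentException("sequence overflow: " + seq);
        }
        return (epoch << SEQ_BITS) | seq;
    }

    // Unsigned shift, so epochs >= 512 (which set the sign bit) still decode.
    static int epochOf(int id) {
        return id >>> SEQ_BITS;
    }

    static int seqOf(int id) {
        return id & SEQ_MASK;
    }

    // What the 10 epoch bits would hold if an overflow were not caught:
    // epoch 1024 aliases epoch 0.
    static int truncatedEpoch(int epoch) {
        return epoch & MAX_EPOCH;
    }

    // Hypothetical long layout: same 22-bit sequence, 42 bits of epoch.
    static long packLong(long epoch, int seq) {
        return (epoch << SEQ_BITS) | (seq & SEQ_MASK);
    }

    static long epochOfLong(long id) {
        return id >>> SEQ_BITS;
    }
}
```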
--
This message was sent by Atlassian JIRA
(v6.2#6252)