[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074635#comment-14074635
 ] 

Anubhav Dhoot commented on YARN-2229:
-------------------------------------

We cannot simply add a field and rely on old code ignoring it. Old code would 
then silently work with a wrong id (one that is missing the new field). And 
because of the way we construct containerIds, we do need to add the new field 
(details in YARN-2052).

The only way I see it working (without a cluster shutdown) is if we support 
deserializing both the older format and the newer format. When serializing, we 
can choose to emit the new field based on a condition (a flag or the version 
number of the daemon).
So the first rolling upgrade will not turn on the condition, but will ensure 
that all the code supports deserializing the new field if it exists. In the 
next rolling upgrade we can turn on the condition to serialize the new field.

The RM can ensure that all NMs are upgraded to a specific version (one that 
supports deserializing the new field) before allowing the flag to be turned on. 
That will take care of the case where someone does not follow the approach above.
Any problems with this approach?
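
A rough sketch of the RM-side check mentioned above, under stated assumptions: 
NM versions are modelled as plain integers and the class and method names are 
hypothetical. How the RM actually learns and compares NM versions is not 
covered here.

```java
import java.util.List;

// Hypothetical gate; the names and the integer version model are assumptions.
class NewFieldGate {
    // Returns true only if every known NM is at or above the minimum version
    // that can deserialize the new field. An empty list yields true.
    static boolean canEnableNewField(List<Integer> nmVersions, int minVersion) {
        for (int version : nmVersions) {
            if (version < minVersion) {
                return false; // at least one NM could not read the new field
            }
        }
        return true;
    }
}
```

With this kind of check, a request to turn the flag on while any NM is still 
on an older version would simply be refused.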

> ContainerId can overflow with RM restart
> ----------------------------------------
>
>                 Key: YARN-2229
>                 URL: https://issues.apache.org/jira/browse/YARN-2229
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
> YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
> YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
> YARN-2229.8.patch, YARN-2229.9.patch
>
>
> On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
> lower 22 bits are for sequence number of Ids. This is for preserving 
> semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
> restarts 1024 times.
> To avoid the problem, it's better to make containerId long. We need to define 
> the new format of container Id while preserving backward compatibility on 
> this JIRA.
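
The 10-bit/22-bit layout in the quoted description, and why the epoch wraps 
after 1024 restarts, can be illustrated with the sketch below. The class and 
method names are hypothetical and this is not the ContainerId implementation; 
it only shows the bit arithmetic for a 32-bit int.

```java
// Illustrative only: packs an epoch and a sequence number into one int.
class EpochBits {
    static final int SEQ_BITS = 22;                        // lower 22 bits: sequence number
    static final int EPOCH_BITS = 10;                      // upper 10 bits: epoch
    static final int SEQ_MASK = (1 << SEQ_BITS) - 1;       // 0x3FFFFF
    static final int EPOCH_MASK = (1 << EPOCH_BITS) - 1;   // 0x3FF

    static int pack(int epoch, int seq) {
        // Bits of epoch above the lowest 10 are shifted out of the int.
        return (epoch << SEQ_BITS) | (seq & SEQ_MASK);
    }

    static int epochOf(int id) {
        return (id >>> SEQ_BITS) & EPOCH_MASK; // unsigned shift, then mask
    }

    static int seqOf(int id) {
        return id & SEQ_MASK;
    }
}
```

In this sketch pack(1024, 5) produces the same value as pack(0, 5), because 
1024 needs an 11th bit that does not fit in the upper 10 bits. That collision 
is the overflow the issue describes.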



--
This message was sent by Atlassian JIRA
(v6.2#6252)
