[
https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484295#comment-15484295
]
Jason Lowe commented on YARN-5630:
----------------------------------
I'm not a fan of the "prepare for rollback" approach if we can avoid it. It
adds another user-visible phase to the rollback procedure and places the burden
on admins, requiring them either to know which keys are valid/appropriate to
specify for the command or to know that they need to run a special script that
embeds this knowledge. Also, simply removing the keys from the database is not
going to be a proper downgrade procedure. Those keys represent state that is
important to preserve on a restart, and if we ignore it then we are dropping a
user request for a container. That's not going to be OK in the general case,
as that may prevent a container from launching properly or having the proper
properties when it is launched. Depending upon the nature of the feature that
added the new store keys, we may not be able to support the downgrade at all
short of failing the container because we can't execute it as requested.

In the short term I think we should commit something similar to this patch to
unblock the 2.8 release. IMHO we should be OK supporting downgrades from 2.8
to 2.7 only if the user has not used the new features in 2.8 (e.g.,
container increase/decrease, queuing). Once those features are used,
a downgrade may not work. This mirrors what was done for the epoch number in
container IDs between 2.5 and 2.6. Downgrades worked as long as the new
work-preserving RM restart wasn't performed after upgrading to 2.6. In general,
if we are careful to use new store keys only when they are absolutely necessary,
then we can support rollbacks as long as users don't use the new features added
in the new release.
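
To illustrate the "only use new store keys when absolutely necessary" idea, here is a minimal sketch. It is not the attached patch and not the real NM state store code: the class name, the key layout, and the assumption that the new "version" value has a default of 0 that an older release would effectively assume anyway are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; key names and layout are illustrative only. */
class ContainerStoreWriter {
    // Assumed default for the new field; an older release never sees the key
    // when the value is still at this default.
    static final int DEFAULT_VERSION = 0;

    /**
     * Builds the key/value entries to persist for one container.
     * The new "version" key is written only when it differs from the default,
     * so a container that never used the new feature leaves behind a store
     * that contains only keys the older release recognizes.
     */
    static Map<String, String> entriesFor(String containerId, String request,
                                          int version) {
        Map<String, String> entries = new LinkedHashMap<>();
        String prefix = "containers/" + containerId + "/";
        entries.put(prefix + "request", request); // key known to the old release
        if (version != DEFAULT_VERSION) {
            // New key: only present once the new feature has actually been used.
            entries.put(prefix + "version", Integer.toString(version));
        }
        return entries;
    }
}
```

With this shape, a downgrade keeps working for any container whose version never left the default, and only containers that exercised the new feature carry a key the older release cannot read.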

After unblocking 2.8, we can then work on the data-driven key ignoring in
YARN-5547. That will help cover another set of features where a simple delete
of the keys is sufficient to perform the downgrade. That would then leave the
features where we can't just ignore keys, and we'll have to come up with some
other approach or tell users that downgrades do not necessarily work once
that new feature is being used.
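
The three cases described above (keys the release understands, unknown keys that can safely be dropped, and unknown keys that cannot) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the key names, and the idea of passing in a set of ignorable keys are hypothetical, and how YARN-5547 would actually record which keys are ignorable is not specified here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the actual NodeManager recovery code. */
class ContainerKeyRecovery {
    // Key suffixes this (older) release understands. Illustrative names only.
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("request", "diagnostics", "launched"));

    /**
     * Recovers the stored entries for a single container.
     *
     * @param entries   key suffix to stored value for one container
     * @param ignorable unknown key suffixes declared safe to drop on recovery
     * @return the entries this release recognized
     * @throws IllegalStateException for an unknown key that is not ignorable
     */
    static Map<String, String> recover(Map<String, String> entries,
                                       Set<String> ignorable) {
        Map<String, String> recovered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            String key = e.getKey();
            if (KNOWN.contains(key)) {
                recovered.put(key, e.getValue());
            } else if (ignorable.contains(key)) {
                // Dropping this key loses nothing the container needs.
                continue;
            } else {
                // State we cannot interpret and cannot safely drop.
                throw new IllegalStateException(
                    "Unrecognized container key: " + key);
            }
        }
        return recovered;
    }
}
```

Calling `recover` with "version" in the ignorable set skips that key, while calling it with an empty ignorable set fails on the same input, which corresponds to the startup failure described in this issue.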

> NM fails to start after downgrade from 2.8 to 2.7
> -------------------------------------------------
>
> Key: YARN-5630
> URL: https://issues.apache.org/jira/browse/YARN-5630
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Attachments: YARN-5630.001.patch, YARN-5630.002.patch
>
>
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an
> unrecognized "version" container key on startup. This breaks downgrades from
> 2.8 to 2.7.