[
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084766#comment-16084766
]
Jason Lowe commented on YARN-6798:
----------------------------------
IMHO we should only need to bump the major version if any of the following are
true:
* Older NM software will explode when it tries to recover the state store
* Older NM software fails to do something crucial during recovery due to
ignoring something in the state store
Otherwise we can keep the major version the same and simply bump the minor
version. It looks like the two features were added to the state store in a way
that lets us remain on 1.x, but I haven't dug into it deeply enough to be sure.
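To make the rule above concrete, here is a minimal sketch of a major/minor
compatibility check. It is not the actual NMLeveldbStateStoreService code:
StoreVersion and VersionCheckSketch are hypothetical stand-ins, and the rule
"same major number means compatible" is the assumption being illustrated.

```java
import java.io.IOException;

// Hypothetical stand-in for a state store schema version (not the Hadoop Version class).
final class StoreVersion {
    final int major;
    final int minor;

    StoreVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Assumed rule: two versions are compatible when their major numbers match.
    boolean isCompatibleTo(StoreVersion other) {
        return major == other.major;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

final class VersionCheckSketch {
    // Returns the version the store should carry after startup, or throws
    // when the stored state has a different major version than the software.
    static StoreVersion checkVersion(StoreVersion loaded, StoreVersion current)
            throws IOException {
        if (loaded.isCompatibleTo(current)) {
            // Only the minor version differs (or nothing does): recovery can proceed.
            return current;
        }
        throw new IOException("Incompatible version for NM state: expecting NM state version "
                + current + ", but loading version " + loaded);
    }

    public static void main(String[] args) {
        try {
            // Minor bump only: accepted.
            System.out.println(checkVersion(new StoreVersion(1, 0), new StoreVersion(1, 2)));
            // Major bump: rejected, as in the startup failure reported on this issue.
            checkVersion(new StoreVersion(2, 0), new StoreVersion(3, 0));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, a store written at 1.0 is readable by software at 1.2,
while a store written at 2.0 is rejected by software expecting 3.0.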
bq. This will be incompatible with the previous alphas and anyone running
directly from branch-2 builds.
True, but that's the risk of running on unreleased software (as is the case
with branch-2). Anyone could check in something that isn't
backwards-compatible that needs to be subsequently fixed, and that could break
users who happened to deploy in-between. AFAIK we don't make any commitments
to compatibility except for official Apache Hadoop releases.
I would argue the same applies to alpha releases. The whole point of calling
it alpha is to convey that APIs may be unstable and could disappear or change
in an incompatible way in the next release. It will be annoying to users who
expect to do a rolling upgrade from 3.0-alphaX, but given the "alpha" tag I
would not expect anyone to have deployed this in a production environment such
that they cannot live with a downtime when upgrading to a subsequent release.
It would be helpful to have a release note that calls out the incompatibility
with 3.0-alpha releases and that users who are upgrading from one of those
releases will need to erase the NM state store on each node before upgrading.
> NM startup failure with old state store due to version mismatch
> ---------------------------------------------------------------
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0-alpha4
> Reporter: Ray Chiang
> Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO =
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO =
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException:
> Incompatible version for NM state: expecting NM state version 3.0, but
> loading version 2.0
> at
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting
> NM state version 3.0, but loading version 2.0
> at
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}