[
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819059#comment-13819059
]
Jason Lowe commented on YARN-1336:
----------------------------------
It depends upon the nature of the config change or fix. In essence this is no
different from the RM restart use case today. Any config changes or fixes need
to keep recovery on startup in mind. Most fixes won't be an issue, but
anything that changes the syntax or semantics of the state store data, or of
the recovery process in general, will have to handle the state store format
written by a previous version to remain compatible.
Ideally we'd like to support work-preserving rolling upgrades as well as
work-preserving rolling downgrades, so one can smoothly recover from a bad
upgrade without taking down the whole cluster. If the persisted state format
isn't changing then this should be straightforward. However, if the state
format does change between versions and we end up supporting only a one-way
conversion from the old format to the new format, then we would support a
work-preserving rolling upgrade but not a work-preserving rolling downgrade
between those versions. A downgrade would still be possible with the loss of
containers, of course, by simply removing the state store data and
restarting.
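To make the upgrade/downgrade asymmetry concrete, here is a minimal sketch of
the decision a restarting daemon would face. It assumes the state store records
an integer format version and that only old-to-new conversion exists. The class,
enum, and method names are hypothetical and are not taken from the YARN
codebase.

```java
// Hypothetical illustration only; not YARN's actual state store API.
class StateStoreRecoveryPlan {

    /** What a restarting daemon can do with the state it finds. */
    enum Action {
        RECOVER_AS_IS,          // same format: work-preserving restart
        CONVERT_THEN_RECOVER,   // older format: one-way conversion, work preserved
        REQUIRES_STATE_REMOVAL  // newer format: remove state, containers are lost
    }

    /**
     * @param storedVersion  format version found in the state store (assumed field)
     * @param runningVersion format version the starting software reads and writes
     */
    static Action plan(int storedVersion, int runningVersion) {
        if (storedVersion == runningVersion) {
            return Action.RECOVER_AS_IS;
        }
        if (storedVersion < runningVersion) {
            // Rolling upgrade: the new software knows how to read the old format.
            return Action.CONVERT_THEN_RECOVER;
        }
        // Rolling downgrade: the old software cannot read the newer format.
        return Action.REQUIRES_STATE_REMOVAL;
    }

    public static void main(String[] args) {
        System.out.println("restart   1 -> 1: " + plan(1, 1));
        System.out.println("upgrade   1 -> 2: " + plan(1, 2));
        System.out.println("downgrade 2 -> 1: " + plan(2, 1));
    }
}
```

If a reverse conversion were also written, the last branch could preserve work
as well, which is the fully symmetric case described above.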
In summary, we would need to be cognizant of changes that affect state recovery
upon startup so a work-preserving restart can be used to support
work-preserving rolling upgrades. This applies to both the RM and the NM.
> Work-preserving nodemanager restart
> -----------------------------------
>
> Key: YARN-1336
> URL: https://issues.apache.org/jira/browse/YARN-1336
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
>
> This serves as an umbrella ticket for tasks related to work-preserving
> nodemanager restart.