Ming Ma commented on YARN-914:
I agree with Jason. It is easier if NM doesn't need to know about decommission.
There is a scalability issue that Junping might have brought up; but it
shouldn't be an issue.
To clarify the decomm node list: there appear to be two things. One is the decomm
request list; the other is the run time state of the decomm nodes. From
Xuan's comment it appears we want to put the request in HDFS and leverage
FileSystemBasedConfigurationProvider to read it at run time. Given it is
considered configuration, that seems a good fit. Jason mentioned the state
store, which can be used to track the run time state of the decomm. This is
necessary given we plan to introduce a timeout for graceful decommission.
However, if we assume ResourceOption's overcommitTimeout state is stored in the
state store for the RM failover case as part of YARN-291, then the new active RM can
just replay the state transition. If so, it seems we don't need to persist
decomm run time state to state store.
Alternatively, we can remove the graceful decommission timeout from the YARN layer
and let an external decommission script handle that. If the script considers the
graceful decommission takes too long, it can ask YARN to do the immediate
decommission.
BTW, it appears the fair scheduler doesn't support ConfigurationProvider.
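To make the external-script alternative concrete, here is a minimal sketch of
what such a script's control loop might look like. Everything in it is an
assumption for illustration: start_graceful, is_drained and force_decommission
are hypothetical callables standing in for whatever mechanism the script would
use to talk to YARN, and none of them is an existing YARN API.

```python
import time
from typing import Callable


def graceful_decommission(
    node: str,
    start_graceful: Callable[[str], None],      # hypothetical: ask YARN to start draining
    is_drained: Callable[[str], bool],          # hypothetical: True once the node is idle
    force_decommission: Callable[[str], None],  # hypothetical: immediate decommission
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Drain a node, falling back to immediate decommission on timeout.

    Returns "graceful" if the node drained before the deadline, else "forced".
    """
    start_graceful(node)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_drained(node):
            return "graceful"
        sleep(poll_s)
    # The timeout is owned by the script, so YARN itself keeps no timer state.
    force_decommission(node)
    return "forced"
```

With this shape the timeout lives entirely outside YARN, so there would be no
timer to persist or replay on RM failover; the trade-off is that the script
itself has to survive for the duration of the drain.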
Recommission is another scenario. It can happen when a node is in the
decommissioned state or the decommissioned_in_progress state.
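The node states discussed in this comment, including recommission, could be
sketched as a small transition table. This is only an illustration of the
discussion, not the RM's actual state machine: the state and event names below
are assumptions chosen for readability, and invalid transitions simply raise.

```python
from enum import Enum


class NodeState(Enum):
    RUNNING = "running"
    DECOMMISSION_IN_PROGRESS = "decommission_in_progress"
    DECOMMISSIONED = "decommissioned"


class Event(Enum):
    GRACEFUL_DECOMMISSION = "graceful_decommission"    # start draining
    IMMEDIATE_DECOMMISSION = "immediate_decommission"  # stop right away
    DRAINED = "drained"                                # no work left on the node
    RECOMMISSION = "recommission"                      # bring the node back


# (current state, event) -> next state; names are illustrative assumptions.
TRANSITIONS = {
    (NodeState.RUNNING, Event.GRACEFUL_DECOMMISSION): NodeState.DECOMMISSION_IN_PROGRESS,
    (NodeState.RUNNING, Event.IMMEDIATE_DECOMMISSION): NodeState.DECOMMISSIONED,
    (NodeState.DECOMMISSION_IN_PROGRESS, Event.DRAINED): NodeState.DECOMMISSIONED,
    (NodeState.DECOMMISSION_IN_PROGRESS, Event.IMMEDIATE_DECOMMISSION): NodeState.DECOMMISSIONED,
    # Recommission is allowed both mid-drain and after decommission completes.
    (NodeState.DECOMMISSION_IN_PROGRESS, Event.RECOMMISSION): NodeState.RUNNING,
    (NodeState.DECOMMISSIONED, Event.RECOMMISSION): NodeState.RUNNING,
}


def next_state(state: NodeState, event: Event) -> NodeState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {state.name} on {event.name}") from None
```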
> Support graceful decommission of nodemanager
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs are not fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.