[
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289286#comment-14289286
]
Jason Lowe commented on YARN-914:
---------------------------------
bq. The first step I was thinking to keep NM running in a low resource mode
after graceful decommissioned
I think it could be useful to leave the NM process up after the graceful
decommission completes. That allows automated decommissioning tools to confirm
that the process completed by querying the NM directly. If the NM exits, the
tool may have difficulty distinguishing an NM that crashed just before
decommissioning completed from one that finished successfully. The RM will be
tracking this state as well, so the choice may not be critical if the tool
queries the RM rather than the NM.
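To make the ambiguity concrete, here is a minimal sketch of how such a tool might classify what it observes. Everything in it (the class, the enums, the idea that the NM and RM each report one of two states) is a hypothetical stand-in for illustration, not an existing YARN API.

```java
import java.util.Optional;

/** Hypothetical sketch: how a decommissioning tool might classify a node's outcome. */
final class DecommissionProbe {
    /** States a tool could observe; illustrative names, not YARN's own. */
    enum Observed { DRAINING, DECOMMISSIONED }

    enum Verdict { IN_PROGRESS, COMPLETED, UNKNOWN_NM_GONE }

    /**
     * @param nmReply state reported by the NM itself; empty if the NM did not answer
     * @param rmView  state tracked by the RM; empty if the tool does not query the RM
     */
    static Verdict classify(Optional<Observed> nmReply, Optional<Observed> rmView) {
        // If the NM process is still up, its own answer settles the question.
        if (nmReply.isPresent()) {
            return nmReply.get() == Observed.DECOMMISSIONED
                    ? Verdict.COMPLETED
                    : Verdict.IN_PROGRESS;
        }
        // The NM is gone: only the RM's record can tell a clean finish from a crash.
        if (rmView.isPresent() && rmView.get() == Observed.DECOMMISSIONED) {
            return Verdict.COMPLETED;
        }
        return Verdict.UNKNOWN_NM_GONE;
    }
}
```

With the NM left running, the first branch always applies. Without it, a tool that does not consult the RM can only ever reach the unknown verdict.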
bq. However, I am not sure if they can handle state migration to new node ahead
of predictable node lost here, or be stateless more or less make more sense
here?
I agree with Ming that it would be nice if the graceful decommission process
could give the AMs a "heads up" about what's going on. The simplest way to
accomplish that is to leverage the already existing preemption framework to
tell the AM that YARN is about to take the resources away. The
StrictPreemptionContract portion of the PreemptionMessage can be used to list
the exact resources that YARN will be reclaiming and give the AM a chance to
react before the containers are reclaimed. It's then up to the AM whether it
wants to do anything special or just let the containers get killed after a
timeout.
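As a rough illustration of the AM side, the sketch below shows one way an AM could map the containers named in a strict contract back to its own tasks. It deliberately uses plain strings as stand-ins for container IDs so that it compiles without Hadoop on the classpath. The class and method names are hypothetical, and what the AM does with the result (checkpoint, migrate, or nothing) is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical AM-side helper; container IDs are plain strings for illustration. */
final class StrictContractHandler {
    /**
     * @param strictContainers container IDs listed in the strict contract
     * @param runningTasks     the AM's own bookkeeping: container ID -> task name
     * @return names of the AM's tasks that will lose their container,
     *         ordered by container ID
     */
    static List<String> tasksToSave(Set<String> strictContainers,
                                    Map<String, String> runningTasks) {
        List<String> affected = new ArrayList<>();
        // Sort the IDs so the result does not depend on set iteration order.
        for (String containerId : new TreeSet<>(strictContainers)) {
            String task = runningTasks.get(containerId);
            // The contract may name containers this AM no longer tracks; skip those.
            if (task != null) {
                affected.add(task);
            }
        }
        return affected;
    }
}
```

An AM that chooses not to react can simply ignore the list and let the containers be killed at the timeout.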
bq. These notification may still be necessary, so AM won't add these nodes into
blacklist if container get killed afterwards. Thoughts?
I thought we could leverage the updated nodes list of the AllocateResponse to
let AMs know when nodes are entering the decommissioning state or at least when
the decommission state completes (and containers are killed). Although if the
AM adds the node to the blacklist, that's not such a bad thing either since the
RM should never allocate new containers on a decommissioning node anyway.
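A minimal sketch of how an AM might use such node updates to avoid blacklisting a node whose containers were killed by decommissioning rather than by a fault. The types here are stand-ins: the `State` enum, including its `DECOMMISSIONING` value, and the class itself are assumptions for illustration, not the YARN API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical AM-side blacklist that ignores losses on nodes known to be leaving. */
final class DecommissionAwareBlacklist {
    /** Stand-in node states; DECOMMISSIONING is assumed for this sketch. */
    enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    private final Set<String> leaving = new HashSet<>();
    private final Set<String> blacklist = new HashSet<>();

    /** Record node state changes delivered with an allocate response. */
    void onUpdatedNodes(Map<String, State> updated) {
        for (Map.Entry<String, State> e : updated.entrySet()) {
            if (e.getValue() == State.RUNNING) {
                leaving.remove(e.getKey());
            } else {
                leaving.add(e.getKey());
            }
        }
    }

    /** Blacklist the node only if the loss cannot be explained by decommissioning. */
    void onContainerLost(String node) {
        if (!leaving.contains(node)) {
            blacklist.add(node);
        }
    }

    boolean isBlacklisted(String node) {
        return blacklist.contains(node);
    }
}
```

As noted above, blacklisting a decommissioning node would be harmless anyway, so this only keeps the AM's failure accounting clean.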
> Support graceful decommission of nodemanager
> --------------------------------------------
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs are not fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.