Jason Lowe commented on YARN-914:

bq.  what's the benefit of step 1 over decommission nodes directly after 

We don't have to notify AMs if we want to keep things simpler.  However, we 
already support preempting (i.e., killing) specific containers via 
StrictPreemptionContract, so it seems straightforward to allow the AMs to be a 
bit more proactive.  Note that we'd still need a timeout to give them time to 
respond, so the decommission would have two phases: the first where we're simply 
waiting for containers to complete on their own, and the second where we notify 
AMs about imminent preemption and give them a little bit of time to react 
before forcibly killing any remaining containers.  The advantage of adding the 
preemption-with-explicit-grace-period feature is that we don't need two 
separate timeout phases.  Without the feature, telling AMs too early that their 
containers are going away might make them do something expensive/drastic when 
the container is going to complete on its own in a few more minutes.  Letting 
them know the deadline explicitly lets them make the call of whether to do 
anything or let it ride.
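
To make the explicit-deadline idea concrete, here is a minimal sketch in Python 
rather than actual YARN code.  PreemptionNotice, am_decide, and 
containers_to_kill are hypothetical names for illustration only, and the AM's 
estimate of remaining run time is assumed to be available.  It models a single 
grace period: the AM compares its own estimate against the deadline, and 
whatever is still running when the grace period expires gets killed.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LET_IT_RIDE = "let it ride"
    REACT_NOW = "react now"


@dataclass
class PreemptionNotice:
    """Hypothetical notice sent to an AM: the container and its explicit deadline."""
    container_id: str
    grace_period_secs: float


def am_decide(notice, est_remaining_secs):
    """AM-side call: do nothing if the container should finish before the deadline."""
    if est_remaining_secs <= notice.grace_period_secs:
        return Action.LET_IT_RIDE
    # The container will outlive the grace period, so the AM should act
    # (checkpoint, migrate work, etc.; the reaction itself is not modeled here).
    return Action.REACT_NOW


def containers_to_kill(running, elapsed_secs, grace_period_secs):
    """Decommission side: once the single grace period expires, kill what is left."""
    if elapsed_secs < grace_period_secs:
        return []
    return sorted(running)
```

With a 10 minute grace period, a container expected to finish in 2 minutes is 
left alone, while one with an hour of work remaining prompts the AM to react.  
There is only one timeout to configure, which is the point of carrying the 
deadline in the notice.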

bq.  If there is benefit, why we don't do this today when decommission nodes?

Because today's decommission is instantaneous and not graceful, and fixing that 
is the point of this JIRA. ;-)

> Support graceful decommission of nodemanager
> --------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.

