Ming Ma commented on YARN-914:

[~djp], thanks for working on this.

It looks like we are going to use YARN-291 and thus the "drain the state" 
approach, instead of the more complicated "migrate the state" approach. So YARN 
will reduce the capacity of a node as part of the decommission process until 
all of its map outputs are fetched or until all the applications the node 
touches have completed? In addition, it will be interesting to understand how 
you handle long-running jobs.
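
To make the question concrete, here is a minimal sketch of how I read the 
"drain the state" approach. It is only an illustration under my own 
assumptions: the class and method names (DrainingNode, start_drain, etc.) are 
hypothetical and are not YARN APIs, and the real behavior may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DrainingNode:
    """Hypothetical model of a node being drained; not a YARN class."""
    capacity_mb: int
    # container id -> memory in MB for containers still running on the node
    containers: dict = field(default_factory=dict)
    # applications that still depend on this node (e.g. unfetched map output)
    pending_apps: set = field(default_factory=set)

    def used_mb(self) -> int:
        return sum(self.containers.values())

    def start_drain(self) -> None:
        # Shrink schedulable capacity to current usage so nothing new fits.
        self.capacity_mb = self.used_mb()

    def container_finished(self, container_id: str) -> None:
        # Release the container and shrink capacity to the remaining usage.
        self.containers.pop(container_id)
        self.capacity_mb = self.used_mb()

    def app_completed(self, app_id: str) -> None:
        self.pending_apps.discard(app_id)

    def can_decommission(self) -> bool:
        # Safe only when nothing runs here and no application depends on it.
        return not self.containers and not self.pending_apps


node = DrainingNode(capacity_mb=8192,
                    containers={"c1": 2048, "c2": 1024},
                    pending_apps={"app_1"})
node.start_drain()               # capacity drops from 8192 MB to 3072 MB
node.container_finished("c1")
node.container_finished("c2")
print(node.can_decommission())   # False: app_1 still depends on this node
node.app_completed("app_1")
print(node.can_decommission())   # True
```

In this reading, a long-running job would keep can_decommission() false 
indefinitely, which is why some timeout or fallback seems necessary.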

FYI, https://issues.apache.org/jira/browse/YARN-1996 will drain containers of 
unhealthy nodes.

> Support graceful decommission of nodemanager
> --------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need 
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.

This message was sent by Atlassian JIRA
