Junping Du commented on YARN-914:

bq. I do agree with Vinod that there should minimally be an easy way, CLI or 
otherwise, for outside scripts driving the decommission to either force it or 
wait for it to complete. If waiting, there also needs to be a way to either 
have the wait have a timeout which will force after that point or another 
method with which to easily kill the containers still on that node.
Makes sense. It sounds like most of us here agree to go with the second 
approach proposed by Ming and refined by Vinod.
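
To make the quoted requirement concrete, here is a minimal sketch of the "wait, then force after a timeout" behaviour an outside script might implement. It is only an illustration under stated assumptions: `graceful_decommission`, `running_containers`, and `force_kill` are hypothetical names, not YARN APIs or CLI commands, and the polling approach is one possible choice rather than the design agreed on in this issue.

```python
import time
from typing import Callable


def graceful_decommission(
    running_containers: Callable[[], int],
    force_kill: Callable[[], None],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for a node to drain; force the decommission once the timeout expires.

    Every parameter is a hypothetical stand-in supplied by the caller:
    running_containers() reports how many containers are still on the node,
    and force_kill() kills whatever is left. Neither is a real YARN call.
    Returns "drained" if the node emptied in time, "forced" otherwise.
    """
    deadline = clock() + timeout_s
    while running_containers() > 0:
        remaining = deadline - clock()
        if remaining <= 0:
            # Timeout reached: stop waiting and kill the remaining containers.
            force_kill()
            return "forced"
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
    return "drained"
```

Passing `timeout_s=0` would give the "force it" case immediately (if containers are still running), while a very large timeout approximates "wait for it to complete". The clock and sleep functions are injectable only so the loop can be exercised without real waiting.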

> Support graceful decommission of nodemanager
> --------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need 
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the job's reducers, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.

This message was sent by Atlassian JIRA
