Ming Ma commented on YARN-914:

I agree with Jason. It is simpler if the NM doesn't need to know about 
decommissioning. Junping may have brought up a scalability concern earlier, but 
it shouldn't be an issue here.

To clarify the decomm node list: there appear to be two things, one is the 
decommission request list; the other is the run-time state of the 
decommissioned nodes. From Xuan's comment it appears we want to put the request 
list in HDFS and leverage FileSystemBasedConfigurationProvider to read it at 
run time. Given that it is considered configuration, that seems a good fit. 
Jason mentioned the state store, which could be used to track the run-time 
state of the decommission. That is necessary given we plan to introduce a 
timeout for graceful decommission. However, if we assume ResourceOption's 
overcommitTimeout state is stored in the state store for the RM failover case 
as part of YARN-291, then the new active RM can just replay the state 
transitions. If so, it seems we don't need to persist the decommission run-time 
state to the state store.

Alternatively, we can remove the graceful-decommission timeout from the YARN 
layer and let an external decommission script handle it. If the script decides 
the graceful decommission is taking too long, it can ask YARN to do an 
immediate decommission instead.
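To make the external-script alternative concrete, here is a rough sketch of the driver loop it implies. All names here (RmClient, startGracefulDecommission, isDrained, forceImmediateDecommission) are hypothetical stand-ins, not the actual YARN client API; the point is only that the timeout lives in the caller rather than in YARN:

```java
// Sketch of an external decommission driver that owns the graceful
// timeout itself. RmClient is a hypothetical stand-in for whatever
// client talks to the RM; it is NOT the real YARN API.
public class DecommissionDriver {

    interface RmClient {
        void startGracefulDecommission(String node);
        boolean isDrained(String node);              // all containers done?
        void forceImmediateDecommission(String node);
    }

    /**
     * Ask for graceful decommission and poll up to maxPolls times.
     * If the node has not drained by then, fall back to immediate
     * decommission. Returns true iff the node drained gracefully.
     */
    static boolean decommission(RmClient rm, String node, int maxPolls) {
        rm.startGracefulDecommission(node);
        for (int i = 0; i < maxPolls; i++) {
            if (rm.isDrained(node)) {
                return true;
            }
            // a real driver would sleep between polls here
        }
        rm.forceImmediateDecommission(node);  // took too long: go immediate
        return false;
    }
}
```

With this split, YARN itself never needs a decommission timer, so there is nothing extra to persist in the state store for it.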

BTW, it appears the fair scheduler doesn't support ConfigurationProvider.

Recommission is another scenario. It can happen when a node is in the 
decommissioned state or the decommissioned_in_progress state.
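A minimal sketch of that recommission check, using a simplified enum (these are stand-in names for illustration, not YARN's actual NodeState values):

```java
// Simplified sketch of which node states allow recommission, per the
// two cases above. The enum values are illustrative stand-ins, not
// YARN's real NodeState.
public class RecommissionCheck {

    enum NodeState { RUNNING, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    /** Recommission only makes sense from the two decommission states. */
    static boolean canRecommission(NodeState state) {
        return state == NodeState.DECOMMISSION_IN_PROGRESS
            || state == NodeState.DECOMMISSIONED;
    }

    /** A successful recommission returns the node to service. */
    static NodeState recommission(NodeState state) {
        if (!canRecommission(state)) {
            throw new IllegalStateException("cannot recommission from " + state);
        }
        return NodeState.RUNNING;
    }
}
```

The two source states differ in practice: a decommissioned_in_progress node may still have live containers to keep, while a fully decommissioned node re-registers from scratch.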

> Support graceful decommission of nodemanager
> --------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Further more, for finished map tasks, if their 
> map output are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.
