[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314677#comment-14314677 ]

Jason Lowe commented on YARN-914:
---------------------------------

bq. However, YARN-2567 is about the threshold thing; maybe the wrong JIRA number?

That's the right JIRA.  It's about waiting for a threshold number of nodes to
report back in after the RM recovers, and the RM would need to persist state
about the nodes in the cluster to know what percentage of the old nodes have
reported back in.

As for whether we should just provide hooks vs. making it much more of a
turnkey solution, I'd advocate starting with hooks and seeing what we can do
with them.  What we learn from implementing decommission that way can then
feed into making it a built-in, turnkey solution later.  I do agree with Vinod
that there should minimally be an easy way, CLI or otherwise, for outside
scripts driving the decommission to either force it or wait for it to
complete.  If waiting, there also needs to be either a timeout on the wait,
after which the decommission is forced, or another easy way to kill the
containers still running on that node.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact on running applications.
> Currently if an NM is decommissioned, all running containers on the NM need
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map output is not fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally decommission a node
> manager gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
