Parvez commented on YARN-914:


I am having trouble resizing an AWS EMR cluster that is configured with 
Hadoop 2.6.0.

Resizing itself works, but when a node that has containers running on it is 
decommissioned, the entire EMR cluster stops functioning. On a resize request, 
EMR terminates a Task Node (EC2 instance) at random, without checking whether 
it has containers running on it.
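
To illustrate the check that seems to be missing, here is a minimal sketch. 
It assumes the ResourceManager REST API is reachable (the address in the 
comment is a placeholder) and reads the numContainers and state fields from 
the /ws/v1/cluster/nodes response. The helper names idle_nodes and 
fetch_nodes, and the sample payload, are made up for this example; this is 
not how EMR itself picks a node.

```python
import json
from urllib.request import urlopen


def idle_nodes(nodes_json):
    """Return host names of RUNNING nodes that report zero containers."""
    # The RM wraps the list as {"nodes": {"node": [...]}}; tolerate an empty reply.
    nodes = (nodes_json.get("nodes") or {}).get("node") or []
    return [
        n["nodeHostName"]
        for n in nodes
        if n.get("state") == "RUNNING" and n.get("numContainers", 0) == 0
    ]


def fetch_nodes(rm_url):
    """Fetch the node list from a ResourceManager, e.g. http://rm-host:8088 (placeholder)."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/nodes") as resp:
        return json.load(resp)


# Hand-written sample in the shape of the REST response (not real cluster data).
SAMPLE = {
    "nodes": {
        "node": [
            {"nodeHostName": "host-a", "state": "RUNNING", "numContainers": 3},
            {"nodeHostName": "host-b", "state": "RUNNING", "numContainers": 0},
            {"nodeHostName": "host-c", "state": "LOST", "numContainers": 0},
        ]
    }
}

# Only host-b is both RUNNING and free of containers.
safe_to_remove = idle_nodes(SAMPLE)
```

Against a live cluster, idle_nodes(fetch_nodes(rm_url)) would give the nodes 
that could be removed without killing running containers at that moment. A 
container may still be scheduled between the check and the termination, so 
this is only a rough pre-check.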

I would expect YARN to move the containers and the job from that node to 
another one, but it does not appear to be doing so.

Could this be related to the issue described here?

Please answer. Thank you. 

> (Umbrella) Support graceful decommission of nodemanager
> -------------------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need 
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.
