[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803230#comment-14803230 ]
Parvez commented on YARN-914:
-----------------------------
Hi,
I am facing issues when trying to resize an AWS EMR cluster that is
configured with Hadoop 2.6.0.
Resizing works fine, but when a node that has containers running on it is
decommissioned, the entire EMR cluster stops functioning. On a resize request,
EMR terminates a Task Node (EC2 instance) at random, without checking whether
it has containers running on it.
I would expect YARN to move the containers and the job from one node to
another here, but it does not seem to be doing so.
Could this be related to the issue described here?
Please answer. Thank you.
> (Umbrella) Support graceful decommission of nodemanager
> -------------------------------------------------------
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf,
> Gracefully Decommission of NodeManager (v2).pdf,
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs have not been fetched by the reducers of the job, these map tasks
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.