[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256106#comment-16256106 ] Junping Du commented on YARN-914:

The client-side graceful decommission work has been completed, along with proper documentation, so we can claim that part of the goal is achieved. I think we should separate the server-side decommission work into a phase two covering the HA issues, the format issues Jason raised, and other enhancements, which would help keep the list cleaner. If nobody objects, I will create a new umbrella JIRA (and a new branch) and move all open JIRAs under it.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: graceful
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf,
> Gracefully Decommission of NodeManager (v2).pdf,
> GracefullyDecommissionofNodeManagerv3.pdf
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs have not been fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
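[Editor's note: as context for the client-side flow mentioned above, in Hadoop releases that include YARN-3225 graceful decommission is driven from the admin CLI. A rough sketch of a typical sequence follows; the exclude-file path, hostname, and timeout value are illustrative examples, not values from this JIRA.]

```shell
# Add the host to the RM exclude file (whichever file
# yarn.resourcemanager.nodes.exclude-path points at on your cluster).
echo "nm-host-01.example.com" >> /etc/hadoop/conf/yarn.exclude

# Ask the ResourceManager to gracefully decommission the excluded nodes,
# draining running containers for up to 3600 seconds before forcing it.
yarn rmadmin -refreshNodes -g 3600

# The node should report DECOMMISSIONING and, once drained, DECOMMISSIONED.
yarn node -list -all
```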
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135159#comment-15135159 ] Daniel Zhi commented on YARN-914:

For lack of a better title, the sub-JIRA is currently named "Automatic and Asynchronous Decommissioning Nodes Status Tracking".
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135138#comment-15135138 ] Daniel Zhi commented on YARN-914:

I have applied and merged my code changes on top of the latest Hadoop trunk (3.0.0-SNAPSHOT), launched a cluster, and verified that graceful decommission works as expected. Per the suggestion, I created a sub-JIRA with a doc that describes the design, plus the patch on top of the latest trunk.
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066518#comment-15066518 ] Junping Du commented on YARN-914:

Hi [~danzhi], thanks for sharing the information above, and welcome to contributing to Apache Hadoop.

bq. Our implementation is much in sync with the architecture and idea in the JIRA design document.

Good to hear that we are on the same page. One thing we need to pay attention to: we already have many patches committed to trunk/branch-2.8. As a continuing development effort on YARN, the code (currently internal to you) for similar functionality or APIs needs to be removed before contributing; otherwise it takes reviewers/committers much more effort to work out which functionalities/APIs are duplicated and which are not, which usually takes much longer.

bq. On the other hand, there are additional details and component-level designs that the JIRA design document does not necessarily discuss. These details surfaced naturally during the development iterations, and the corresponding designs matured and stabilized.

I agree that a design document can miss implementation details in general. However, more background and details can be found in the JIRA discussion and the patch implementations. Let me explain below.

bq. One example is the DecommissioningNodeWatcher, embedded in ResourceTrackingService, which tracks the status of DECOMMISSIONING nodes automatically and asynchronously after the client/admin makes the graceful decommission request. Another example is per-node decommission timeout support, which is useful for decommissioning nodes that will be terminated soon.

Actually, our current design and committed patches already support the timeout feature. There are basically two ways to handle the timeout: on the RM side or on the CLI side; both have pros and cons. Per the discussion above (https://issues.apache.org/jira/browse/YARN-914?focusedCommentId=14312677&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14312677), we (Jason, Vinod, and I) all agreed to go with the CLI way first; it is already implemented in sub-JIRA YARN-3225 and committed. Of course, we are open to the other implementation, but we do want it to sit behind an on/off configuration switch that does not affect the currently preferred option we already implemented.

bq. Are you able to share these details in an "augmented" design doc? Agreeing on the design would greatly help with review/commits later.

I would prefer an effort to abstract the different implementations for tracking/handling the timeout. This doesn't sound like an overall "augmented" design, given the implementation was previously described as "much in sync" with the current architecture and design. Also, it would be more appropriate to create a sub-JIRA to discuss your ideas and put your document there, given that we already have a very long discussion here on the overall design.

bq. As far as implementation goes, it is recommended to create subtasks as you see fit. Note that it is easier to review smaller chunks of code. Also, since you have implemented it already, can you comment on how much of the code changes are in frequently updated parts? If not much, it might make sense to develop on a branch and merge it to trunk.

I would say most parts of YARN-914 are already committed or have patches available. Enhancing the timeout tracking/handling doesn't sound like a massive amount of work, so a dedicated development branch seems unnecessary to me. However, I would prefer to create a sub-JIRA to discuss the idea/scope and take a look at your demo code (with the duplicated code/features that are already committed or publicly patch-available removed) before making any judgement/decision.

[~danzhi], the concrete steps I would suggest for now:
1. Review all JIRA discussions, design docs, and implementations under this umbrella JIRA so far, and understand the scope and the gap with your current internal implementation.
2. Raise a sub-JIRA to lay out your ideas/design, highlighting the different options for discussion. If possible, attach a demo patch with any code or features that duplicate existing patches removed, for better understanding. We can discuss later how to bring in your patch contribution.

Make sense?
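[Editor's note: to illustrate the CLI-side timeout handling discussed above, here is a minimal, hypothetical sketch; names and signatures are invented for illustration and are not the YARN-3225 code. The client asks the RM to start a graceful decommission, polls until the node drains, and falls back to a forceful decommission once the timeout expires, so the RM itself needs no timeout state.]

```python
import time

def graceful_decommission(rm, node_id, timeout_sec,
                          poll_interval=1.0,
                          now=time.monotonic, sleep=time.sleep):
    """CLI-side timeout handling (sketch): the RM only parks the node in
    DECOMMISSIONING; the client enforces the deadline."""
    rm.start_graceful_decommission(node_id)
    deadline = now() + timeout_sec
    while now() < deadline:
        if rm.node_state(node_id) == "DECOMMISSIONED":
            return "graceful"        # node drained before the deadline
        sleep(poll_interval)
    rm.force_decommission(node_id)   # deadline passed: forceful fallback
    return "forced"
```

The `now`/`sleep` parameters are injected so the loop can be tested against a fake clock; a real CLI would use the defaults.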
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065119#comment-15065119 ] Karthik Kambatla commented on YARN-914:

bq. On the other hand, there are additional details and component-level designs that the JIRA design document does not necessarily discuss.

Are you able to share these details in an "augmented" design doc? Agreeing on the design would greatly help with review/commits later. As far as implementation goes, it is recommended to create subtasks as you see fit. Note that it is easier to review smaller chunks of code. Also, since you have implemented it already, can you comment on how much of the code changes are in frequently updated parts? If not much, it might make sense to develop on a branch and merge it to trunk.
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064737#comment-15064737 ] Daniel Zhi commented on YARN-914:

Thanks. Always committing to trunk first makes a lot of sense to me. We would need to port the code to trunk, and likely build an AMI image with it so as to leverage our internal verification test system. Our implementation is largely in sync with the architecture and ideas in the JIRA design document. On the other hand, there are additional details and component-level designs that the JIRA design document does not necessarily discuss. These details surfaced naturally during the development iterations, and the corresponding designs matured and stabilized. One example is the DecommissioningNodeWatcher, embedded in ResourceTrackingService, which tracks the status of DECOMMISSIONING nodes automatically and asynchronously after the client/admin makes the graceful decommission request. Another example is per-node decommission timeout support, which is useful for decommissioning nodes that will be terminated soon.
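[Editor's note: a minimal sketch of what an RM-side watcher of the kind described above might look like; all names and signatures here are hypothetical, not the actual patch. Per node, it records an optional deadline when the decommission starts and flips the node to DECOMMISSIONED on a heartbeat once the node is idle or the deadline has passed.]

```python
import time

RUNNING, DECOMMISSIONING, DECOMMISSIONED = "RUNNING", "DECOMMISSIONING", "DECOMMISSIONED"

class DecommissioningNodeWatcher:
    """Hypothetical sketch of asynchronous DECOMMISSIONING-node tracking."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._state = {}      # node id -> state string
        self._deadline = {}   # node id -> absolute deadline, or None

    def start_decommission(self, node_id, timeout_sec=None):
        # Per-node timeout: None means "wait indefinitely for the node to drain".
        self._state[node_id] = DECOMMISSIONING
        self._deadline[node_id] = (None if timeout_sec is None
                                   else self._now() + timeout_sec)

    def on_heartbeat(self, node_id, running_containers):
        """Called on each node heartbeat; returns the node's current state."""
        if self._state.get(node_id, RUNNING) != DECOMMISSIONING:
            return self._state.get(node_id, RUNNING)
        deadline = self._deadline[node_id]
        timed_out = deadline is not None and self._now() >= deadline
        if running_containers == 0 or timed_out:
            self._state[node_id] = DECOMMISSIONED
        return self._state[node_id]
```

A real implementation would also track the applications whose completed map output may still be needed by reducers, not just live containers; this sketch collapses that to a container count.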
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064054#comment-15064054 ] Jason Lowe commented on YARN-914:

[~danzhi], the patch should be against trunk. We always commit first against trunk and then backport to prior releases in reverse release order (e.g.: trunk->branch-2->branch-2.8->branch-2.7), so we avoid a situation where a feature or fix is in a release but disappears in a subsequently released version. See the [How to Contribute|http://wiki.apache.org/hadoop/HowToContribute] page for more information, including details on preparing and naming the patch. Is this implementation in line with the design document on this JIRA, or is it using a different approach?
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063033#comment-15063033 ] Daniel Zhi commented on YARN-914:

Yes. (Another related blog: https://aws.amazon.com/blogs/aws/amazon-emr-release-4-1-0-spark-1-5-0-hue-3-7-1-hdfs-encryption-presto-oozie-zeppelin-improved-resizing/) Just to clarify my question: my current patch is on top of Hadoop 2.7.1. However, I see the branches "trunk", "branch-2.8", and "branch-2.7.2" in git://git.apache.org/hadoop.git. It would require extra preparation to make a patch against these branches, and it's unclear to me which branch to prepare the patch against.
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063000#comment-15063000 ] Parvez commented on YARN-914:

Hi Daniel, thank you for the reply. Yes, AWS released the latest AMI version that supports graceful decommissioning of nodes. I guess you are referring to http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-manage-resize.html#graceful-shrink
Sorry, I don't have much idea about the specific branch.
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062997#comment-15062997 ] Daniel Zhi commented on YARN-914:

Maybe you are already aware of this: the EMR team has implemented graceful decommission in recent AMIs (for example, AMI 3.10.0 or 4.2.0). In these new AMIs, when you resize the cluster down, the control logic selects the best candidates and gracefully decommissions them instead of terminating them right away as before. You can move to AMI 3.10.0 (which is Hadoop 2.6.0).
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062988#comment-15062988 ] Daniel Zhi commented on YARN-914:

AWS EMR (Elastic MapReduce) implemented graceful decommission of YARN nodes and included it in several of the most recent AMI releases. The implementation has been verified in thousands of customer clusters. We would like to contribute the implementation back to Apache Hadoop. Internally we have the code on both Hadoop 2.6.0 and Hadoop 2.7.1. To prepare for the release back to Apache Hadoop, which branch should we prepare the code against?
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803230#comment-14803230 ] Parvez commented on YARN-914:

Hi, I am facing issues when trying to resize an AWS EMR cluster configured with Hadoop 2.6.0. Resizing works, but when a node that has containers running on it is decommissioned, the entire EMR cluster stops functioning. On a resize request, EMR terminates a task node (EC2 instance) at random, without checking whether it has containers running on it. YARN should handle moving the containers and the job from one node to another, which I suppose it isn't doing. Could this be related to the issue tracked here? Please answer. Thank you.
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367873#comment-14367873 ] Junping Du commented on YARN-914:

Hi, can someone on the watch list help review the patch in sub-JIRA YARN-3212? Thanks!