[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327610#comment-14327610 ]

Junping Du commented on YARN-914:
---------------------------------

Break down this feature into sub-JIRAs.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324404#comment-14324404 ]

Jason Lowe commented on YARN-914:
---------------------------------

bq. what's the benefit of step 1 over decommission nodes directly after timeout?

We don't have to notify AMs if we want to keep things simpler. However, we already support preempting (i.e. killing) specific containers via StrictPreemptionContract, so it seems straightforward to let the AMs be a bit more proactive. Note that we'd still need a timeout to give them time to respond, so the decommission would have two phases: in the first we simply wait for containers to complete on their own; in the second we notify AMs about imminent preemption and give them a little time to react before forcibly killing any remaining containers.

The advantage of adding the preemption-with-explicit-grace-period feature is that we don't need two separate timeout phases. Without it, telling AMs too early that their containers are going away might make them do something expensive or drastic when the container would have completed on its own in a few more minutes. Letting them know the deadline explicitly lets them make the call of whether to do anything or let it ride.

bq. If there is benefit, why we don't do this today when decommission nodes?

Because today's decommission is instantaneous and not graceful, and fixing that is the point of this JIRA. ;-)
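The deadline-annotated preemption Jason describes could be sketched as follows. This is a hypothetical illustration only: `PreemptionNotice` and `shouldMigrate` are invented names for the example and are not the actual StrictPreemptionContract API.

```java
public class GracePeriodPreemption {
    /** Message the RM would send as soon as the node is marked for decommission. */
    static final class PreemptionNotice {
        final String nodeId;
        final long deadlineMillis; // absolute time at which remaining containers are killed
        PreemptionNotice(String nodeId, long deadlineMillis) {
            this.nodeId = nodeId;
            this.deadlineMillis = deadlineMillis;
        }
    }

    /**
     * The AM-side decision: act (checkpoint, migrate) only if the container is
     * not expected to finish before the explicit deadline; otherwise let it ride.
     */
    static boolean shouldMigrate(PreemptionNotice notice, long estimatedFinishMillis) {
        return estimatedFinishMillis > notice.deadlineMillis;
    }
}
```

For example, with a 10-minute deadline, a container expected to finish in 3 minutes is left alone, while one expected to run for another 30 minutes would be migrated proactively.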
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324390#comment-14324390 ]

Junping Du commented on YARN-914:
---------------------------------

bq. The main point I'm trying to make here is that we shouldn't be worrying too much about long-running services right now.

Agree, especially since the discussion above pushed tracking of the timeout out of the YARN core. The new CLI will track time (configurable per operation) and send a forceful decommission after the timeout. We could also add notification to the AM on the NM's decommissioning (and timeout), though that would be more complicated.

bq. In the short-term I think we just go with a configurable decomm timeout and AM notification via strict preemption as the timeout expires. If we want to get a bit fancier, we can annotate the strict preemption with a timeout so the AM knows approximately when the preemption will occur.

OK. My understanding is that we have two steps here: 1. notify the AM via strict preemption after the timeout; 2. notify the AM via flexible preemption, with a tolerated timeout, when decommissioning starts. Quick question: what's the benefit of step 1 over decommissioning the node directly after the timeout? And if there is a benefit, why don't we do this today when decommissioning nodes?

bq. With that feature we would notify AMs as soon as the node is marked for decomm that their containers will be forcibly preempted (i.e.: killed) in X minutes, and it's up to each AM to decide whether to do anything about it or if their containers on that node will complete within that time naturally. With that setup we don't have to special-case LRS apps or anything like that, as we're telling the apps ASAP the decomm is happening and giving them time to deal with it, LRS or not.

Makes sense. Sounds like a sub-JIRA has already been created for this, and we can extend it to carry a timeout.
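The CLI-side flow Junping sketches (the client, not the RM, tracks the per-operation timeout and falls back to a forceful decommission) might look like this. The `AdminClient` interface and all names here are hypothetical stand-ins for whatever rmadmin API this would eventually use.

```java
public class GracefulDecommissionCli {
    interface AdminClient {
        void startGracefulDecommission(String node);
        boolean isDecommissioned(String node);
        void forceDecommission(String node);
    }

    /**
     * Returns true if the node drained gracefully, false if it had to be forced.
     * The poll budget stands in for the configurable per-operation timeout;
     * a real CLI would Thread.sleep(pollIntervalMillis) between polls.
     */
    static boolean decommission(AdminClient rm, String node, int maxPolls) {
        rm.startGracefulDecommission(node);
        for (int i = 0; i < maxPolls; i++) {
            if (rm.isDecommissioned(node)) {
                return true; // all containers finished on their own
            }
        }
        rm.forceDecommission(node); // grace period expired: kill what remains
        return false;
    }
}
```

This keeps the timeout logic entirely outside the RM, matching the "no built-in timeout support in YARN for the short term" direction of the thread.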
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324277#comment-14324277 ]

Jason Lowe commented on YARN-914:
---------------------------------

bq. I think prediction of expected runtime of containers could be hard in YARN case. However, can we typically say long running service containers are expected to run very long or infinite? If so, notifying AM to preempt containers of LRS make more sense here than waiting here for timeout. Isn't it?

The main point I'm trying to make is that we shouldn't worry too much about long-running services right now. YARN doesn't even know which apps are which yet, and without some kind of container lifespan prediction there's no way to know whether a container will finish within the decommission timeout window. YARN knowing which apps are LRS is a primitive form of container lifespan prediction (i.e. LRS = containers run forever). We will have the same problems with apps that aren't LRS but have containers that can run for a "long" time, where "long" is larger than the decommission timeout. That's why I'm not convinced it makes sense to do anything special for LRS apps versus other apps.

In the short term I think we just go with a configurable decommission timeout and AM notification via strict preemption as the timeout expires. If we want to get a bit fancier, we can annotate the strict preemption with a timeout so the AM knows approximately _when_ the preemption will occur. With that feature we would notify AMs, as soon as the node is marked for decommission, that their containers will be forcibly preempted (i.e. killed) in X minutes, and it's up to each AM to decide whether to do anything about it or whether its containers on that node will complete naturally within that time. With that setup we don't have to special-case LRS apps or anything like that, since we're telling the apps ASAP that the decommission is happening and giving them time to deal with it, LRS or not.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323124#comment-14323124 ]

Junping Du commented on YARN-914:
---------------------------------

Thanks [~jlowe] for the review and comments!

bq. Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS?

Sounds good. Will update it later.

bq. We should remove its available (not total) resources from the cluster then continue to remove available resources as containers complete on that node.

That's a very good point. Yes, we should update the resources that way.

bq. As for the UI changes, initial thought is that decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab).

Makes sense. Will add it to the proposal; we can discuss the UI details on the UI JIRA later.

bq. For the NM restart open question, this should no longer be an issue now that the NM is unaware of graceful decommission.

Right.

bq. For the AM dealing with being notified of decommissioning, again I think this should just be treated like a strict preemption for the short term. IMHO all the AM needs to know is that the RM is planning on taking away those containers, and what the AM should do about it is similar whether the reason for removal is preemption or decommissioning.

bq. Back to the long running services delaying decommissioning concern, does YARN even know the difference between a long-running container and a "normal" container?

I'm afraid not, for now. YARN-1039 should be a start on making that differentiation.

bq. If it doesn't, how is it supposed to know a container is not going to complete anytime soon? Even a "normal" container could run for many hours. It seems to me the first thing we would need before worrying about this scenario is the ability for YARN to know/predict the expected runtime of containers.

I think predicting the expected runtime of containers could be hard in the YARN case. However, can we typically say that long-running service containers are expected to run very long, or indefinitely? If so, notifying the AM to preempt LRS containers makes more sense here than waiting for the timeout, doesn't it?

bq. There's still an open question about tracking the timeout RM side instead of NM side. Sounds like the NM side is not going to be pursued at this point, and we're going with no built-in timeout support in YARN for the short-term.

That was unclear at the beginning of the discussion but is much clearer now; I will remove this part.
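Jason's point about removing only the node's available (not total) resources can be illustrated with a small accounting sketch. This is simplified to a single memory dimension, and none of these names are actual YARN classes; it just shows the invariant that cluster capacity never drops below what is still running.

```java
public class DecommissioningNodeMetrics {
    private long clusterCapacityMb;
    private long nodeUsedMb; // capacity still backing containers on the draining node

    DecommissioningNodeMetrics(long clusterCapacityMb) {
        this.clusterCapacityMb = clusterCapacityMb;
    }

    /** Node enters DECOMMISSIONING: drop only its unused (available) capacity. */
    void startDecommission(long nodeCapacityMb, long usedMb) {
        this.nodeUsedMb = usedMb;
        clusterCapacityMb -= (nodeCapacityMb - usedMb);
    }

    /** Each completing container then releases its share from the cluster total. */
    void containerFinished(long containerMb) {
        nodeUsedMb -= containerMb;
        clusterCapacityMb -= containerMb;
    }

    long clusterCapacityMb() { return clusterCapacityMb; }
    long nodeUsedMb() { return nodeUsedMb; }
}
```

Removing the node's total capacity up front would instead produce the "weird metrics" Jason warns about, with more resources in use than the cluster claims to have.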
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316980#comment-14316980 ]

Jason Lowe commented on YARN-914:
---------------------------------

Thanks for updating the doc, Junping. Additional comments:

Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS?

The design says that when a node starts decommissioning we will remove its resources from the cluster, but that's not really the case, correct? We should remove its available (not total) resources from the cluster, then continue to remove available resources as containers complete on that node. Failing to do so will result in weird metrics, like more resources running on the cluster than the cluster says it has.

Are we only going to support graceful decommission via updates to the include/exclude files and refresh? Not needed for the initial cut, but I'm thinking of a couple of use cases and am curious what others think:
* It would be convenient to have an rmadmin command that does this in one step, especially for a single node. Arguably, if we are persisting cluster nodes in the state store we can migrate the list there, and the include/exclude lists simply become convenient ways to batch-update the cluster state.
* Will NMs be able to request a graceful decommission via their health check script? There have been cases in the past where it would have been nice for the NM to request a ramp-down on containers rather than instantly killing all of them with an UNHEALTHY report.

As for the UI changes, my initial thought is that decommissioning nodes should still show up in the active nodes list, since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list, if it's too crowded to add yet another node-state tab (or maybe get rid of some effectively dead tabs like the reboot-state tab).

For the NM restart open question, this should no longer be an issue now that the NM is unaware of graceful decommission. All the RM needs to do is ensure that a node rejoining the cluster, when the RM thought it was already part of it, retains its previous running/decommissioning state. That way, if an NM was decommissioning before the restart, it will continue to decommission after it restarts.

For the AM dealing with being notified of decommissioning, again I think this should just be treated like a strict preemption for the short term. IMHO all the AM needs to know is that the RM is planning on taking away those containers, and what the AM should do about it is similar whether the reason for removal is preemption or decommissioning.

Back to the concern about long-running services delaying decommissioning: does YARN even know the difference between a long-running container and a "normal" container? If it doesn't, how is it supposed to know a container is not going to complete anytime soon? Even a "normal" container could run for many hours. It seems to me the first thing we would need before worrying about this scenario is the ability for YARN to know/predict the expected runtime of containers.

There's still an open question about tracking the timeout RM side instead of NM side. Sounds like the NM side is not going to be pursued at this point, and we're going with no built-in timeout support in YARN for the short term.
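The rejoin rule Jason describes (a restarting NM that re-registers while the RM still considers it part of the cluster keeps its prior state) can be sketched with a tiny piece of RM-side bookkeeping. The enum, map, and method names are invented for illustration; this is not actual RMNodeImpl code.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeRejoinTracker {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    private final Map<String, NodeState> knownNodes = new HashMap<>();

    void markDecommissioning(String nodeId) {
        knownNodes.put(nodeId, NodeState.DECOMMISSIONING);
    }

    /** Called when an NM registers: a known draining node keeps draining. */
    NodeState onRegister(String nodeId) {
        NodeState previous = knownNodes.get(nodeId);
        if (previous == NodeState.DECOMMISSIONING) {
            return previous; // continue the decommission across the NM restart
        }
        knownNodes.put(nodeId, NodeState.RUNNING);
        return NodeState.RUNNING;
    }
}
```

Because only the RM holds this state, the NM itself never needs to know a graceful decommission is in progress, which is the simplification the thread converges on.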
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316606#comment-14316606 ]

Junping Du commented on YARN-914:
---------------------------------

bq. I do agree with Vinod that there should minimally be an easy way, CLI or otherwise, for outside scripts driving the decommission to either force it or wait for it to complete. If waiting, there also needs to be a way to either have the wait have a timeout which will force after that point or another method with which to easily kill the containers still on that node.

Makes sense. Sounds like most of us agree to go with the 2nd approach, proposed by Ming and refined by Vinod.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314677#comment-14314677 ]

Jason Lowe commented on YARN-914:
---------------------------------

bq. However, YARN-2567 is about threshold thing, may be a wrong JIRA number?

That's the right JIRA. It's about waiting for a threshold number of nodes to report back in after the RM recovers; the RM would need to persist state about the nodes in the cluster to know what percentage of the old nodes have reported back in.

As for whether we should just provide hooks versus making it much more of a turnkey solution, I'd advocate initially seeing what we can do with hooks. Based on what we learn from doing decommission that way, we can feed back into the process of making it a built-in, turnkey solution later.

I do agree with Vinod that there should minimally be an easy way, CLI or otherwise, for outside scripts driving the decommission to either force it or wait for it to complete. If waiting, there also needs to be a timeout on the wait that forces the decommission after that point, or another method with which to easily kill the containers still on the node.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314653#comment-14314653 ]

Junping Du commented on YARN-914:
---------------------------------

Thanks [~vinodkv] for the comments!

bq. IAC, I think we should also have a CLI command to decommission the node which optionally waits till the decommission succeeds.

That sounds pretty good. This new CLI can simply "gracefully" decommission the related nodes, wait until the timeout, and then forcefully decommission any nodes that haven't finished. Compared with the external-script approach Ming proposed above, this has fewer dependencies on effort outside of Hadoop.

bq. Regarding storage of the decommission state, YARN-2567 also plans to make sure that the state of all nodes is maintained up to date on the state-store. That helps with many other cases too. We should combine these efforts.

That makes sense. However, YARN-2567 is about a threshold thing; maybe that's the wrong JIRA number?

bq. Regarding long running services, I think it makes sense to let the admin initiating the decommission know - not in terms of policy but as a diagnostic. Other than waiting for a timeout, the admin may not have noticed that a service is running on this node before the decommission is triggered.

bq. This is the umbrella concern I have. There are two ways to do this: Let YARN manage the decommission process or manage it on top of YARN. If the later is the approach, I don't see a lot to be done here besides YARN-291. No?

Agreed that the 2nd approach is less effort. Even so, we still need the RM to notice when containers/apps finish and then trigger the shutdown of the NM, so that the decommission can happen earlier (node by node), which I guess is important for upgrading a large cluster. Isn't it? For YARN-291, my understanding is that we don't rely on any open issues left there, because we only need to set the NM's resource to 0 at runtime, which is already provided there.

BTW, I think the approach you proposed above is "the 2nd approach + a new CLI", isn't it? I'd prefer to go this way, but would like to hear other people's ideas here too.
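The "set the NM's resource to 0 at runtime" idea from YARN-291 can be sketched as follows: shrinking a node's schedulable capacity to zero means no new containers fit, while running containers drain naturally. The `Node` class and method names here are illustrative only, not the YARN-291 ResourceOption API.

```java
public class ScriptDrivenDrain {
    static final class Node {
        int capacityMb;
        int usedMb;
        Node(int capacityMb, int usedMb) {
            this.capacityMb = capacityMb;
            this.usedMb = usedMb;
        }
    }

    /** YARN-291-style runtime update: shrink schedulable capacity to zero. */
    static void drain(Node node) {
        node.capacityMb = 0; // no new containers will fit from now on
    }

    /** Scheduler-side check: would a new request fit on this node? */
    static boolean canSchedule(Node node, int requestMb) {
        return node.capacityMb - node.usedMb >= requestMb;
    }
}
```

An external script could combine this with the CLI timeout logic discussed earlier: drain the node, poll until its used resources reach zero, and fall back to an immediate decommission if the grace period runs out.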
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312677#comment-14312677 ]

Vinod Kumar Vavilapalli commented on YARN-914:
----------------------------------------------

Is the decommission_timeout a server-side config, or specifiable per decommission request? The current refreshNodes approach will not enable a per-request config. IAC, I think we should also have a CLI command to decommission a node which optionally waits till the decommission succeeds.

Regarding storage of the decommission state, YARN-2567 also plans to make sure that the state of all nodes is maintained up to date in the state-store. That helps with many other cases too; we should combine these efforts. /cc [~jianhe]

Regarding long-running services, I think it makes sense to let the admin initiating the decommission know - not in terms of policy, but as a diagnostic. Other than by waiting for a timeout, the admin may not notice that a service is running on the node before the decommission is triggered.

bq. Alternatively we can remove graceful decommission timeout for YARN layer and let external decommission script handle that. If the script considers the graceful decommission takes too long, it can ask YARN to do the immediate decommission.

This is the umbrella concern I have. There are two ways to do this: let YARN manage the decommission process, or manage it on top of YARN. If the latter is the approach, I don't see a lot to be done here besides YARN-291. No?
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312525#comment-14312525 ]

Junping Du commented on YARN-914:
---------------------------------

Thanks for the review and comments, [~xgong], [~jlowe] and [~mingma]!

bq. I believe this is about the configuration synchronization between multiple RM nodes. Please take a look at https://issues.apache.org/jira/browse/YARN-1666, and https://issues.apache.org/jira/browse/YARN-1611

Thanks for pointing this out. Sounds like we have already resolved most of the problem; good to know. :)

bq. Do we really need to handle the "LRS containers" and "short-term containers" differently? There are lots of different cases we need to take care. I think that we can just use the same way to handle both.

I haven't thought this through yet. IMO, the benefit of this feature is to provide a reasonable time window in which running applications get a chance to finish before nodes get decommissioned. Given the endless life cycle of LRS containers, I don't see the benefit of keeping LRS containers running until the timeout; it only delays the decommission process. Or do we assume the AM can react for its LRS containers when notified? Maybe for the first step we can treat LRS and non-LRS containers the same way to keep it simple, but I think we should keep an open mind on this.

bq. Maybe we need to track the timeout at RM side and NM side. RM can stop NM if the timeout is reached but it does not receive the "decommission complete" from NM.

Sounds reasonable, given possibly broken communication between the NM and RM. However, as Jason Lowe proposed below, we could track it only on the RM side. Thoughts?

bq. For transferring knowledge to the standby RM, we could persist the graceful decomm node list to the state store.

Yes. Sounds like most of this work is already done in YARN-1666 (decommission node list) and YARN-1611 (timeout value), as Xuan mentioned above. The only work left here is to keep track of the start time of each decommissioning node, isn't it?

bq. I agree with Xuan that so far I don't see a need to treat LRS and normal containers separately. Either a container exits before the decommission timeout or it doesn't.

Just as we want the decommission to happen before the timeout when all containers and apps have finished, we don't want to delay the decommission process unnecessarily, do we? However, it could be the other way around if we think the delay helps LRS applications. Anyway, as mentioned above, it should be fine to keep the same behavior for now, but I think we should keep an eye on it.

bq. Just to be clear, the NM is already tracking which applications are active on a node and is reporting these to the RM on heartbeats (see NM context and NodeStatusUpdaterImpl appTokenKeepAliveMap). The DecommissionService doesn't need to explicitly track the apps itself as this is already being done.

Yes. The diagram includes not only the new components but also existing ones. Thanks for the reminder, though.

bq. As for doing this RM side or NM side, I think it can simplify things if we do this on the RM side. The RM already needs to know about graceful decommission to avoid scheduling new apps/containers on the node. Also the NM is heartbeating active apps back to the RM, so it's easy for the RM to track which apps are still active on a particular node. If the RMNodeImpl state machine sees that it's in the decommissioning state and all apps/containers have completed then it can transition to the decommissioned state. For timeouts the RM can simply set a timer-delivered event to the RMNode when the graceful decommission starts, and the RMNode can act accordingly when the timer event arrives, killing containers etc. Actually I'm not sure the NM needs to know about graceful decommission at all, which IMHO simplifies the design since only one daemon needs to participate and be knowledgeable of the feature. The NM would simply see the process as a reduction in container assignments until eventually containers are killed and the RM tells it that it's decommissioned.

That makes sense. In addition, I think the RMNodes don't even have to track time themselves (in the worst case, thousands of threads would need to access the time); we could have something like a DecommissionTimeoutMonitor derived from AbstractLivelinessMonitor. When it detects a timeout, it can send a decommission_timeout event to the RMNode to make the node shutdown happen. Also, I agree that the NM may not need to be aware of the decommission-in-progress at all.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307724#comment-14307724 ] Ming Ma commented on YARN-914: -- I agree with Jason. It is easier if the NM doesn't need to know about decommission. There is a scalability issue that Junping might have brought up, but it shouldn't be an issue.

To clarify the decomm node list, it appears there are two things: one is the decomm request list; the other is the run-time state of the decomm nodes. From Xuan's comment it appears we want to put the request in HDFS and leverage FileSystemBasedConfigurationProvider to read it at run time. Given it is considered configuration, that seems a good fit. Jason mentioned the state store, which can be used to track the run-time state of the decomm. This is necessary given we plan to introduce a timeout for graceful decommission. However, if we assume ResourceOption's overcommitTimeout state is stored in the state store for the RM failover case as part of YARN-291, then the new active RM can just replay the state transition. If so, it seems we don't need to persist the decomm run-time state to the state store. Alternatively, we can remove the graceful decommission timeout from the YARN layer and let an external decommission script handle it: if the script considers that the graceful decommission is taking too long, it can ask YARN to do an immediate decommission. BTW, it appears the fair scheduler doesn't support ConfigurationProvider.

Recommission is another scenario. It can happen when a node is in the decommissioned state or the decommission_in_progress state.
> Support graceful decommission of nodemanager > > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > Attachments: Gracefully Decommission of NodeManager (v1).pdf > > > When NMs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running applications. > Currently if a NM is decommissioned, all running containers on the NM need to > be rescheduled on other NMs. Further more, for finished map tasks, if their > map output are not fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307545#comment-14307545 ] Jason Lowe commented on YARN-914: - For transferring knowledge to the standby RM, we could persist the graceful decomm node list to the state store.

I agree with Xuan that so far I don't see a need to treat LRS and normal containers separately. Either a container exits before the decommission timeout or it doesn't.

Just to be clear, the NM is already tracking which applications are active on a node and is reporting these to the RM on heartbeats (see NM context and NodeStatusUpdaterImpl appTokenKeepAliveMap). The DecommissionService doesn't need to explicitly track the apps itself as this is already being done.

As for doing this RM side or NM side, I think it can simplify things if we do this on the RM side. The RM already needs to know about graceful decommission to avoid scheduling new apps/containers on the node. Also the NM is heartbeating active apps back to the RM, so it's easy for the RM to track which apps are still active on a particular node. If the RMNodeImpl state machine sees that it's in the decommissioning state and all apps/containers have completed then it can transition to the decommissioned state. For timeouts the RM can simply set a timer-delivered event to the RMNode when the graceful decommission starts, and the RMNode can act accordingly when the timer event arrives, killing containers etc. Actually I'm not sure the NM needs to know about graceful decommission at all, which IMHO simplifies the design since only one daemon needs to participate and be knowledgeable of the feature. The NM would simply see the process as a reduction in container assignments until eventually containers are killed and the RM tells it that it's decommissioned.
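The RM-side flow described here can be sketched as a toy state machine. The state and event names echo the discussion, but this is an illustrative model, not RMNodeImpl's real transition table:

```python
class RMNodeModel:
    """Toy model of a node in the RM: DECOMMISSIONING drains until the
    heartbeat-reported application set is empty, with a timer-delivered
    timeout event as the backstop."""

    def __init__(self, node_id, running_apps):
        self.node_id = node_id
        self.running_apps = set(running_apps)
        self.state = "RUNNING"

    def start_graceful_decommission(self):
        # The scheduler would also stop placing new containers here.
        self.state = "DECOMMISSIONING"
        self._finish_if_drained()

    def on_heartbeat_app_finished(self, app_id):
        # NM heartbeats already report active apps to the RM, so the RM
        # learns of app completion without any new NM-side logic.
        self.running_apps.discard(app_id)
        self._finish_if_drained()

    def on_decommission_timeout(self):
        # Timer-delivered event: kill whatever is left and finish.
        if self.state == "DECOMMISSIONING":
            self.running_apps.clear()
            self.state = "DECOMMISSIONED"

    def _finish_if_drained(self):
        if self.state == "DECOMMISSIONING" and not self.running_apps:
            self.state = "DECOMMISSIONED"
```

Note the early exit: a node whose applications all complete before the timeout is marked DECOMMISSIONED immediately, without waiting out the full grace period.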
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306602#comment-14306602 ] Xuan Gong commented on YARN-914: Thanks for the proposal [~djp]

bq. RM in failed over (with HA enabled) when gracefully decommission is just triggered. We should make sure the new active RM can carry on the action forward (how to keep sync for decommissioned node list between active and standby RM?)

I believe this is about configuration synchronization between multiple RM nodes. Please take a look at https://issues.apache.org/jira/browse/YARN-1666 and https://issues.apache.org/jira/browse/YARN-1611

bq. With containers of long running services, the timeout may not help but only delay the upgrade/reboot process. Shall we skip it and decommission directly in this case?

Do we really need to handle "LRS containers" and "short-term containers" differently? There are lots of different cases we would need to take care of. I think we can just handle both the same way.

bq. Another possibility is to track decommission timeout in RM side, instead of NM side a new decommission services proposed above. Which way is better?

Maybe we need to track the timeout on both the RM side and the NM side: the RM can stop the NM if the timeout is reached but it has not received the "decommission complete" from the NM.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289286#comment-14289286 ] Jason Lowe commented on YARN-914: -

bq. The first step I was thinking to keep NM running in a low resource mode after graceful decommissioned

I think it could be useful to leave the NM process up after the graceful decommission completes. That allows automated decommissioning tools to know the process completed by querying the NM directly. If the NM exits then the tool may have difficulty distinguishing between the NM crashing just before decommissioning completed vs. successful completion. The RM will be tracking this state as well, so it may not be critical to do it one way or the other if the tool is querying the RM rather than the NM directly.

bq. However, I am not sure if they can handle state migration to new node ahead of predictable node lost here, or be stateless more or less make more sense here?

I agree with Ming that it would be nice if the graceful decommission process could give the AMs a "heads up" about what's going on. The simplest way to accomplish that is to leverage the already existing preemption framework to tell the AM that YARN is about to take the resources away. The StrictPreemptionContract portion of the PreemptionMessage can be used to list the exact resources that YARN will be reclaiming and give the AM a chance to react before the containers are reclaimed. It's then up to the AM if it wants to do anything special or just let the containers get killed after a timeout.

bq. These notification may still be necessary, so AM won't add these nodes into blacklist if container get killed afterwards. Thoughts?

I thought we could leverage the updated nodes list of the AllocateResponse to let AMs know when nodes are entering the decommissioning state, or at least when the decommission completes (and containers are killed).
Although if the AM adds the node to the blacklist, that's not such a bad thing either since the RM should never allocate new containers on a decommissioning node anyway.
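The "heads up" via the existing preemption framework could look roughly like this. The dict fields stand in for StrictPreemptionContract/PreemptionMessage, whose real protobuf-backed API differs; the AM policy is a toy:

```python
def build_preemption_message(node_id, live_containers, deadline_secs):
    """List every live container on the decommissioning node in a strict
    contract, with the grace period before the forcible kill."""
    return {
        "strict_contract": {
            "containers": [c for c in live_containers if c["node"] == node_id],
            "deadline_secs": deadline_secs,
        }
    }

def am_reaction(message, checkpointable_ids):
    """Toy AM policy: checkpoint the work it can save before the deadline
    and simply let the rest be killed."""
    doomed = message["strict_contract"]["containers"]
    return sorted(c["id"] for c in doomed if c["id"] in checkpointable_ids)
```

Carrying the explicit deadline is what lets the AM decide whether reacting is worth it, rather than doing something drastic for a container that would finish on its own anyway.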
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288644#comment-14288644 ] Junping Du commented on YARN-914: - Sorry for replying late. These are all good points; a couple of comments:

bq. Sounds like we need a new state for NM, called "decommission_in_progress" when NM is draining the containers.

Agree. We need a dedicated state for the NM in this situation, and both the AM and the RM should be aware of it in order to handle it properly.

bq. To clarify my early comment "all its map output are fetched or until all the applications the node touches have completed", the question is when YARN can declare a node's state has been gracefully drained and thus the node gracefully decommissioned ( admins can shutdown the whole machine without any impact on jobs ). For MR, the state could be running tasks/containers or mapper outputs. Say we have timeout of 30 minutes for decommission, it takes 3 minutes to finish the mappers on the node, another 5 minutes for the job to finish, then YARN can declare the node gracefully decommissioned in 8 minutes, instead of waiting for 30 minutes. RM knows all applications on any given NM. So if all applications on any given node have completed, RM can mark the node "decommissioned".

The first step I was thinking of is to keep the NM running in a low-resource mode after it is gracefully decommissioned - no running containers, no new containers getting spawned, no obvious resource consumption, etc. - just like putting these nodes into maintenance mode. The timeout value there is used to kill unfinished containers to release resources. I'm not quite sure we have to terminate the NM after the timeout, but I would like to understand your use case here.

bq. Yes, I meant long running services. If YARN just kills the containers upon decommission request, the impact could vary. Some services might not have states to drain. Or maybe the services can handle the state migration on their own without YARN's help. For such services, maybe we can just use ResourceOption's timeout for that; set timeout to 0 and NM will just kill the containers.

I believe most of these services already take care of losing nodes, as no node in a YARN cluster can be relied on to stay up forever. However, I am not sure whether they can handle state migration to a new node ahead of a predictable node loss, or whether being more or less stateless makes more sense here. If we have an example application that could easily migrate a node's state to another node, then we can discuss how to provide some rudimentary support here.

bq. Given we don't plan to have applications checkpoint and migrate states, it doesn't seem to be necessary to have YARN notify applications upon decommission requests. Just to call it out.

These notifications may still be necessary, so the AM won't add these nodes to its blacklist if containers get killed afterwards. Thoughts?

bq. It might be useful to have a new state called "decommissioned_timeout", so that admins know the node has been gracefully decommissioned or not.

As in my comments above, we can see whether we have to terminate the NM. If not, I prefer to use a "maintenance" state, and the admin can decide whether to fully decommission it later. Again, we should talk through your scenarios here.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266692#comment-14266692 ] Ming Ma commented on YARN-914: -- Thanks, Junping. The timeout is definitely necessary.

* Sounds like we need a new state for the NM, called "decommission_in_progress", for when the NM is draining the containers. When the RM considers the decommission complete, the node will be marked "decommissioned".

* To clarify my earlier comment "all its map output are fetched or until all the applications the node touches have completed": the question is when YARN can declare a node's state to have been gracefully drained and thus the node gracefully decommissioned (admins can shut down the whole machine without any impact on jobs). For MR, the state could be running tasks/containers or mapper outputs. Say we have a timeout of 30 minutes for decommission: if it takes 3 minutes to finish the mappers on the node and another 5 minutes for the job to finish, then YARN can declare the node gracefully decommissioned in 8 minutes, instead of waiting for 30 minutes. The RM knows all applications on any given NM, so if all applications on a given node have completed, the RM can mark the node "decommissioned".

* Yes, I meant long-running services. If YARN just kills the containers upon a decommission request, the impact could vary. Some services might not have state to drain, or maybe the services can handle the state migration on their own without YARN's help. For such services, maybe we can just use ResourceOption's timeout for that: set the timeout to 0 and the NM will just kill the containers.

* Given we don't plan to have applications checkpoint and migrate state, it doesn't seem necessary to have YARN notify applications upon decommission requests. Just to call it out.

* It might be useful to have a new state called "decommissioned_timeout", so that admins know whether the node has been gracefully decommissioned or not. Thoughts?
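The timing in the second bullet reduces to a min over completion times. A toy version of the arithmetic (minutes; illustrative only, not a YARN API):

```python
def graceful_decommission_minute(app_finish_minutes, timeout_minutes):
    """The node is declared decommissioned as soon as every application
    that touched it has completed, capped by the decommission timeout."""
    drained = max(app_finish_minutes, default=0)
    return min(drained, timeout_minutes)
```

So in the example above, with mappers done at minute 3 and the job done at minute 8 against a 30-minute timeout, the node is released at minute 8 rather than minute 30.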
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256547#comment-14256547 ] Junping Du commented on YARN-914: - Hi [~mingma], thanks for the comments here.

bq. So YARN will reduce the capacity of the nodes as part of the decommission process until all its map output are fetched or until all the applications the node touches have completed?

Yes. I am not sure it is necessary for YARN to additionally mark the node decommissioned, as the node's resource is already updated to 0 and no container will get a chance to be allocated on the node. Auxiliary services would still be running, which shouldn't consume many resources if there are no service requests.

bq. In addition, it will be interesting to understand how you handle long running jobs.

Do you mean long-running services? First, I think we should support a timeout for draining the node's resources (ResourceOption already has a timeout in the design), so running containers get preempted if they run out of time. Second, we should support a special container tag for long-running services (some discussions in YARN-1039) so we don't have to waste time waiting for such containers to finish until the timeout. Third, from an operations perspective, we could add a long-running label to specific nodes and try not to decommission nodes carrying that label. Let me know if this makes sense to you.
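Setting the draining node's resource to 0 falls out naturally in any capacity-based allocator: a zero-capacity node can never satisfy a positive request. A toy model of that effect (not YARN's scheduler; the function names are illustrative):

```python
def start_draining(free_mb_by_node, node_id):
    # ResourceOption-style update: advertise zero capacity so the
    # scheduler never places another container on this node.
    free_mb_by_node[node_id] = 0

def allocate(free_mb_by_node, demand_mb):
    # First-fit over nodes; draining nodes are skipped automatically
    # because their free capacity can never cover the demand.
    for node_id, free_mb in free_mb_by_node.items():
        if free_mb >= demand_mb:
            free_mb_by_node[node_id] -= demand_mb
            return node_id
    return None
```

This is why no extra "decommissioned" marking is strictly required for scheduling purposes, though an explicit state still helps admins and the timeout machinery.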
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254382#comment-14254382 ] Ming Ma commented on YARN-914: -- [~djp], thanks for working on this. It looks like we are going to use YARN-291 and thus the "drain the state" approach, instead of the more complicated "migrate the state" approach. So YARN will reduce the capacity of the nodes as part of the decommission process until all their map output is fetched or until all the applications the nodes touch have completed? In addition, it will be interesting to understand how you handle long-running jobs. FYI, https://issues.apache.org/jira/browse/YARN-1996 will drain containers of unhealthy nodes.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870080#comment-13870080 ] Ming Ma commented on YARN-914: -- Junping/Luke, have you looked into the checkpointing framework being done to support preemption? One possible design to support this scenario could be something like:

1. Drain the NM with a timeout. While the NM is being drained, no more tasks will be assigned to this node.

2. After the timeout, RM -> AM -> task checkpointing will kick in. Task state and application-level state such as map outputs will be preserved; tasks will be rescheduled to other nodes.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817145#comment-13817145 ] Steve Loughran commented on YARN-914: - YARN-1394 adds the need for AMs to be told of NM failure/decommission as causes for container completion.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709255#comment-13709255 ] Aaron T. Myers commented on YARN-914: - Thanks, Luke.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709144#comment-13709144 ] Luke Lu commented on YARN-914: -- [~atm]: Nice catch! Of course :)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709067#comment-13709067 ] Aaron T. Myers commented on YARN-914: - Should we perhaps do an s/NN/NM/g in the description of this JIRA? Or does this have something to do with the NameNode and I'm completely missing it? > Support graceful decommission of nodemanager > > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > > When NNs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running applications. > Currently if a NN is decommissioned, all running containers on the NN need to > be rescheduled on other NNs. Further more, for finished map tasks, if their > map output are not fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager.