[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253971#comment-16253971 ]

Karthik Palaniappan commented on YARN-2331:
-------------------------------------------

Ah, thanks for the quick response.

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>              Labels: BB2015-05-RFC
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: YARN-2331.patch, YARN-2331v2.patch, YARN-2331v3.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision. In that case containers should be
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being
> performed. In that case the shutdown should kill all containers since it is
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being
> performed. In that case the shutdown should not kill all containers since a
> restart is imminent due to the rolling upgrade and the containers will be
> recovered.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253956#comment-16253956 ]

Jason Lowe commented on YARN-2331:
----------------------------------

There is documentation of the property at https://hadoop.apache.org/docs/r2.8.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, but I agree it could be better.

bq. Is there a reason that you need to distinguish between a supervised NM shutdown and a rolling upgrade related shutdown?

Yes, in the sense that the two shutdowns may be different depending upon how the rolling upgrade shutdown was performed. For example, in our clusters we do not have direct supervision on the nodemanagers and instead have another tool that periodically comes along and services nodes that have fallen out of the cluster. That means the nodemanager will not necessarily be restarted in a timely manner if it crashes. In that case we want the nodemanager to shut down cleanly during the crash, killing all running containers, since otherwise they will be unsupervised and the RM will believe the containers are dead due to the lack of NM heartbeats from this node.

If the NM were under direct supervision then it would be restarted quickly after it crashes. In that scenario we would _not_ want it to kill the containers and would instead let the NM recover the containers upon restart.

For rolling upgrades we kill the nodemanager with SIGKILL, preventing it from doing any cleanup processing. Then we restart the nodemanagers on the new software, and the nodemanager recovers the containers on startup. In our clusters the work-preserving and supervised properties are set differently so the NM knows to support recovery yet still kill containers on shutdown.

Before this change the NM would always kill containers on a shutdown, so it would be impossible to preserve work in the case where the NM threw an exception and performed an orderly shutdown yet the NM was under supervision.

In 2.8 and later the nodemanager restart documentation moved to a unified nodemanager page, e.g.: https://hadoop.apache.org/docs/r2.8.2/hadoop-yarn/hadoop-yarn-site/NodeManager.html, but it still doesn't describe this property. I filed YARN-7502 to update the nodemanager restart docs to cover this property and when it would be useful.
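The cases described in the comment above can be summarized as a small decision function. This is only a sketch of the behavior as explained in this thread, not the nodemanager's actual code: the names `Shutdown` and `nm_kills_containers` are made up for illustration, and the branch for recovery being disabled is an assumption based on the remark that the NM previously always killed containers on shutdown.

```python
from enum import Enum


class Shutdown(Enum):
    """How the NM process stops (hypothetical labels for this sketch)."""
    ORDERLY = "orderly"  # stop script, or an exception that triggers a clean shutdown
    SIGKILL = "sigkill"  # kill -9: the process gets no chance to run cleanup


def nm_kills_containers(recovery_enabled: bool, supervised: bool,
                        shutdown: Shutdown) -> bool:
    """Return True if the NM would kill its containers while shutting down.

    Models only the scenarios described in this thread.
    """
    if shutdown is Shutdown.SIGKILL:
        # No cleanup code runs, so the NM cannot kill anything.
        return False
    if recovery_enabled and supervised:
        # A supervisor is expected to restart the NM quickly, so containers
        # are left running for the restarted NM to recover.
        return False
    # Unsupervised (or, by assumption, recovery disabled): clean up so that
    # containers are not left running without an NM.
    return True


if __name__ == "__main__":
    for enabled in (True, False):
        for supervised in (True, False):
            for how in Shutdown:
                killed = nm_kills_containers(enabled, supervised, how)
                print(f"enabled={enabled} supervised={supervised} "
                      f"shutdown={how.value}: kills containers={killed}")
```

With recovery enabled and supervision disabled, an orderly shutdown kills the containers while a SIGKILL leaves them for recovery, which is the unsupervised rolling-upgrade case described above.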
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252784#comment-16252784 ]

Karthik Palaniappan commented on YARN-2331:
-------------------------------------------

Toggling this new configuration property (yarn.nodemanager.recovery.supervised) isn't very different from just toggling the property that enables recovery (yarn.nodemanager.recovery.enabled). It's surprising that you now need to flip two properties to get NM work preservation to work. Is there a reason that you need to distinguish between a supervised NM shutdown and a rolling upgrade related shutdown?

I'm complaining because the instructions in the 2.7 line are incorrect in 2.8: https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html. Equivalent docs don't exist in the 2.8 line (i.e. if you change the URL to be r2.8.2), so I couldn't find any documentation of this new property.
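To make the "two properties" concrete, here is a minimal sketch that renders them as a yarn-site.xml style fragment. The two property names come from this thread; the helper name `yarn_site_fragment` is hypothetical, and the values shown are just one combination (recovery on, supervision on), not a recommendation.

```python
import xml.etree.ElementTree as ET


def yarn_site_fragment(properties):
    """Render a dict of boolean properties as a Hadoop-style <configuration> block."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Hadoop configuration files spell booleans in lowercase.
        ET.SubElement(prop, "value").text = str(value).lower()
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(yarn_site_fragment({
        "yarn.nodemanager.recovery.enabled": True,
        "yarn.nodemanager.recovery.supervised": True,
    }))
```

Setting the second property to false instead would correspond to the unsupervised setup discussed elsewhere in this thread, where an orderly shutdown still kills containers.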
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535257#comment-14535257 ]

Xuan Gong commented on YARN-2331:
---------------------------------

Thanks for the explanation, [~jlowe]. That makes sense to me. The patch LGTM. Kicking Jenkins.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533345#comment-14533345 ]

Xuan Gong commented on YARN-2331:
---------------------------------

Canceling the patch since it no longer applies.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533359#comment-14533359 ]

Xuan Gong commented on YARN-2331:
---------------------------------

[~jlowe] Could you rebase the patch, please?

We could probably set the default value of yarn.nodemanager.recovery.supervised to true. Normally, when people add a node as an NM, they expect to use that node for a long time, so a restart is expected?
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525627#comment-14525627 ]

Hadoop QA commented on YARN-2331:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:red}-1{color} | javac | 2m 58s | The patch appears to cause the build to fail. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e8d0ee5 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7666/console |

This message was automatically generated.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163633#comment-14163633 ]

Jason Lowe commented on YARN-2331:
----------------------------------

Yes, the patch implements the [Another possible approach comment|https://issues.apache.org/jira/browse/YARN-2331?focusedCommentId=14070925&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070925]. This is how we're currently managing NM restarts, as we don't run our NMs under supervision.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162117#comment-14162117 ]

Hadoop QA commented on YARN-2331:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12673377/YARN-2331.patch
  against trunk revision 2e789eb.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

                  org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery
                  org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5308//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5308//console

This message is automatically generated.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162414#comment-14162414 ]

Hadoop QA commented on YARN-2331:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch
  against trunk revision 9196db9.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5312//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5312//console

This message is automatically generated.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162917#comment-14162917 ]

Junping Du commented on YARN-2331:
----------------------------------

Thanks [~jlowe] for the patch. One thing I want to confirm: after this patch, if we set yarn.nodemanager.recovery.enabled to true but yarn.nodemanager.recovery.supervised to false, containers keep running if we kill the NM daemon with kill -9, but stopping it through yarn-daemon.sh stop nodemanager will kill the running containers. Is that right?
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096176#comment-14096176 ]

Jason Lowe commented on YARN-2331:
----------------------------------

Ideally the process is self-contained on the NM node, so that once the NM has shut down without killing containers it can be immediately restarted on the new release to minimize the period where the NM is not responding. I suppose we could have the shutdown/upgrade script on the NM issue the rmadmin command, then wait for the NM to receive the RM command and exit. I think it would be cleaner if we didn't have to involve the RM. However, I don't feel so strongly that I'd object if we can't find a nice way to do this with just the NM node.
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093655#comment-14093655 ]

Junping Du commented on YARN-2331:
----------------------------------

[~jlowe], for a rolling upgrade when the NM is not supervised, I think another way is to add a command to RM Admin that brings down a specific NM without killing its containers (by notifying the RMNode and passing the instruction back on the heartbeat), given that the NM has no admin port so far. An unsupervised NM shutdown for any other reason (whether a decommission or an occasional failure) would not go through this CLI and so would not preserve running containers. Thoughts?
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070925#comment-14070925 ]

Jason Lowe commented on YARN-2331:
----------------------------------

Another possible approach is to have the NM always try to clean up containers on a shutdown when it is unsupervised. If a rolling upgrade needs to be performed and thus containers need to be preserved, the NM would be killed without the chance to clean up (e.g.: kill -9 to deliver a SIGKILL). Upon restart the NM would recover the state from the state store and reacquire the containers.
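The approach above relies on the fact that a process can run cleanup code when it receives a catchable signal such as SIGTERM, but gets no chance to do so on SIGKILL. The following POSIX-only sketch demonstrates that general behavior with a stand-in child process. It is not nodemanager code: the child script, the marker file, and the function name `cleanup_ran_after` are all hypothetical.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Stand-in for a daemon that cleans up on an orderly shutdown. It writes a
# marker file from its SIGTERM handler, playing the role of "kill containers".
CHILD = r"""
import signal, sys, time
marker = sys.argv[1]
def cleanup(signum, frame):
    with open(marker, "w") as f:
        f.write("cleaned up")
    sys.exit(0)
signal.signal(signal.SIGTERM, cleanup)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def cleanup_ran_after(sig):
    """Start the stand-in daemon, stop it with `sig`, report whether cleanup ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "cleanup.marker")
        proc = subprocess.Popen([sys.executable, "-c", CHILD, marker],
                                stdout=subprocess.PIPE, text=True)
        proc.stdout.readline()  # wait until the child has installed its handler
        proc.send_signal(sig)
        proc.wait(timeout=10)
        proc.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("SIGTERM ran cleanup:", cleanup_ran_after(signal.SIGTERM))
    print("SIGKILL ran cleanup:", cleanup_ran_after(signal.SIGKILL))
```

In the scheme described above, the SIGKILL path is what leaves containers running for a restarted NM to recover from its state store, while the orderly path is where an unsupervised NM kills them.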