[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2017-11-15 Thread Karthik Palaniappan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253971#comment-16253971
 ] 

Karthik Palaniappan commented on YARN-2331:
---

Ah, thanks for the quick response.

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>  Labels: BB2015-05-RFC
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2331.patch, YARN-2331v2.patch, YARN-2331v3.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2017-11-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253956#comment-16253956
 ] 

Jason Lowe commented on YARN-2331:
--

There is documentation of the property at 
https://hadoop.apache.org/docs/r2.8.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
 but I agree it could be better.

bq. Is there a reason that you need to distinguish between a supervised NM 
shutdown and a rolling upgrade related shutdown?

Yes, in the sense that the two shutdowns may be different depending upon how 
the rolling upgrade shutdown was performed.  For example, in our clusters we do 
not have direct supervision on the nodemanagers and instead have another tool 
that periodically comes along and services nodes that have fallen out of the 
cluster.  That means the nodemanager will not necessarily be restarted in a 
timely manner if it crashes.  In that case we want the nodemanager to shut down 
cleanly during the crash, killing all running containers, since otherwise they 
will be unsupervised and the RM will believe the containers are dead due to the 
lack of NM heartbeats from this node.  If the NM were under direct supervision, 
it would be restarted quickly after it crashes.  In that scenario we would 
_not_ want it to kill the containers and would instead let the NM recover the 
containers upon restart.

For rolling upgrades we kill the nodemanager with SIGKILL, preventing it from 
doing any cleanup processing.  Then we restart the nodemanagers on the new 
software, and the nodemanager recovers the containers on startup.  In our 
clusters the work-preserving and supervised properties are set differently, so 
the NM knows to support recovery yet still kill containers on shutdown.  Before 
this change the NM would always kill containers on shutdown, so it was 
impossible to preserve work when the NM threw an exception and performed an 
orderly shutdown even though it was under supervision.
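A minimal sketch of the decision described above, under a simplified model: an orderly shutdown preserves containers only when both yarn.nodemanager.recovery.enabled and yarn.nodemanager.recovery.supervised are true, while a SIGKILL gives the NM no chance to run any shutdown logic, so containers always outlive it. The two property names come from this thread; the class, enum, and method names are hypothetical, and treating an absent property as false is an assumption. This is an illustration, not the actual NodeManager code.

```java
import java.util.Map;

/**
 * Hypothetical model of the NM shutdown decision discussed in YARN-2331.
 * This is an illustration, not the real NodeManager implementation.
 */
class NmShutdownPolicy {

    // Property names as discussed in this thread.
    static final String RECOVERY_ENABLED = "yarn.nodemanager.recovery.enabled";
    static final String RECOVERY_SUPERVISED = "yarn.nodemanager.recovery.supervised";

    /** How the NM process is being stopped. */
    enum StopKind {
        /** The NM's own shutdown logic runs (e.g. after an internal error or a SIGTERM). */
        ORDERLY,
        /** kill -9: the process dies immediately and no shutdown logic runs. */
        SIGKILL
    }

    private final boolean recoveryEnabled;
    private final boolean supervised;

    NmShutdownPolicy(Map<String, String> conf) {
        // Assumption: an absent property is treated as false.
        this.recoveryEnabled = Boolean.parseBoolean(conf.getOrDefault(RECOVERY_ENABLED, "false"));
        this.supervised = Boolean.parseBoolean(conf.getOrDefault(RECOVERY_SUPERVISED, "false"));
    }

    /** True if this stop ends with the NM's containers killed. */
    boolean containersKilled(StopKind kind) {
        if (kind == StopKind.SIGKILL) {
            // No cleanup code can run, so the containers outlive the NM process.
            return false;
        }
        // Orderly stop: keep containers only if recovery is on and a prompt restart is expected.
        return !(recoveryEnabled && supervised);
    }

    /** True if a restarted NM could reacquire the containers left behind by this stop. */
    boolean recoverableAfterRestart(StopKind kind) {
        return recoveryEnabled && !containersKilled(kind);
    }
}
```

With recovery enabled and supervision disabled, this model kills containers on an orderly stop but leaves them for the restarted NM to reacquire after kill -9, which is the combination used for rolling upgrades in the clusters described above.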

In 2.8 and later the nodemanager restart documentation moved to a unified 
nodemanager page, e.g.: 
https://hadoop.apache.org/docs/r2.8.2/hadoop-yarn/hadoop-yarn-site/NodeManager.html,
 but it still doesn't describe this property.  I filed YARN-7502 to update the 
nodemanager restart docs to cover this property and when it would be useful.





[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2017-11-14 Thread Karthik Palaniappan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252784#comment-16252784
 ] 

Karthik Palaniappan commented on YARN-2331:
---

Toggling this new configuration property (yarn.nodemanager.recovery.supervised) 
isn't very different from just toggling the property that enables recovery 
(yarn.nodemanager.recovery.enabled). It's surprising that you now need to flip 
two properties to get NM work preservation to work.

Is there a reason that you need to distinguish between a supervised NM shutdown 
and a rolling upgrade related shutdown?

I'm complaining because the instructions in the 2.7 line are incorrect in 2.8: 
https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html.
 Equivalent docs don't exist in the 2.8 line (i.e. if you change the url to be 
r2.8.2), so I couldn't find any documentation of this new property.




[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2015-05-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535257#comment-14535257
 ] 

Xuan Gong commented on YARN-2331:
-

Thanks for the explanation, [~jlowe]. That makes sense to me.
The patch LGTM. Kicking Jenkins.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2015-05-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533345#comment-14533345
 ] 

Xuan Gong commented on YARN-2331:
-

Cancelling the patch since it no longer applies.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2015-05-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533359#comment-14533359
 ] 

Xuan Gong commented on YARN-2331:
-

[~jlowe] Could you rebase the patch, please?

Perhaps we could set the default value of 
yarn.nodemanager.recovery.supervised to true. Normally, when people add a node 
as an NM, they expect to use that node for a long time, so isn't a restart expected?



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2015-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525627#comment-14525627
 ] 

Hadoop QA commented on YARN-2331:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   2m 58s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e8d0ee5 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7666/console |


This message was automatically generated.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163633#comment-14163633
 ] 

Jason Lowe commented on YARN-2331:
--

Yes, the patch implements the [Another possible approach 
comment|https://issues.apache.org/jira/browse/YARN-2331?focusedCommentId=14070925page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070925].
  This is how we're currently managing NM restarts, as we don't run our NMs 
under supervision.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162117#comment-14162117
 ] 

Hadoop QA commented on YARN-2331:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673377/YARN-2331.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery
  
org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5308//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5308//console

This message is automatically generated.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162414#comment-14162414
 ] 

Hadoop QA commented on YARN-2331:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5312//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5312//console

This message is automatically generated.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162917#comment-14162917
 ] 

Junping Du commented on YARN-2331:
--

Thanks [~jlowe] for the patch. One thing I want to confirm: after this 
patch, if we set yarn.nodemanager.recovery.enabled to true but 
yarn.nodemanager.recovery.supervised to false, running containers are still 
preserved if we kill the NM daemon with kill -9, but stopping it via 
yarn-daemon.sh stop nodemanager will kill the running containers. Is that right?



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-08-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096176#comment-14096176
 ] 

Jason Lowe commented on YARN-2331:
--

Ideally the process is self-contained on the NM node, so once the NM has shut 
down without killing containers it can be immediately restarted on the new 
release to minimize the period where the NM is not responding.  I suppose we 
could have the shutdown/upgrade script on the NM issue the rmadmin command and 
then wait for the NM to receive the RM command and exit.

I think it would be cleaner if we didn't have to involve the RM.  However I 
don't feel so strongly that I'd object if we can't find a nice way to do this 
with just the NM node.



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-08-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093655#comment-14093655
 ] 

Junping Du commented on YARN-2331:
--

[~jlowe], for a rolling upgrade when the NM is not supervised, I think another 
way is to add a command to the RM admin CLI that brings down a specific NM 
without killing containers (by notifying the RMNode and passing the request 
back in the heartbeat response), given there is no admin port on the NM so far. 
An unsupervised NM shutdown for any other reason (decommissioning or an 
occasional failure) would not go through this CLI, so it would not preserve 
running containers. Thoughts?



[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-07-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070925#comment-14070925
 ] 

Jason Lowe commented on YARN-2331:
--

Another possible approach is to have the NM always try to clean up containers 
on a shutdown when it is unsupervised.  If a rolling upgrade needs to be 
performed and thus containers need to be preserved, the NM would be killed 
without the chance to clean up (e.g.: kill -9 to deliver a SIGKILL).  Upon 
restart the NM would recover the state from the state store and reacquire the 
containers.
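The approach above can be sketched with a toy model, assuming a hypothetical in-memory StateStore that outlives the NM process (the real NM keeps its recovery state on local disk, which is what lets it survive a SIGKILL). All class and method names here are hypothetical. An orderly shutdown of an unsupervised NM kills its containers and clears their records; simulating kill -9 simply means never calling that shutdown path, so the records are still present when the NM restarts on the new release and reacquires them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical toy model of "always clean up on shutdown unless killed with SIGKILL". */
class UnsupervisedNmModel {

    /** Stand-in for the NM state store; it outlives any single NM process. */
    static final class StateStore {
        final Set<String> containers = new LinkedHashSet<>();
    }

    private final StateStore store;
    private final Set<String> running = new LinkedHashSet<>();

    UnsupervisedNmModel(StateStore store) {
        this.store = store;
    }

    /** Launch a container and record it so a later NM process can recover it. */
    void launch(String containerId) {
        running.add(containerId);
        store.containers.add(containerId);
    }

    /** Orderly shutdown of an unsupervised NM: kill all containers and drop their records. */
    void orderlyShutdown() {
        running.clear();
        store.containers.clear();
    }

    // A SIGKILL cannot be caught, so it is modeled by simply never calling orderlyShutdown().

    /** Start a new NM process (e.g. on the upgraded release) and reacquire recorded containers. */
    static UnsupervisedNmModel restart(StateStore store) {
        UnsupervisedNmModel nm = new UnsupervisedNmModel(store);
        nm.running.addAll(store.containers);
        return nm;
    }

    Set<String> runningContainers() {
        return running;
    }
}
```

For example, launching a container, skipping orderlyShutdown(), and then calling restart(store) yields an NM whose runningContainers() still includes that container.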
