[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070372#comment-14070372
 ] 

Jason Lowe commented on YARN-2331:
----------------------------------

We can distinguish between supervised and unsupervised operation via a config 
setting.  Determining whether an unsupervised shutdown is part of a rolling 
upgrade is a bit trickier.  Some of the options there include:

- Add an admin port to NMs and a corresponding CLI command to send commands to 
the port.  There's a lot of boilerplate that goes along with this, but it is 
the most flexible option if we ever want to add other admin commands to an NM.
- Add a REST API to do this (with appropriate authentication so that not just 
anyone can trigger an NM shutdown).
- Register a handler for another signal, such as SIGINT, to indicate a rolling 
upgrade shutdown, analogous to the SIGTERM handler used today for a normal 
shutdown.  The shell scripts could have a new command that performs the 
rolling upgrade shutdown by sending the new signal rather than SIGTERM.  This 
would be relatively simple to implement on POSIX platforms like Linux but has 
portability ramifications for non-POSIX platforms like Windows.
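
The signal-based option comes down to mapping the signal that triggered the 
shutdown to a shutdown mode.  A minimal sketch of that mapping, assuming 
SIGINT is the signal chosen for rolling upgrades.  The class, enum, and method 
names are hypothetical and not taken from the NM code, and the actual handler 
registration (which is JVM- and platform-specific) is left out:

```java
public class ShutdownSignalSketch {
    /** Hypothetical shutdown modes; the names are illustrative only. */
    public enum ShutdownMode { NORMAL, ROLLING_UPGRADE }

    /**
     * Maps a signal name (without the "SIG" prefix) to a shutdown mode.
     * Assumption: INT marks a rolling upgrade shutdown, while TERM and
     * anything else keep today's normal shutdown behavior.
     */
    public static ShutdownMode modeFor(String signalName) {
        return "INT".equals(signalName)
            ? ShutdownMode.ROLLING_UPGRADE
            : ShutdownMode.NORMAL;
    }

    /** Containers are only preserved when a restart is known to be imminent. */
    public static boolean preserveContainers(ShutdownMode mode) {
        return mode == ShutdownMode.ROLLING_UPGRADE;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"TERM", "INT"}) {
            ShutdownMode mode = modeFor(name);
            System.out.println("SIG" + name + " -> " + mode
                + ", preserve containers: " + preserveContainers(mode));
        }
    }
}
```

Keeping the mapping separate from the handler registration means the same 
decision could be driven by an admin port or REST call instead of a signal.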

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.
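
The three scenarios in the description reduce to a small decision function.  
A minimal sketch, assuming NM restart support is already enabled.  The class 
name, method name, and both boolean inputs are hypothetical; how the NM would 
actually learn either value is exactly what this issue is about:

```java
public class NmShutdownPolicySketch {
    /**
     * Returns true if the shutdown should kill all containers.
     * Assumes NM restart support is enabled; both flags are hypothetical inputs.
     */
    public static boolean shouldKillContainers(boolean supervised,
                                               boolean rollingUpgrade) {
        if (supervised) {
            // Scenario 1: the automatic restart can recover the containers.
            return false;
        }
        // Scenario 2: no timely restart expected, so kill the containers.
        // Scenario 3: rolling upgrade, restart is imminent, so keep them.
        return !rollingUpgrade;
    }

    public static void main(String[] args) {
        System.out.println(shouldKillContainers(true, false));  // false
        System.out.println(shouldKillContainers(false, false)); // true
        System.out.println(shouldKillContainers(false, true));  // false
    }
}
```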



--
This message was sent by Atlassian JIRA
(v6.2#6252)
