[jira] [Comment Edited] (YARN-5366) Add support for toggling the removal of completed and failed docker containers

Shane Kumpf (JIRA) Mon, 03 Apr 2017 10:27:12 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953872#comment-15953872
 ]


Shane Kumpf edited comment on YARN-5366 at 4/3/17 5:25 PM:
-----------------------------------------------------------

Thanks [~vinodkv]! Responses below.

{quote}
Signal.QUIT handling is very application specific. For example, nginx does a
graceful shutdown while JVMs do a thread dump and don't shut down at all. We
shouldn't stop / rm container for QUIT at all?
{quote}

I addressed this in another design document, but here is the gist of it. While
it is possible to do a {{docker kill --signal SIGQUIT}}, this is of limited
usefulness and may result in unexpected behavior. The signal is always sent to
PID 1 in the container. Depending on the image or app type, PID 1 may not be
the process we want to catch that signal. Alternatively, users can specify
STOPSIGNAL in the Dockerfile, and the user likely has a better understanding of
the implications for that application/image type. Thoughts on how this should
be handled? Should we just ignore Signal.QUIT?
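
To make the options concrete, here is a minimal sketch of one possible policy.
It is not code from the patch. The class, enum, and method names are
hypothetical; only the {{docker stop}} and {{docker kill --signal}} command
forms come from the Docker CLI. It assumes QUIT is ignored and that TERM is
delegated to {{docker stop}}, which delivers the image's STOPSIGNAL (SIGTERM
by default).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: map a requested signal to a single Docker CLI command. */
final class DockerSignalSketch {

  /** Hypothetical stand-in for the signals the NodeManager may request. */
  enum Signal { QUIT, TERM, KILL }

  /**
   * Returns the Docker CLI arguments for the given signal, or an empty list
   * when the signal is deliberately ignored.
   */
  static List<String> commandFor(Signal signal, String containerName) {
    switch (signal) {
      case QUIT:
        // Assumption for this sketch: QUIT is ignored, because the signal
        // would only reach PID 1 and its effect depends on the application.
        return Collections.emptyList();
      case TERM:
        // docker stop sends the STOPSIGNAL declared by the image.
        return Arrays.asList("docker", "stop", containerName);
      default:
        // KILL: force the container down.
        return Arrays.asList(
            "docker", "kill", "--signal", "SIGKILL", containerName);
    }
  }

  public static void main(String[] args) {
    // "container_example" is a made-up container name.
    for (Signal signal : Signal.values()) {
      System.out.println(signal + " -> " + commandFor(signal, "container_example"));
    }
  }
}
```

Whether QUIT should be ignored or passed through is exactly the open question
above; the sketch only shows that either choice stays a single command.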

{quote}
I think the best we can do is to send the intent to container-executor binary 
and let it do stop and rm in one shot so as to save on multiple launches.
{quote}

IMO, moving more of this logic into c-e complicates matters and doesn't follow
what we've done so far. Nearly all existing DockerCommands execute via c-e as a
single Docker CLI command. If the concern is the performance hit, the Stop
command here is a safeguard and should not normally be called, since the
container should already have completed. However, you can't rm a container that
isn't stopped, so ensuring it has been stopped is necessary.
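
As a rough illustration of that flow (a sketch only, with hypothetical class
and method names and a made-up {{stillRunning}} flag standing in for however
the caller learns the container state), cleanup stays a sequence of single
Docker CLI commands, and the stop step appears only when the container has not
already exited:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: cleanup as a sequence of single Docker CLI commands. */
final class DockerCleanupSketch {

  /**
   * Builds the commands needed to remove a container. Each inner list is one
   * Docker CLI command that would be launched separately.
   */
  static List<List<String>> cleanupCommands(String containerName,
                                            boolean stillRunning) {
    List<List<String>> commands = new ArrayList<>();
    if (stillRunning) {
      // Safeguard: a running container must be stopped before it is removed.
      commands.add(Arrays.asList("docker", "stop", containerName));
    }
    commands.add(Arrays.asList("docker", "rm", containerName));
    return commands;
  }

  public static void main(String[] args) {
    // Common case: the container already completed, so only rm is issued.
    System.out.println(cleanupCommands("container_example", false));
    // Safeguard case: stop first, then rm.
    System.out.println(cleanupCommands("container_example", true));
  }
}
```

In the common case this costs one launch, so the extra stop only adds a launch
when something has gone wrong.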

I've created and posted patches to YARN-6366 (Refactor the NodeManager
DeletionService to support additional DeletionTask types) and YARN-6374
(Improve test coverage and add utility classes for common Docker operations).
Both are prerequisites for having docker containers honor the debug delay.


was (Author: shaneku...@gmail.com):
Thanks [~vinodkv]! Responses below.

{quote}
Signal.QUIT handling is very application specific. For example, nginx does a
graceful shutdown while JVMs do a thread dump and don't shut down at all. We
shouldn't stop / rm container for QUIT at all?
{quote}

I addressed this in another design document, but here is the gist of it. While
it is possible to do a {{docker kill --signal SIGQUIT}}, this is of limited
usefulness and may result in unexpected behavior. The signal is always sent to
PID 1 in the container. Depending on the image or app type, PID 1 may not be
the process we want to catch that signal. Alternatively, users can specify
STOPSIGNAL in the Dockerfile, and the user likely has a better understanding of
the implications for that application/image type. Thoughts on how this should
be handled?

{quote}
I think the best we can do is to send the intent to container-executor binary 
and let it do stop and rm in one shot so as to save on multiple launches.
{quote}

IMO, moving more of this logic into c-e complicates matters and doesn't follow
what we've done so far. Nearly all existing DockerCommands execute via c-e as a
single Docker CLI command. If the concern is the performance hit, the Stop
command here is a safeguard and should not normally be called, since the
container should already have completed. However, you can't rm a container that
isn't stopped, so ensuring it has been stopped is necessary.

I've created and posted patches to YARN-6366 (Refactor the NodeManager 
DeletionService to support additional DeletionTask types) and YARN-6374 
(Improve test coverage and add utility classes for common Docker operations). 
These are the prerequisites to have docker containers honor the debug delay.

> Add support for toggling the removal of completed and failed docker containers
> ------------------------------------------------------------------------------
>
>                 Key: YARN-5366
>                 URL: https://issues.apache.org/jira/browse/YARN-5366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>              Labels: oct16-medium
>         Attachments: YARN-5366.001.patch, YARN-5366.002.patch, 
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch, 
> YARN-5366.006.patch
>
>
> Currently, completed and failed docker containers are removed by
> container-executor. Add a job-level environment variable to
> DockerLinuxContainerRuntime that lets the user toggle whether the container
> is deleted, and remove the deletion logic from container-executor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
