[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158542#comment-16158542 ]
Shane Kumpf commented on YARN-4759:
-----------------------------------
Thanks for the follow-up, [~ebadger].
{quote}
I thought that you had decided that we didn't need to worry about this in your
comment above?
{quote}
I'm actually saying the opposite. My initial thought was to let the user tell
YARN which stop/kill signal to use when submitting the job. However, after more
research I found the Dockerfile STOPSIGNAL instruction, which means YARN doesn't
need to handle this explicitly: the user can define the necessary signal in the
Dockerfile. This does depend on using {{docker stop}}, though.
{quote}
How does docker stop solve the issue here? If the container doesn't exist yet,
then docker stop will fail with "No such container" and stop trying. The
documentation isn't very informative, but it doesn't appear to wait the grace
period for the SIGKILL if it can't find the container in the first place.
{quote}
Sorry, I wasn't very clear before; I'm referring to a different situation. The
container can exist while the process inside it is not yet fully started, and/or
Docker has not yet written the PID to the data structure used by
{{docker inspect}}. We use {{docker run}}, which does a {{docker create}} and
{{docker start}} behind the scenes. If the image doesn't exist locally, it is
implicitly pulled during that time as well. You will often find that the Created
and StartedAt times in {{docker inspect}} differ wildly due to these additional
background operations. I will concede that {{docker stop}} is less necessary
here, as a container still in the Created state can be {{docker rm}}-ed (well,
most of the time, but that's another discussion). However, the docker client is
decoupled from YARN, so races can occur and containers can be leaked.
{{docker stop}} may therefore still be useful in case the container has
transitioned to running while we attempt to obtain the PID, etc.
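A minimal Python sketch of the cleanup described above, assuming it is driven through the docker CLI. The names {{cleanup}}, {{Runner}}, {{build_inspect_command}} and {{parse_inspect_output}} are hypothetical, and the command runner is injectable so the logic can be exercised without Docker. It relies on two Docker behaviors: {{docker inspect}} reports {{State.Pid}} as 0 while the container is not running, and {{docker rm}} refuses to remove a running container without {{-f}}. A failed rm on a container last seen as Created is therefore treated as a possible sign that it started in the meantime.

```python
import subprocess
from typing import Callable, List, Tuple

# Runs a command and returns its exit code (injectable for testing).
Runner = Callable[[List[str]], int]

# Go template for `docker inspect --format`; State.Pid is 0 while the
# container's main process is not running.
INSPECT_FORMAT = "{{.State.Status}} {{.State.Pid}}"


def default_runner(cmd: List[str]) -> int:
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def build_inspect_command(name: str) -> List[str]:
    return ["docker", "inspect", "--format", INSPECT_FORMAT, name]


def parse_inspect_output(output: str) -> Tuple[str, int]:
    """Parse '<status> <pid>' as printed by INSPECT_FORMAT."""
    status, pid = output.split()
    return status, int(pid)


def cleanup(name: str, status: str, run: Runner = default_runner) -> List[List[str]]:
    """Stop and/or remove a container; returns the commands issued, in order."""
    issued: List[List[str]] = []

    def attempt(cmd: List[str]) -> bool:
        issued.append(cmd)
        return run(cmd) == 0

    rm = ["docker", "rm", name]
    stop = ["docker", "stop", name]

    if status == "created":
        # Never started (the image may still be pulling), so there is no PID
        # to signal: a plain rm is enough in the common case.
        if attempt(rm):
            return issued
        # rm refuses running containers, so it may have started after we
        # inspected it; fall through to docker stop.
    if status in ("created", "running"):
        # The daemon delivers STOPSIGNAL, waits the grace period, then SIGKILL.
        attempt(stop)
    attempt(rm)
    return issued
```

The fallback order (rm, then stop, then rm) avoids signaling a process that never started, while still covering the race where the container transitions to running between the inspect and the rm.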
> Fix signal handling for docker containers
> -----------------------------------------
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Sidharta Seethana
> Assignee: Shane Kumpf
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch,
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be
> revisited for docker containers. For example, container reacquisition on NM
> restart might not work, depending on which user the process in the container
> runs as.
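The user-dependence mentioned in the description can be illustrated with a liveness probe. This is a minimal Python sketch that assumes reacquisition decides whether a container's process is alive by sending signal 0 to its PID. That is an assumption, since the description does not say how the probe is done. {{kill(pid, 0)}} fails with EPERM when the process exists but runs as a different user than the caller, so a check that treats any error as "dead" can misreport a live container. {{probe_pid}} and {{naive_is_alive}} are hypothetical helpers.

```python
import os


def probe_pid(pid: int) -> str:
    """Probe a PID with signal 0, which only checks existence/permission.

    Assumption: reacquisition decides liveness this way. Returns:
      "alive"            - process exists and we may signal it
      "alive-other-user" - process exists but runs as another user (EPERM)
      "gone"             - no such process (ESRCH)
    """
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be positive")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return "gone"
    except PermissionError:
        return "alive-other-user"
    return "alive"


def naive_is_alive(pid: int) -> bool:
    # Buggy pattern: treats EPERM (another user's live process) as "dead".
    return probe_pid(pid) == "alive"
```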
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)