[
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151070#comment-16151070
]
Eric Badger commented on YARN-4759:
-----------------------------------
[[email protected]], another question (hopefully you didn't answer this one
already in another JIRA). Is it necessary for us to use {{docker
stop}}/{{docker kill}} to send signals to the processes within the docker
container, specifically during shutdown? If the docker containers have already
exited, then the {{docker stop}} command will cause an exception because the
command failed. In the non-docker signaling case, the container-executor will
check whether the process still exists before it sends the signal and will
send a specific error code back that we can safely ignore (and then log at
DEBUG) in the event that it doesn't exist. But since we exec the {{docker stop}}
command, we will get the return code of whatever that command gives, since we
lose control after the exec. In the case of a container that doesn't exist,
{{docker stop}} returns 1. This exception is spamming the NM log for me. From
what I understand, docker will always send the signal to PID 1 (until we assume
Docker 1.13 support, which has the {{--init}} flag to start and reap all
processes). But this is fine, because PID 1 for docker containers is {{bash
-c}} and bash should forward that signal along to its child process, since they
are in the same process group.
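As a rough illustration of the non-docker behaviour described above (this is not
the actual container-executor code; the function name and the two return codes
are made up for the example), the "check that the process exists, then signal,
and report a distinct code if it is gone" pattern looks roughly like this:

```python
import os
import signal
import subprocess
import sys

# Hypothetical return codes for this sketch; not the real container-executor values.
SIGNAL_SENT = 0
PROCESS_GONE = 1


def signal_if_alive(pid: int, sig: int) -> int:
    """Send `sig` to `pid` only if the process still exists.

    Returns PROCESS_GONE instead of raising when the process has already
    exited, so the caller can ignore that case (or log it at DEBUG).
    """
    try:
        os.kill(pid, 0)    # signal 0 only probes for existence
        os.kill(pid, sig)  # the process was there a moment ago; send the real signal
    except ProcessLookupError:
        # Either the probe failed or the process exited between probe and signal.
        return PROCESS_GONE
    return SIGNAL_SENT


if __name__ == "__main__":
    # Start a throwaway child, signal it, reap it, then signal it again.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(signal_if_alive(child.pid, signal.SIGTERM))
    child.wait()
    print(signal_if_alive(child.pid, signal.SIGTERM))
```

The point is only that the "already exited" case comes back as a return value
the caller can recognise, rather than as a generic command failure.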
tl;dr Can we use the same signaling code in docker that we use in non-docker so
that we can get rid of these benign exceptions in the NM log or is there a
reason we need to use the {{docker stop}} and {{docker kill}} commands?
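If we do have to keep {{docker stop}}, one possible way to filter out the benign
case is to probe the container first and treat "not running / not found" as a
non-error. A minimal sketch, assuming {{docker inspect --format}} with the
{{.State.Running}} field is available in the Docker versions we support; the
function name and the injectable {{run}} parameter are hypothetical, and this is
not how the NM currently invokes docker:

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Run a command and capture its output without raising on non-zero exit.
    return subprocess.run(cmd, capture_output=True, text=True)


def _is_running(container: str, run: Callable) -> bool:
    # A non-zero exit here means docker does not know the container at all.
    probe = run(["docker", "inspect", "--format", "{{.State.Running}}", container])
    return probe.returncode == 0 and probe.stdout.strip() == "true"


def stop_if_running(container: str, run: Callable = _run) -> bool:
    """Return True if `docker stop` succeeded, False if there was nothing to stop."""
    if not _is_running(container, run):
        return False  # benign: already exited or removed; caller can log at DEBUG
    stop = run(["docker", "stop", container])
    if stop.returncode == 0:
        return True
    # The container may have exited between the probe and the stop.
    if not _is_running(container, run):
        return False
    raise RuntimeError(f"docker stop failed for {container}: {stop.stderr.strip()}")
```

This costs an extra docker invocation per signal, so it may not be preferable to
reusing the non-docker signaling path.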
cc [~vvasudev]
> Fix signal handling for docker containers
> -----------------------------------------
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Sidharta Seethana
> Assignee: Shane Kumpf
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch,
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be
> revisited for docker containers. For example, container reacquisition on NM
> restart might not work, depending on which user the process in the container
> runs as.