[jira] [Commented] (YARN-4759) Fix signal handling for docker containers

Eric Badger (JIRA) Fri, 01 Sep 2017 12:49:32 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151070#comment-16151070
 ]


Eric Badger commented on YARN-4759:
-----------------------------------

[~shaneku...@gmail.com], another question (hopefully you didn't answer this one 
already in another JIRA). Is it necessary for us to use {{docker 
stop}}/{{docker kill}} to send signals to the processes within the docker 
container, specifically during shutdown? If the docker containers have already 
exited, then the {{docker stop}} command will cause an exception because the 
command failed. In the non-docker signaling case, the container-executor 
checks whether the process still exists before it sends the signal and, in the 
event that it doesn't, sends back a specific error code that we can safely 
ignore (and log at DEBUG). But since we exec the {{docker stop}} command, we 
lose control after the exec and get whatever return code that command gives. 
In the case of a container that doesn't exist, {{docker stop}} returns 1. This 
exception is spamming the NM log for me. From 
what I understand, docker will always send the signal to PID 1 (until we assume 
Docker 1.13 support, which has the {{--init}} flag to start and reap all 
processes). But this is fine, because PID 1 for docker containers is {{bash 
-c}} and bash should forward that signal along to its child process, since they 
are in the same process group.

tl;dr Can we use the same signaling code in docker that we use in non-docker so 
that we can get rid of these benign exceptions in the NM log or is there a 
reason we need to use the {{docker stop}} and {{docker kill}} commands?
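
A minimal sketch of the check-before-signal behavior described above for the 
non-docker path. It is written in Python for brevity and is not the actual 
container-executor code; the names signal_if_alive, SIGNALED, and 
NO_SUCH_PROCESS are hypothetical and the result values are made up for 
illustration.

```python
import logging
import os
import signal

LOG = logging.getLogger("signal_sketch")

# Hypothetical result codes for this sketch only; the real
# container-executor defines its own error codes.
SIGNALED = 0
NO_SUCH_PROCESS = 1


def signal_if_alive(pid, sig=signal.SIGTERM):
    """Send sig to pid only if the process still exists.

    Returns NO_SUCH_PROCESS instead of raising when the process is
    already gone, so the caller can treat that case as benign.
    """
    try:
        # Signal 0 delivers nothing; it only checks that the pid exists
        # and that we are allowed to signal it.
        os.kill(pid, 0)
        # The process can still exit between the check and this call,
        # which is why both calls share the same exception handler.
        os.kill(pid, sig)
    except ProcessLookupError:
        LOG.debug("pid %d has already exited; nothing to signal", pid)
        return NO_SUCH_PROCESS
    return SIGNALED
```

A caller would log the NO_SUCH_PROCESS result at DEBUG and move on, while any 
other failure (for example a PermissionError) still surfaces as an exception.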

cc [~vvasudev]

> Fix signal handling for docker containers
> -----------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>             Fix For: 2.9.0, 3.0.0-alpha1
>
>         Attachments: YARN-4759.001.patch, YARN-4759.002.patch, 
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
