[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151070#comment-16151070 ]
Eric Badger commented on YARN-4759:
-----------------------------------

[~shaneku...@gmail.com], another question (hopefully you didn't already answer this one in another JIRA). Is it necessary for us to use {{docker stop}}/{{docker kill}} to send signals to the processes within the docker container, specifically during shutdown? If the docker containers have already exited, the {{docker stop}} command causes an exception because the command fails.

In the non-docker signaling case, the container-executor checks whether the process still exists before it sends the signal. If the process doesn't exist, it sends back a specific error code that we can safely ignore (and log at DEBUG). But since we exec the {{docker stop}} command, we get whatever return code that command gives, because we lose control after the exec. For a container that doesn't exist, {{docker stop}} returns 1. This exception is spamming the NM log for me.

From what I understand, docker will always send the signal to PID 1 (until we assume Docker 1.13 support, which has the {{--init}} flag to start and reap all processes). But this is fine, because PID 1 for docker containers is {{bash -c}}, and bash should forward that signal along to its child process, since they are in the same process group.

tl;dr Can we use the same signaling code in docker that we use in non-docker so that we can get rid of these benign exceptions in the NM log, or is there a reason we need to use the {{docker stop}} and {{docker kill}} commands?
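The check-then-signal pattern described above for the non-docker path can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the container-executor's actual C code: the function name `signal_if_alive` and the result codes are hypothetical. It relies on POSIX `kill` with signal 0, which fails with `ESRCH` when the process no longer exists and with `EPERM` when the process exists but the caller is not allowed to signal it. The `EPERM` case is one plausible way a check running as the wrong user could misjudge a container process (compare the user dependence in the issue description quoted below), but that link is an assumption here.

```python
import errno
import os

# Hypothetical result codes for this sketch only; the real
# container-executor defines its own exit codes.
SIGNAL_SENT = 0
PROCESS_GONE = 1    # benign: nothing left to signal, log at DEBUG
NOT_PERMITTED = 2   # process exists but we may not signal it (other user)


def signal_if_alive(pid: int, sig: int) -> int:
    """Check that `pid` still exists, then deliver `sig` to it (hypothetical helper)."""
    try:
        # Signal 0 performs the existence and permission checks
        # without delivering anything to the target process.
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:
            return PROCESS_GONE
        if e.errno == errno.EPERM:
            return NOT_PERMITTED
        raise
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process exited between the existence check and the real signal.
        return PROCESS_GONE
    return SIGNAL_SENT
```

A caller can treat `PROCESS_GONE` as a DEBUG-level event instead of an error, which is the behavior the comment asks for. Note that a PID can be reused once the process has been reaped, so any liveness check by PID alone is inherently racy.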
cc [~vvasudev]

> Fix signal handling for docker containers
> -----------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>             Fix For: 2.9.0, 3.0.0-alpha1
>
>         Attachments: YARN-4759.001.patch, YARN-4759.002.patch,
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be
> revisited for docker containers. For example, container reacquisition on NM
> restart might not work, depending on which user the process in the container
> runs as.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)