[
https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710761#comment-13710761
]
Chris Nauroth commented on YARN-76:
-----------------------------------
The fact that this only repros on Mac makes me suspect that we're seeing a
problem related to setsid/process groups. setsid is available on Linux, but
it's not available on Mac.
https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L304
With setsid enabled, we send kill commands with a '-' prepended to the pid to
indicate a process group, and thus kill the whole process group. Without
setsid enabled, we don't prepend the '-' (no process group), and we only kill
the single process. Perhaps Xuan's suggestion to use pkill -P on platforms
without setsid would work, though I haven't researched whether pkill is
widely available or present only on Mac or on certain BSD flavors.
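
To make the '-' prefix concrete, here is a minimal sketch of how the kill
target could be chosen. The class and method names (KillTargetSketch,
killTarget, killCommand) are hypothetical and are not the code in Shell.java.
The sketch assumes the container's root process was launched through setsid,
so that its process group id equals its pid.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the code in Shell.java.
final class KillTargetSketch {
    private KillTargetSketch() {}

    // Returns the operand handed to kill(1). "-<pid>" addresses the process
    // group whose id is <pid>; a bare "<pid>" addresses a single process.
    // Assumption: with setsid, the launched process leads its own group,
    // so the group id is the same number as its pid.
    static String killTarget(long pid, boolean setsidAvailable) {
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        return setsidAvailable ? "-" + pid : Long.toString(pid);
    }

    // Builds the argv for a kill invocation with a numeric signal.
    // Some kill implementations may want "--" before a negative operand;
    // that detail is left out of this sketch.
    static List<String> killCommand(int signal, long pid, boolean setsidAvailable) {
        return Arrays.asList("kill", "-" + signal, killTarget(pid, setsidAvailable));
    }
}
```

With setsid, killCommand(15, 1234, true) targets the whole group rooted at the
bash process. Without it, only pid 1234 receives the signal, which would match
the orphaned Java process described in the issue.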
I suspect that we don't have this problem on Windows. On Windows, we don't
have setsid, but we do have the concept of process groups using a different
underlying implementation (Windows job objects).
> killApplication doesn't fully kill application master on Mac OS
> ---------------------------------------------------------------
>
> Key: YARN-76
> URL: https://issues.apache.org/jira/browse/YARN-76
> Project: Hadoop YARN
> Issue Type: Bug
> Environment: Failed on MacOS. OK on Linux
> Reporter: Bo Wang
>
> When client sends a ClientRMProtocol#killApplication to RM, the corresponding
> AM is supposed to be killed. However, on Mac OS, the AM is still alive (w/o
> any interruption).
> I figured out part of the reason after some debugging. NM starts an AM with a
> command like "/bin/bash -c /path/to/java SampleAM". This command is executed
> in a process (say with PID 0001), which starts another Java process (say with
> PID 0002). When NM kills the AM, it sends SIGTERM and then SIGKILL to the bash
> process (PID 0001). In Linux, the death of the bash process (PID 0001) will
> trigger the kill of the Java process (PID 0002). However, in Mac OS, only the
> bash process is killed. The Java process is left running as an orphan.
> Note: on Mac OS, DefaultContainerExecutor is used rather than
> LinuxContainerExecutor.