[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710761#comment-13710761 ]

Chris Nauroth commented on YARN-76:
-----------------------------------

The fact that this only repros on Mac makes me suspect that we're seeing a 
problem related to setsid/process groups.  setsid is available on Linux, but 
it's not available on Mac.
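The sketch below shows one way a launcher might probe for setsid and then wrap the container command with it. The class and method names (SetsidSupport, commandRuns, launchCommand) are hypothetical and are not Hadoop's Shell API. It assumes Java 9+ and relies on ProcessBuilder.start() throwing an IOException when the binary is not on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not Hadoop code: probe for setsid and build the launch command. */
final class SetsidSupport {
    private SetsidSupport() {}

    /** Returns true if the command starts and exits with status 0 within 5 seconds. */
    static boolean commandRuns(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(5, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            // Typically "No such file or directory": the binary is not on the PATH.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** setsid ships with util-linux on Linux but is absent on a stock Mac. */
    static boolean isSetsidAvailable() {
        return commandRuns("setsid", "true");
    }

    /**
     * With setsid, bash starts a new session and becomes leader of a new
     * process group whose id equals its pid; the Java child it spawns
     * inherits that group.
     */
    static List<String> launchCommand(String script, boolean useSetsid) {
        List<String> cmd = new ArrayList<>();
        if (useSetsid) {
            cmd.add("setsid");
        }
        cmd.addAll(Arrays.asList("bash", "-c", script));
        return cmd;
    }
}
```

On a Mac, isSetsidAvailable() would be expected to return false, so launchCommand falls back to a plain "bash -c" and the wrapper shares the NodeManager's process group instead of owning one.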

https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L304

With setsid enabled, we send kill commands with a '-' prepended to the pid to
indicate a process group, and thus kill the whole process group.  Without
setsid enabled, we don't prepend the '-' (no process group), and we only kill
the single process.  Perhaps Xuan's suggestion to use pkill -P on platforms
without setsid would work, though I haven't researched whether pkill is widely
available or only present on Mac and certain BSD flavors.
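To make the difference concrete, here is a sketch of the argument vectors involved. KillCommands and its methods are hypothetical names used for illustration, not Hadoop code. Only the kill/pkill command-line forms are meant to match the discussion: a '-' prefix selects a process group, and "pkill -SIGNAL -P ppid" selects the children of a given parent.

```java
import java.util.List;

/** Hypothetical helper, not a Hadoop API: builds kill command lines. */
final class KillCommands {
    private KillCommands() {}

    /**
     * When the wrapper was started under setsid, its pid is also a process-group
     * id, so "-pid" signals every process in the group. Otherwise only the
     * single process (the bash wrapper) is signaled.
     */
    static List<String> killCommand(long pid, String signal, boolean processGroup) {
        String target = processGroup ? "-" + pid : Long.toString(pid);
        return List.of("kill", "-" + signal, target);
    }

    /**
     * Fallback for platforms without setsid: signal the processes whose parent
     * is pid. pkill -P matches only immediate children (not grandchildren),
     * and the parent itself still needs its own kill.
     */
    static List<String> killChildrenCommand(long pid, String signal) {
        return List.of("pkill", "-" + signal, "-P", Long.toString(pid));
    }
}
```

For example, killCommand(1234, "TERM", true) produces kill -TERM -1234, and killChildrenCommand(1234, "TERM") produces pkill -TERM -P 1234.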

I suspect that we don't have this problem on Windows.  On Windows, we don't 
have setsid, but we do have the concept of process groups using a different 
underlying implementation (Windows job objects).
                
> killApplication doesn't fully kill application master on Mac OS
> ---------------------------------------------------------------
>
>                 Key: YARN-76
>                 URL: https://issues.apache.org/jira/browse/YARN-76
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: Failed on MacOS. OK on Linux
>            Reporter: Bo Wang
>
> When a client sends a ClientRMProtocol#killApplication to the RM, the
> corresponding AM is supposed to be killed. However, on Mac OS, the AM is
> still alive (w/o any interruption).
> I figured out part of the reason after some debugging. NM starts an AM with
> a command like "/bin/bash -c /path/to/java SampleAM". This command is executed
> in a process (say with PID 0001), which starts another Java process (say with
> PID 0002). When NM kills the AM, it sends SIGTERM and then SIGKILL to the bash
> process (PID 0001). On Linux, the death of the bash process (PID 0001) also
> takes down the Java process (PID 0002). However, on Mac OS, only the bash
> process is killed, and the Java process keeps running from then on.
> Note: on Mac OS, DefaultContainerExecutor is used rather than 
> LinuxContainerExecutor.
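For comparison, here is a JVM-side alternative to both setsid and pkill. This is a swapped-in technique and not what DefaultContainerExecutor does: Java 9's ProcessHandle can enumerate a process's descendants. The sketch assumes Java 9+ and uses a hypothetical ProcessTreeKiller class. The key point is to snapshot the tree before killing the bash wrapper, because once bash (PID 0001 above) dies, its Java child (PID 0002) is reparented and can no longer be reached from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not Hadoop code: kill a wrapper and everything it spawned. */
final class ProcessTreeKiller {
    private ProcessTreeKiller() {}

    /**
     * Snapshots the descendants first, then signals them and the root.
     *
     * @return every handle a termination request was sent to
     */
    static List<ProcessHandle> killTree(ProcessHandle root, boolean force) {
        // Take the snapshot while the root is still alive so its children
        // have not yet been reparented to init/launchd.
        List<ProcessHandle> targets = new ArrayList<>(
                root.descendants().collect(Collectors.toList()));
        targets.add(root);
        for (ProcessHandle h : targets) {
            // In OpenJDK on Unix, destroy() sends SIGTERM and destroyForcibly() sends SIGKILL.
            if (force) {
                h.destroyForcibly();
            } else {
                h.destroy();
            }
        }
        return targets;
    }
}
```

Processes forked after the snapshot are missed, so this is a best-effort cleanup. It does not give the same coverage as a single kill on a process group, which is what setsid enables.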

