[ 
https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710753#comment-13710753
 ] 

Xuan Gong commented on YARN-76:
-------------------------------

I did a simple test in a Mac environment: a Java class that just runs an 
infinite loop, launched by a shell script. After running the command to kill 
the shell script process, the Java process keeps running. This looks like the 
cause.
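
The experiment above can be reproduced with a short bash script. This is a sketch under assumptions: `sleep 300` stands in for the infinite-loop Java class, and the `start_wrapper` helper and its temp file are hypothetical. Only the wrapper's PID is signalled, as with kill -9 ${pid}.

```shell
#!/usr/bin/env bash
# Reproduce the leak: signalling only the wrapper shell leaves its child
# running. `sleep 300` stands in for the infinite-loop Java class.

# Hypothetical helper: start a wrapper bash that forks one long-running
# child, records the child's PID in a temp file, then waits for it.
start_wrapper() {
  local pidfile
  pidfile=$(mktemp)
  bash -c 'sleep 300 & echo $! > "$1"; wait' _ "$pidfile" &
  wrapper_pid=$!
  # Poll until the wrapper has written the child's PID.
  while [ ! -s "$pidfile" ]; do sleep 0.1; done
  child_pid=$(cat "$pidfile")
  rm -f "$pidfile"
}

start_wrapper
kill -9 "$wrapper_pid"           # like "kill -9 ${pid}" on the bash process
wait "$wrapper_pid" 2>/dev/null  # reap the wrapper so it is really gone

if kill -0 "$child_pid" 2>/dev/null; then
  echo "wrapper $wrapper_pid killed, child $child_pid still running"
fi
```

The child is simply reparented when its parent dies; nothing forwards the signal to it.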

A possible solution: instead of executing kill -15/-9 ${pid}, we can use
1. pkill -9/-15 -P ${pid}. This signals all direct children of the process 
(not the process itself), but does not reach the grandchildren.
2. kill -9/-15 -${pgid}. This signals every process that has the same process 
group ID, including the grandchildren.
To get the pgid, we can use "ps -o pgid= -p 4848".
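
Both options can be compared on a three-level tree (wrapper bash, child bash, grandchild). A minimal sketch, assuming bash with `ps`, `pkill` and `mktemp` available; `sleep` stands in for the Java AM, and the helpers `spawn_tree` and `is_running` are hypothetical. `set -m` gives each background job its own process group, so the group kill cannot hit the demo script itself; the setsid(1) command would do the same on Linux but is not available by default on macOS.

```shell
#!/usr/bin/env bash
# Compare the two proposed strategies on a three-level tree:
#   wrapper bash -> child bash -> grandchild (sleep stands in for the AM)
set -m  # job control: each background job gets its own process group

# True if the PID exists and is not a zombie (kill -0 would also
# succeed on a zombie that nobody has reaped yet).
is_running() {
  local state
  state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
  [ -n "$state" ] && [ "${state#Z}" = "$state" ]
}

# Hypothetical helper: start the tree, set wrapper_pid and grandchild_pid.
spawn_tree() {
  local pidfile
  pidfile=$(mktemp)
  # The trailing "; :" stops the wrapper from exec'ing the inner bash in
  # place of itself, so the tree really has three levels.
  bash -c 'bash -c "sleep 300 & echo \$! > $1; wait"; :' _ "$pidfile" &
  wrapper_pid=$!
  while [ ! -s "$pidfile" ]; do sleep 0.1; done
  grandchild_pid=$(cat "$pidfile")
  rm -f "$pidfile"
}

# Option 1: pkill -P signals only the wrapper's direct children.
spawn_tree
pkill -9 -P "$wrapper_pid"
sleep 1
is_running "$grandchild_pid" && echo "pkill -P: grandchild survived"
kill -9 "$grandchild_pid" 2>/dev/null   # clean up the leaked process

# Option 2: look up the wrapper's process group and signal all of it.
spawn_tree
pgid=$(ps -o pgid= -p "$wrapper_pid" | tr -d ' ')
kill -9 -- "-$pgid"
sleep 1
is_running "$grandchild_pid" || echo "kill -- -PGID: grandchild gone"
```

The `--` before the negative PGID keeps `kill` from parsing it as a signal option. With `set -m`, the looked-up PGID equals the wrapper's PID because the wrapper leads its own job.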
                
> killApplication doesn't fully kill application master
> -----------------------------------------------------
>
>                 Key: YARN-76
>                 URL: https://issues.apache.org/jira/browse/YARN-76
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: Failed on MacOS. OK on Linux
>            Reporter: Bo Wang
>
> When a client sends a ClientRMProtocol#killApplication request to the RM, the 
> corresponding AM is supposed to be killed. However, on Mac OS, the AM stays 
> alive (without any interruption).
> I figured out part of the reason after some debugging. The NM starts an AM 
> with a command like "/bin/bash -c /path/to/java SampleAM". This command is 
> executed in a process (say with PID 0001), which starts another Java process 
> (say with PID 0002). When the NM kills the AM, it sends SIGTERM and then 
> SIGKILL to the bash process (PID 0001). On Linux, the death of the bash 
> process (PID 0001) also brings down the Java process (PID 0002). On Mac OS, 
> however, only the bash process is killed, and from then on the Java process 
> keeps running unattended.
> Note: on Mac OS, DefaultContainerExecutor is used rather than 
> LinuxContainerExecutor.
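
The SIGTERM-then-SIGKILL sequence described in the issue could be aimed at the process group instead of the single bash PID. A sketch assuming the AM's processes share one process group: `kill_group`, its grace-period argument, and the demo's SIGTERM-ignoring child are hypothetical and not how the NodeManager is implemented.

```shell
#!/usr/bin/env bash
# SIGTERM-then-SIGKILL applied to a process group rather than one PID.
# Assumption: the container runs in its own process group; kill_group and
# its 2-second default grace period are hypothetical, not NM code.
set -m  # job control: each background job gets its own process group

# Signal every process in group $1, escalating to SIGKILL after $2 seconds.
kill_group() {
  local pgid=$1 grace=${2:-2}
  # Polite request first; if it fails, no process is left in the group.
  kill -TERM -- "-$pgid" 2>/dev/null || return 0
  sleep "$grace"
  # Anything still in the group ignored SIGTERM; force it.
  kill -KILL -- "-$pgid" 2>/dev/null
  return 0
}

# True if the PID exists and is not a zombie.
is_running() {
  local state
  state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
  [ -n "$state" ] && [ "${state#Z}" = "$state" ]
}

# Demo: wrapper bash -> child bash that ignores SIGTERM (a stubborn AM).
pidfile=$(mktemp)
bash -c 'bash -c "trap \"\" TERM; echo \$\$ > $1; while :; do sleep 1; done"; :' _ "$pidfile" &
wrapper_pid=$!
while [ ! -s "$pidfile" ]; do sleep 0.1; done
stubborn_pid=$(cat "$pidfile")
rm -f "$pidfile"

kill_group "$wrapper_pid" 1   # with set -m, the job's PGID is the wrapper's PID
sleep 1
is_running "$stubborn_pid" || echo "group fully terminated"
```

The SIGTERM step ends the wrapper; the SIGKILL step is what finally removes the child that ignores SIGTERM, along with its own children.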

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
