[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578065#comment-14578065 ]

Xuan Gong commented on YARN-76:
---
I think this is a duplicate of YARN-3561. We already have a discussion there, so I am closing this as a duplicate.

killApplication doesn't fully kill application master on Mac OS
---
Key: YARN-76
URL: https://issues.apache.org/jira/browse/YARN-76
Project: Hadoop YARN
Issue Type: Bug
Environment: Failed on MacOS. OK on Linux
Reporter: Bo Wang
Assignee: Xuan Gong

When a client sends a ClientRMProtocol#killApplication request to the RM, the corresponding AM is supposed to be killed. However, on Mac OS, the AM is still alive (without any interruption).

I figured out part of the reason after some debugging. The NM starts an AM with a command like /bin/bash -c /path/to/java SampleAM. This command is executed in a process (say with PID 0001), which starts another Java process (say with PID 0002). When the NM kills the AM, it sends SIGTERM and then SIGKILL to the bash process (PID 0001). On Linux, the death of the bash process (PID 0001) triggers the kill of the Java process (PID 0002). However, on Mac OS, only the bash process is killed, and the Java process is left running as an orphan.

Note: on Mac OS, DefaultContainerExecutor is used rather than LinuxContainerExecutor.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535046#comment-14535046 ]

Rohith commented on YARN-76:
---
[~xgong] Any update on this issue? Do you think the issue still exists in branch-2 or trunk?
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725719#comment-13725719 ]

Xuan Gong commented on YARN-76:
---
[~bowang] I ran the sleep task and then killed the application. Could you tell me which command you ran to find out that the application master was not killed? I ran ps -A before and after the kill command, and I did not find the application master still alive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710758#comment-13710758 ]

Vinod Kumar Vavilapalli commented on YARN-76:
---
On Linux, we launch commands via a setsid binary, which is absent on Mac. We could try writing our own very simple setsid program, or ask Mac users to install such a binary from elsewhere.
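To make the setsid point above concrete, here is a minimal Java sketch, not the NodeManager's actual launch code. It probes for a setsid binary by trying to run it once, then prefixes the launch command with setsid only when the probe succeeds. The class name SetsidLaunchSketch and both method names are hypothetical, and the probe command is an assumption chosen for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from Hadoop.
final class SetsidLaunchSketch {

    /** Returns true if a setsid binary can be started and exits with status 0. */
    static boolean isSetsidAvailable() {
        try {
            // Assumed probe: run a trivial shell command under setsid.
            Process probe = new ProcessBuilder("setsid", "bash", "-c", "echo $$").start();
            return probe.waitFor() == 0;
        } catch (IOException e) {
            // Starting the process failed, e.g. because setsid is not installed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Builds the argv for launching a command, in its own session when possible. */
    static List<String> launchCommand(String command, boolean setsidAvailable) {
        List<String> argv = new ArrayList<>();
        if (setsidAvailable) {
            // With setsid, the shell becomes a session and process group leader,
            // so its pid can later be used as a process group id.
            argv.add("setsid");
        }
        argv.add("/bin/bash");
        argv.add("-c");
        argv.add(command);
        return argv;
    }

    public static void main(String[] args) {
        boolean setsid = isSetsidAvailable();
        System.out.println("setsid available: " + setsid);
        System.out.println(launchCommand("/path/to/java SampleAM", setsid));
    }
}
```

On a platform without setsid the sketch falls back to a plain /bin/bash -c launch, which is the situation the bug report describes on Mac OS.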
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710762#comment-13710762 ]

Chris Nauroth commented on YARN-76:
---
Jira mark-up totally butchered the last comment. :-) That was meant to say:

With setsid enabled, we send kill commands with a hyphen prepended to the pid to indicate a process group, and thus kill the whole process group. Without setsid enabled, we don't prepend the hyphen...
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710761#comment-13710761 ]

Chris Nauroth commented on YARN-76:
---
The fact that this only repros on Mac makes me suspect that we're seeing a problem related to setsid/process groups. setsid is available on Linux, but it's not available on Mac.

https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L304

With setsid enabled, we send kill commands with a '-' prepended to the pid to indicate a process group, and thus kill the whole process group. Without setsid enabled, we don't prepend the '-' (no process group), and we only kill the single process.

Perhaps Xuan's suggestion to use pkill -P on platforms without setsid would work, though I haven't researched whether pkill is widely available, present only on Mac, or present only on certain BSD flavors.

I suspect that we don't have this problem on Windows. On Windows, we don't have setsid, but we do have the concept of process groups using a different underlying implementation (Windows job objects).
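The pkill -P fallback mentioned above could look roughly like the following Java sketch. It is an illustration of the idea only, not a proposed patch: the class PkillFallbackSketch and the method fallbackKillCommands are hypothetical names, and the sketch only builds the command lines rather than running them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a kill strategy for platforms without setsid.
final class PkillFallbackSketch {

    /**
     * Builds the commands to run, in order, when no process group is available.
     * The children are signalled first, while the parent pid still identifies them.
     */
    static List<String[]> fallbackKillCommands(int code, String pid) {
        List<String[]> commands = new ArrayList<>();
        // pkill -P matches processes whose parent pid is the given pid.
        commands.add(new String[] { "pkill", "-" + code, "-P", pid });
        // Then signal the parent (the bash process) itself.
        commands.add(new String[] { "kill", "-" + code, pid });
        return commands;
    }

    public static void main(String[] args) {
        for (String[] command : fallbackKillCommands(15, "1234")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

One limitation of this approach: pkill -P selects only direct children of the given pid, so grandchildren would survive unless the process tree is walked explicitly.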
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710816#comment-13710816 ]

Xuan Gong commented on YARN-76:
---
{code}
public static String[] getSignalKillCommand(int code, String pid) {
  // On Windows, delegate to winutils; elsewhere, build a kill command.
  // A "-" in front of the pid addresses the whole process group.
  return Shell.WINDOWS
      ? new String[] { Shell.WINUTILS, "task", "kill", pid }
      : new String[] { "kill", "-" + code, isSetsidAvailable ? "-" + pid : pid };
}
{code}

If setsid is supported, the command kill -15/-9 -$pid is executed, which kills the whole process group. If setsid is not supported, the command kill -15/-9 $pid is executed, which kills only that single process.
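To show the two cases side by side, here is a self-contained Java sketch of the non-Windows branch of the method quoted above. It is a simplification for illustration: the class name SignalKillCommandSketch is hypothetical, and setsid availability is passed in as a parameter rather than read from a static field, so both outcomes can be printed on any machine.

```java
// Hypothetical, simplified variant of the non-Windows branch quoted above.
final class SignalKillCommandSketch {

    /** Builds the kill argv; a leading "-" on the pid targets the process group. */
    static String[] signalKillCommand(int code, String pid, boolean setsidAvailable) {
        return new String[] { "kill", "-" + code, setsidAvailable ? "-" + pid : pid };
    }

    public static void main(String[] args) {
        // With setsid: the whole process group led by pid 1234 gets SIGTERM.
        System.out.println(String.join(" ", signalKillCommand(15, "1234", true)));  // kill -15 -1234
        // Without setsid: only pid 1234 itself gets SIGKILL; its children keep running.
        System.out.println(String.join(" ", signalKillCommand(9, "1234", false)));  // kill -9 1234
    }
}
```

The second line of output corresponds to the Mac OS case in this issue: only the bash process receives the signal, so the Java process it started is left alive.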