[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267966#comment-15267966 ]

Paul Rogers commented on YARN-3066:
-----------------------------------

This problem persists on Mac OS X El Capitan. However, a workaround exists in the form of this simple Git project: https://github.com/jerrykuch/ersatz-setsid.

* Install the Apple Xcode command line tools.
* Clone the project.
* Build the project using make:
    cd ersatz-setsid
    make
* Copy the resulting setsid program to /usr/bin:
    sudo cp setsid /usr/bin
* Restart YARN.

Now your process will shut down correctly.

YARN already includes C code. YARN could save us all a ton of grief (it took me a day to track this issue down to this bug) by including its own version of setsid.

> Hadoop leaves orphaned tasks running after job is killed
> --------------------------------------------------------
>
>                 Key: YARN-3066
>                 URL: https://issues.apache.org/jira/browse/YARN-3066
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
>            Reporter: Dmitry Sivachenko
>
> When spawning user task, node manager checks for setsid(1) utility and spawns
> task program via it. See
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
> for instance:
> String exec = Shell.isSetsidAvailable? "exec setsid" : "exec";
> FreeBSD, unlike Linux, does not have setsid(1) utility. So plain "exec" is
> used to spawn user task. If that task spawns other external programs (this
> is common case if a task program is a shell script) and user kills job via
> mapred job -kill, these child processes remain running.
> 1) Why do you silently ignore the absence of setsid(1) and spawn task process
> via exec: this is the guarantee to have orphaned processes when job is
> prematurely killed.
> 2) FreeBSD has a replacement third-party program called ssid (which does
> almost the same as Linux's setsid). It would be nice to detect which binary
> is present during configure stage and put @SETSID@ macros into java file to
> use the correct name.
> I propose to make Shell.isSetsidAvailable test more strict and fail to start
> if it is not found: at least we will know about the problem at start rather
> than guess why there are orphaned tasks running forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
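
The stricter check proposed in the description could look roughly like the sketch below. It is only an illustration: SetsidLocator, its method names and the error text are hypothetical and not part of Hadoop, and it assumes ssid takes the command to run as its arguments the same way setsid does, which the thread describes only as "almost the same".

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Hadoop.
final class SetsidLocator {
    // Wrappers around the setsid() system call, in order of preference.
    static final List<String> CANDIDATES = Arrays.asList("setsid", "ssid");

    // Returns the first candidate that passes the executable check in any of
    // the given directories. Every directory is tried for "setsid" before
    // "ssid" is considered.
    static Optional<String> find(List<String> pathDirs, Predicate<File> isExecutable) {
        for (String name : CANDIDATES) {
            for (String dir : pathDirs) {
                if (!dir.isEmpty() && isExecutable.test(new File(dir, name))) {
                    return Optional.of(name);
                }
            }
        }
        return Optional.empty();
    }

    // Builds the launch prefix, failing loudly instead of silently falling
    // back to a plain "exec".
    static String execPrefix(List<String> pathDirs, Predicate<File> isExecutable) {
        return find(pathDirs, isExecutable)
            .map(name -> "exec " + name)
            .orElseThrow(() -> new IllegalStateException(
                "neither setsid nor ssid found on PATH; "
                    + "killed jobs would leave orphaned child processes"));
    }

    public static void main(String[] args) {
        String path = System.getenv().getOrDefault("PATH", "");
        List<String> dirs = Arrays.asList(path.split(File.pathSeparator));
        System.out.println(execPrefix(dirs, f -> f.isFile() && f.canExecute()));
    }
}
```

Passing the executable check in as a predicate keeps the lookup free of file system access, so it can be exercised without installing either binary.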
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563011#comment-14563011 ]

Allen Wittenauer commented on YARN-3066:
----------------------------------------

So yes, it's still sealed off without a contract. Meh.
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562618#comment-14562618 ]

Alan Burlison commented on YARN-3066:
-------------------------------------

Bugtraq is long gone; everything is now in the bug database, accessible via My Oracle Support (https://support.oracle.com).
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562114#comment-14562114 ]

Allen Wittenauer commented on YARN-3066:
----------------------------------------

Is Bugtraq+ (if that is what it is still called... haven't been a Sun employee for a while...) still sealed off?
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561234#comment-14561234 ]

Dmitry Sivachenko commented on YARN-3066:
-----------------------------------------

Solaris can use the same ssid program (it is just a simple wrapper for the setsid() syscall). I just proposed the simplest fix for that problem. A JNI wrapper sounds like a better approach.

What I want to see in any case is a loud error message if the setsid binary (or the setsid() syscall, if we go the JNI way) is unavailable. Right now it pretends to work, and I spent some time digging out what was going wrong and why I saw a lot of orphans.
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561130#comment-14561130 ]

Alan Burlison commented on YARN-3066:
-------------------------------------

Yes, that's a good point about not every platform having libhadoop. Solaris, for example, has the syscall but not the executable, so in that case it's a better solution to use the syscall, but that's not always going to be the case.
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561118#comment-14561118 ]

Allen Wittenauer commented on YARN-3066:
----------------------------------------

bq. As Linux, OSX, Solaris and BSD all support the setsid(2) syscall and it's part of POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm), isn't a better solution just to wrap setsid() + exec() in a little bit of JNI? That would avoid the need to install external executables.

That would break platforms that don't have a working libhadoop (which are plentiful). However, there could be a test here that says if libhadoop is available, use it.
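
The "if libhadoop is available, use it" test could be sketched as a small decision function. Everything here is hypothetical: SessionLaunchChooser and its Mode values are invented for illustration, no JNI setsid wrapper is known to exist in libhadoop, and the two booleans stand in for whatever checks the caller actually performs.

```java
// Hypothetical helper, not part of Hadoop.
final class SessionLaunchChooser {
    enum Mode {
        NATIVE_SETSID,  // call setsid() through a (hypothetical) JNI wrapper
        SETSID_BINARY,  // prefix the launch command with an external setsid
        PLAIN_EXEC      // no new session; killing the job can leave orphans
    }

    // Prefers the native library when it loaded, then an external binary,
    // and only then falls back to a plain exec.
    static Mode choose(boolean nativeLibLoaded, boolean setsidBinaryFound) {
        if (nativeLibLoaded) {
            return Mode.NATIVE_SETSID;
        }
        if (setsidBinaryFound) {
            return Mode.SETSID_BINARY;
        }
        return Mode.PLAIN_EXEC;
    }

    // Only a task started in its own session can be killed as a group.
    static boolean canKillProcessGroup(Mode mode) {
        return mode != Mode.PLAIN_EXEC;
    }

    public static void main(String[] args) {
        boolean[] options = {true, false};
        for (boolean nativeLib : options) {
            for (boolean binary : options) {
                Mode mode = choose(nativeLib, binary);
                System.out.println("native=" + nativeLib + " binary=" + binary
                    + " -> " + mode + ", group kill: " + canKillProcessGroup(mode));
            }
        }
    }
}
```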
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560762#comment-14560762 ]

Alan Burlison commented on YARN-3066:
-------------------------------------

As Linux, OSX, Solaris and BSD all support the setsid(2) syscall and it's part of POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm), isn't a better solution just to wrap setsid() + exec() in a little bit of JNI? That would avoid the need to install external executables.
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279287#comment-14279287 ]

Dmitry Sivachenko commented on YARN-3066:
-----------------------------------------

The Windows case is tested separately; see private static boolean isSetsidSupported() in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java for instance:
    if (Shell.WINDOWS) { return false; }

In any UNIX-like case I suppose it will leave orphaned processes, because if isSetsidSupported()==false it uses kill(pid) to kill the task instead of kill(pgid) to kill the whole process group.

ssid(1) in FreeBSD is the analog of setsid(1) in Linux: a userland wrapper for the setsid() system call. Renaming does not sound like a sane idea, because it is hard to convince everyone to rename installed binaries by hand. I propose to treat it as a system-dependent option and act accordingly. (I suppose other OSes like Solaris also lack the setsid(1) utility, so they could also benefit.)

For the ssid source see http://tools.suckless.org/ssid/

As for backwards compatibility, we can change that in 3.0. It is not fatal: a failure to start without setsid will just remind users to install setsid or ssid and proceed further, and they can be sure that there will be no side effects like orphaned tasks eating CPU.
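
The difference between kill(pid) and kill(pgid) can be made concrete with a sketch of how the kill command line might be built. KillCommand is a hypothetical helper, not Hadoop's actual code. It relies on the fact that a task started through setsid leads its own process group, so its process group ID equals its PID, and a negative target addresses the whole group. It also assumes a kill that accepts "--", such as the bash builtin.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class KillCommand {
    // Builds the argument list for kill. When the task was launched via
    // setsid, the target is the negated PID, which signals every process in
    // the task's process group, including children spawned by a shell script.
    static List<String> build(long pid, String signal, boolean launchedViaSetsid) {
        String target = launchedViaSetsid ? "-" + pid : Long.toString(pid);
        // "--" ends option parsing so the negative target is not read as a
        // signal option.
        return Arrays.asList("kill", "-" + signal, "--", target);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build(1234, "TERM", true)));
        System.out.println(String.join(" ", build(1234, "TERM", false)));
    }
}
```

Without setsid only the single PID can be targeted, which is why the children of a killed task survive.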
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279202#comment-14279202 ]

Chris Nauroth commented on YARN-3066:
-------------------------------------

I'm not familiar with {{ssid}} on FreeBSD. Does it have the same usage as Linux {{setsid}}? If so, then perhaps an appropriate workaround is to copy that binary to {{setsid}} and make sure it's available on the {{PATH}}. This might not require any YARN code changes.

bq. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found.

This would likely have to be considered backwards-incompatible, because applications would fail to start on existing systems that don't have {{setsid}}. I suppose the new behavior could be hidden behind an opt-in configuration property. Also, we need to keep in mind that {{Shell.isSetsidAvailable}} is always {{false}} on Windows. (On Windows, we handle the issue of orphaned processes by using Windows API job objects instead of {{setsid}}.)
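
An opt-in property of the kind suggested above might behave as in this sketch. The property name example.nodemanager.setsid.required and the SetsidPolicy class are hypothetical, and java.util.Properties stands in for Hadoop's configuration object. A real implementation would also have to skip the check on Windows, where setsid is never available.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop.
final class SetsidPolicy {
    // Hypothetical property name, invented for this sketch.
    static final String STRICT_KEY = "example.nodemanager.setsid.required";

    // Returns the launch prefix. With the property unset or false the old
    // behaviour is kept (plus a warning); with it set to true a missing
    // setsid becomes a startup failure.
    static String execPrefix(Properties conf, boolean setsidAvailable) {
        if (setsidAvailable) {
            return "exec setsid";
        }
        boolean strict = Boolean.parseBoolean(conf.getProperty(STRICT_KEY, "false"));
        if (strict) {
            throw new IllegalStateException(
                "setsid is not available and " + STRICT_KEY + " is true");
        }
        System.err.println(
            "WARNING: setsid is not available; killed jobs may leave orphaned processes");
        return "exec";
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(execPrefix(conf, true));
        System.out.println(execPrefix(conf, false));
    }
}
```

Defaulting the property to false keeps existing installations without setsid starting as before, which addresses the compatibility concern.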