[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484382#comment-16484382 ] Eric Badger commented on YARN-8206: --- Wonderful! Thanks [~jlowe] for the review/commit and [~shaneku...@gmail.com], [~Jim_Brennan], and [~eyang] for reviews! > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484093#comment-16484093 ] Hudson commented on YARN-8206: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14250 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14250/]) YARN-8206. Sending a kill does not immediately kill docker containers. (jlowe: rev 5f11288e41fca2e414dcbea130c7702e29d4d610) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483087#comment-16483087 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! +1 lgtm. I'll commit this tomorrow if there are no objections. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482809#comment-16482809 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 19s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924373/YARN-8206.011.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux bbcb780999a2 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f48fec8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20811/testReport/ | | Max. process+thread count | 435 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20811/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill docker
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482694#comment-16482694 ] Eric Badger commented on YARN-8206: --- Thanks for the review, [~jlowe]! Patch 011 addresses your comments. At this point {{isContainerRequestedAsPrivileged()}} is a really simple gadget function that might not be necessary. Makes it easier to change the implementation of the function, though. If you'd like me to just inline this in the caller code, I can do that. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482584#comment-16482584 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! Unfortunately the patch no longer applies after YARN-8141 and needs to be updated. allowPrivilegedContainerExecution will now log a warning every time a privileged container is not requested which it did not do before. I'm wondering if a warning log is really ever needed here. It's already logging when a privileged container is requested, so we can infer if that log does not appear that the env variable was not set properly. Nit: isContainerRequestedAsPrivileged can be simplified with Boolean.parseBoolean which handles null directly. Nit: Since handleContainerKill is being refactored it would be good to improve the debug logging to leverage the SLF4J API so it doesn't have to do a debug check, i.e.: leveraging the \{\} positional parameter syntax so the message does not have to be built even if the log message will be discarded. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479463#comment-16479463 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 57s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923948/YARN-8206.010.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0a4cb8096c0f 3.13.0-141-generic #190-Ubuntu SMP Fri Jan 19 12:52:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c0ec061 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20769/testReport/ | | Max. process+thread count | 331 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20769/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill dock
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479315#comment-16479315 ] Eric Badger commented on YARN-8206: --- Thanks for the review, [~jlowe]! Patch 010 addresses your comments. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476476#comment-16476476 ] Jason Lowe commented on YARN-8206: -- Thanks for the patch! Calling allowPrivilegedContainerExecution in this new context is a bit problematic given how that method behaves. It can throw, and it also logs an info message stating a container will be launched. Ideally the code should be refactored to make it a pure predicate function that can be reused in this new context and the logging/throwing can be moved to the launch path where it was originally needed. Nit: signalContainer always computes the containerId string when only the TERM signal case now needs it. It should be moved to that conditional. Nit: The unit test changes are much bigger than they need to be. There are a lot of unrelated formatting changes around ENV_DOCKER_CONTAINER_RUN_PRIVILEGED_CONTAINER, and we can avoid needing to update every caller of capturePrivilegedOperation with an overloaded method, e.g.: {code} private PrivilegedOperation capturePrivilegedOperation() throws PrivilegedOperationException { return capturePrivilegedOperation(1); } {code} > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469590#comment-16469590 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922718/YARN-8206.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9b7e71a59563 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / af4fc2e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20670/testReport/ | | Max. process+thread count | 465 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20670/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill docker
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469423#comment-16469423 ] Eric Badger commented on YARN-8206: --- Somehow lost the checkstyle changes during the rebase. Patch 009 addresses those. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469411#comment-16469411 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 13 new + 24 unchanged - 0 fixed = 37 total (was 24) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 26s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 70m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922679/YARN-8206.008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 71c1ea606221 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / af4fc2e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20666/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20666/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: ha
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469331#comment-16469331 ] Jim Brennan commented on YARN-8206: --- New patch looks good to me. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469205#comment-16469205 ] Eric Badger commented on YARN-8206: --- bq. Eric Badger, patch 7 does not look right... facepalm. Sorry, pulled trunk and didn't rebase the branch that this patch was on. Patch 008 is rebased to trunk. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469201#comment-16469201 ] Jim Brennan commented on YARN-8206: --- [~ebadger], patch 7 does not look right... > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469191#comment-16469191 ] Eric Badger commented on YARN-8206: --- Thanks for the review, [~Jim_Brennan]! Uploaded new patch that addresses your comments. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469116#comment-16469116 ] Jim Brennan commented on YARN-8206: --- [~ebadger], thanks for the patch! In handleContainerKill(), you should move the construction of privOp to the !allowPrivilegedContainerExecution (else) case - we don't need it in the privileged execution case. Would be nice if we could include the signal.name in the "Signal container failed" message. nit: capturePrivilegedOperation - remove comment about single invocation expected. nit: can use conf.setBoolean() for setting NM_DOCKER_ALLOW_PRIVILEGED_CONTAINERS > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469028#comment-16469028 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922642/YARN-8206.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1082d5c2d7dc 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 343b51d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20661/testReport/ | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20661/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill docke
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468909#comment-16468909 ] Eric Badger commented on YARN-8206: --- Fixing checkstyle > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468117#comment-16468117 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 12 new + 24 unchanged - 0 fixed = 36 total (was 24) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 11s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922543/YARN-8206.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9e77a2c67f12 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 69aac69 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20651/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20651/testReport/ | | Max. process+thread count | 327 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: h
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468047#comment-16468047 ] Eric Badger commented on YARN-8206: --- Attaching a new patch that uses the docker api for privileged containers. In doing this, I noticed that the tests in TestDockerContainerRuntime were not actually testing the actual docker code. Rather, they were testing their own hard-coded mocks. So, I went and fixed that as well, since I had to fix those tests anyway given this new code. However, since I touched some tests that aren't related to my code, I would be appreciative if people could take an extra careful look at how I fixed up the tests and made them work to see if I did that correctly. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467626#comment-16467626 ] Eric Badger commented on YARN-8206: --- Thanks for the quick input! I'll put up a patch soon based on proposal 2. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467574#comment-16467574 ] Jim Brennan commented on YARN-8206: --- +1 for proposal 2. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467406#comment-16467406 ] Jason Lowe commented on YARN-8206: -- +1 for proposal 2 from me as well. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467265#comment-16467265 ] Shane Kumpf commented on YARN-8206: --- [~ebadger] - I'm also +1 for proposal 2. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466616#comment-16466616 ] Eric Yang commented on YARN-8206: - [~ebadger] +1 for proposal 2. This is safer option in my opinion. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466529#comment-16466529 ] Eric Badger commented on YARN-8206: --- [~eyang], [~shaneku...@gmail.com], [~jlowe], [~Jim_Brennan], I've come up with 2 different ways to solve the privileged container issue and I'd like your input on which route to go (though I have a slight preference). In both proposals, non-privileged containers will be signaled using {{kill}} instead of {{docker kill}} Proposal 1: The container-executor will not give up being root when it goes to signal the container, and will thus signal the container as root. This would only be for privileged containers, but is something that is not currently possible (right now, signaling has to call {{set_user()}} and you cannot set "root" as the user. Proposal 2: Use the docker API for privileged containers, just like the code does today. This way, we won't be killing arbitrary process as root, just wielding the docker daemon as root as we do today. The downside here is that we have to go through a docker API call, which is slower than just sending the signal straight to the process. My preference would be for Proposal 2 as I'm not super comfortable allowing the container-executor to kill arbitrary processes as root, if you were somehow able to compromise the NM user. Currently, you would only be able to kill arbitrary non-root processes, if you comprised the NM user. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461771#comment-16461771 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 42m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921647/YARN-8206.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ae4c1cd835bb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6b63a0a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20576/testReport/ | | Max. process+thread count | 340 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20576/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill docke
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461751#comment-16461751 ] Eric Yang commented on YARN-8206: - >From today's meeting, signal handling is ran as the user who submit the job. >This is likely to break with YARN-7221 where privileged docker container can >run as a different user than the user who submit the job. Some work is >required to handle this corner case. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461678#comment-16461678 ] Eric Badger commented on YARN-8206: --- Patch 004 removes unused import > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461627#comment-16461627 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 24 unchanged - 0 fixed = 25 total (was 24) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 37s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 70m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921626/YARN-8206.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ffe009c04019 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6b63a0a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20574/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20574/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoo
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461567#comment-16461567 ] Shane Kumpf commented on YARN-8206: --- Thanks [~ebadger]! {quote}I'm looking through the code and I thought that error message had always been a problem.\{quote} You're right, the warning isn't something introduced by this patch. I'm fine with addressing that in a separate JIRA. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461556#comment-16461556 ] Eric Badger commented on YARN-8206: --- Hey [~shaneku...@gmail.com], thanks for the review. I fixed up the first 3 comments that you mentioned. I'm wondering, however, for 4, is that a regression? Should that be handled in a separate JIRA? I'm looking through the code and I thought that error message had always been a problem. I'm not super opposed to putting it in this JIRA, just want to make sure I'm on the same page > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461463#comment-16461463 ] Shane Kumpf commented on YARN-8206: --- Thanks for the patch, [~ebadger]! I've been able to validate that it is working as intended. Few items to address. # {{TestDockerContainerRuntime$MockRuntime#signalContainer}} needs updated to incorporate the change in logic. {{TestDockerContainerRuntime#testDockerStopOnKillSignalWhenRunning}} will need to change as well. # Nit: {{privilegedOperationExecutor}} could be used in {{DockerLinuxContainerRuntime#handleContainerKill}} # Based on the comment in the catch block for {{DockerLinuxContainerRuntime#handleContainerKill}}, I think the {{privOp}} should have failure logging disabled via {{PrivilegedOperation#disableFailureLogging}}. # Along those same lines, the current code will result in a warning in the NM log when a container completes prior to the kill. I think {{signalContainer}} will need similar logic to {{LinuxContainerExecutor#handleExitCode}} to suppress printing the warning in this case. c-e is returning INVALID_CONTAINER_PID (rc=9). Alternatively, it may make sense to remove this warning all together. Here is the warning: {code:java} 2018-05-02 17:42:35,363 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1525282503184_0001_01_02 transitioned from RUNNING to EXITED_WITH_SUCCESS 2018-05-02 17:42:35,363 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1525282503184_0001_01_02 2018-05-02 17:42:35,700 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime: Signal docker container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.handleContainerKill(DockerLinuxContainerRuntime.java:1223) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:995) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor$DelayedProcessKiller.run(ContainerExecutor.java:854) {code} > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch, YARN-8206.002.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460299#comment-16460299 ] genericqa commented on YARN-8206: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921466/YARN-8206.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d87b1a449b55 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8f42daf | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20558/testReport/ | | Max. process+thread count | 408 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20558/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460185#comment-16460185 ] Eric Badger commented on YARN-8206: --- Not quite sure how I missed the lack of {{executePrivilegedOperation}}. I swear I tested that. Anywho, uploaded a new patch that fixes that as well as the checkstyle issues. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch, YARN-8206.002.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459701#comment-16459701 ] Jason Lowe commented on YARN-8206: -- {quote}Can we send signal to wrong pid, if pid namespace feature is turned on? {quote} No. PID namespace is the default mode for docker, and it works just fine there. The PID namespace is from the perspective of the container not from the nodemanager. Outside of the container we can still see the container's root pid and all the other pids in the container. YARN's current container monitoring requires that ability to monitor the resources properly. I completely agree it should be using cgroup monitoring for Docker (or really for any situation where it's using cgroups), but that's how it works today. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459446#comment-16459446 ] Eric Yang commented on YARN-8206: - [~Jim_Brennan] Docker supports pid namespace. Can we send signal to wrong pid, if pid namespace feature is turned on? In general, it is good practice to keep the abstraction in place to avoid direct call to internal state of another program. I see the reasoning to have a short cut to pid to save the system in the emergency situation, but we need to be cautions that the keep it simple optimization doesn't create limitations by accident. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-8206.001.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459209#comment-16459209 ] genericqa commented on YARN-8206: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 18s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 6 unchanged - 0 fixed = 9 total (was 6) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 59s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 22s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 72m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Useless object stored in variable privOp of method org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.handleContainerKill(ContainerRuntimeContext) At DockerLinuxContainerRuntime.java:privOp of method org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.handleContainerKill(ContainerRuntimeContext) At DockerLinuxContainerRuntime.java:[line 1199] | | Failed junit tests | hadoop.yarn.server.nodemanager.TestNodeManagerReboot | | | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | | | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Ima
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458716#comment-16458716 ] Shane Kumpf commented on YARN-8206: --- Thanks for the additional background, [~Jim_Brennan]. While I do still believe we should use the Docker API/client where possible, it seems Docker may get in the way In this case. I can see the advantages of signalling directly and am +1 on trying out that approach. Given this decision is an internal implementation detail, if we do find that {{docker kill}} is necessary, we could always revisit. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458694#comment-16458694 ] Jim Brennan commented on YARN-8206: --- [~eyang], [~jlowe], I'm not sure I see the advantage of using the docker API for kill? We have the PID, so just killing it directly seems like the most efficient method. [~ebadger] filed this issue after we ran into a case where nodes were running out of memory and started to kill containers to free up space, but it was taking too long so things just got worse and the linux OOM kicked in and started killing processes as well. So in these cases where yarn has decided to kill containers, I think we should make that path as efficient as possible so we can recover more quickly. [~eyang], can you clarify what advantage is gained by using docker kill in this case? > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454599#comment-16454599 ] Eric Yang commented on YARN-8206: - [~jlowe] I agree with [~shaneku...@gmail.com] to use docker API as much as possible. We reduce reliance on root user to do all the heavy lifting. Some operations may require permission check, and docker api provides the safety net to catch errors. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454179#comment-16454179 ] Jason Lowe commented on YARN-8206: -- bq. I expect the Docker maintainers would steer us towards the Docker CLI/API. Agreed, but clearly Docker has to handle the container's root process exiting on its own, whether that's via a signal or otherwise. The only thing I can think of that would matter here is possibly some state reporting on how the container exited from the docker perspective. If we use docker kill then it could know it was the origin of the teardown vs. just seeing a sig 9 termination from the root process. My main concerns with using docker kill is the extra fork-exec overhead, the container lock/unlocking, health check suspensions, etc. That is all stuff we do not want if all we're doing is sending a SIGQUIT to get it to dump the JVM thread stacks. And if we're trying to kill it ASAP with SIGKILL, I personally don't want to wait around for all that extra stuff to happen. It's more things to potentially break. Even the docker kill logic ends up killing the process directly if the other stuff doesn't work (after 2 seconds of waiting around). As a bonus, we would stop getting the noisy errors from docker if the process has already exited by the time we try to kill it. So I think it comes down to bookkeeping and reporting from docker. If we're not concerned so much about how Docker reports the final status of the container and more interested in getting the container signalled ASAP then I think we should just signal it directly. If the final container state from docker is important or images can tailor how signals are handled by a container (i.e.: like the container stop behavior) then we should go through docker to do the signalling. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453922#comment-16453922 ] Shane Kumpf commented on YARN-8206: --- {quote}Is it necessary to do a docker kill? We already have the pid for the root process of the container, and it's a lot less work to just signal that directly.{quote} I agree handling the signal directly does save overhead. Signalling directly avoids potential differences between Docker versions, which could be considered a pro. I did a few tests and didn't find any ill effects. My only hesitation is that [docker kill|https://github.com/moby/moby/blob/3a633a712c8bbb863fe7e57ec132dd87a9c4eff7/daemon/kill.go] does do a bit more than just call the OS kill. If we handle this outside of the docker client, do we run the risk of impacting docker's internal accounting? The answer is likely no, but I expect the Docker maintainers would steer us towards the Docker CLI/API. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453126#comment-16453126 ] Jason Lowe commented on YARN-8206: -- Is it necessary to do a docker kill? We already have the pid for the root process of the container, and it's a lot less work to just signal that directly. I agree calling docker stop makes sense for SIGTERM, but docker kill seems like overkill (no pun intended) for any other signals. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453020#comment-16453020 ] Shane Kumpf commented on YARN-8206: --- Thanks for raising this [~ebadger]. The reasoning behind this was to honor STOPSIGNAL, but after more thought, I agree with your assessment and proposal. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452976#comment-16452976 ] Eric Badger commented on YARN-8206: --- My proposal is that we handle {{KILL}} and {{TERM}} separately. For {{TERM}}, we can still do a {{docker stop}} as we do today. However, for a {{KILL}}, we would just pass it along to the {{handleContainerKill()}} function and have it do a {{docker kill}}. [~shaneku...@gmail.com], [~jlowe], [~Jim_Brennan], thoughts? > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org