[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818377#comment-16818377 ]

Hudson commented on YARN-7848:

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16412 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16412/])
YARN-7848 Force removal of docker containers that do not get removed on (ebadger: rev 5583e1b6fcfd5651857ada7ed851f09fc19969bc)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch, YARN-7848.002.patch, YARN-7848.003.patch, YARN-7848.004.patch
>
> After the addition of YARN-5366, containers will get removed after a certain debug delay. However, this is a one-time effort. If the removal fails for whatever reason, the container will persist. We need to add a mechanism for a forced removal of those containers.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815709#comment-16815709 ]

Eric Yang commented on YARN-7848:

[~Jim_Brennan] Thank you for the tips. [~ebadger], can you commit the patch if it looks good to you? Thanks.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815434#comment-16815434 ]

Jim Brennan commented on YARN-7848:

Thanks for updating [~eyang]! lgtm. I am +1 on patch 004 (non-binding).
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814928#comment-16814928 ]

Hadoop QA commented on YARN-7848:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 14s | trunk passed |
| +1 | compile | 1m 2s | trunk passed |
| +1 | mvnsite | 0m 40s | trunk passed |
| +1 | shadedclient | 30m 17s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | cc | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 53s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | unit | 20m 35s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 66m 13s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7848 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965521/YARN-7848.004.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux f2c5d1dd9dc6 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8740755 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23933/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23933/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814900#comment-16814900 ]

Eric Yang commented on YARN-7848:

[~Jim_Brennan] Thanks for the feedback. Patch 004 cleans up the code according to your comments.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814627#comment-16814627 ]

Jim Brennan commented on YARN-7848:

Thanks [~eyang] for the update! Some comments:

In get_docker_rm_command(), we are not setting the return code if the add_to_args() for {{"-f"}} fails, and it seems like it would be cleaner to structure it as:
{noformat}
  ret = add_to_args(args, DOCKER_RM_COMMAND);
  if (ret != 0) {
    ret = BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
  ret = add_to_args(args, "-f");
  if (ret != 0) {
    ret = BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
  ret = add_to_args(args, container_name);
  if (ret != 0) {
    ret = BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
{noformat}

(nit) In remove_docker_container(), if you wanted to minimize the changes, you could have kept {{start_index}} and just set {{args[1] = argv[start_index];}} after the if.

In remove_docker_container(), I don't think it is appropriate to use free_values(args). The values are on the stack, not the heap. You do want to do a free(args) to free the array of pointers you allocated. I think you need to do this free in both the child and the parent.
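The error-handling pattern suggested in the review comment above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the code from any of the attached patches: `arg_list`, `add_arg`, `free_arg_list`, `build_rm_args`, and the `SKETCH_*` constants are hypothetical stand-ins for the container-executor's own argument structure, `add_to_args()`, and `BUFFER_TOO_SMALL`, whose real definitions do not appear in this thread. The literal `"rm"` is likewise assumed in place of `DOCKER_RM_COMMAND`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real container-executor definitions differ. */
#define SKETCH_MAX_ARGS 8
#define SKETCH_BUFFER_TOO_SMALL 1

typedef struct {
  char *data[SKETCH_MAX_ARGS]; /* heap-allocated copies, NULL-terminated */
  int length;
} arg_list;

/* Append a heap-allocated copy of value. Returns 0 on success, 1 on failure. */
static int add_arg(arg_list *args, const char *value) {
  if (args->length >= SKETCH_MAX_ARGS - 1) {
    return 1; /* the last slot is reserved for the terminating NULL */
  }
  size_t n = strlen(value) + 1;
  char *copy = malloc(n);
  if (copy == NULL) {
    return 1;
  }
  memcpy(copy, value, n);
  args->data[args->length++] = copy;
  args->data[args->length] = NULL;
  return 0;
}

/* Release every copied string and reset the list. */
static void free_arg_list(arg_list *args) {
  for (int i = 0; i < args->length; i++) {
    free(args->data[i]);
    args->data[i] = NULL;
  }
  args->length = 0;
}

/*
 * Build the arguments "rm -f <container_name>". Every append is checked,
 * so a failure on "-f" sets the return code just like any other failure.
 */
static int build_rm_args(arg_list *args, const char *container_name) {
  int ret = add_arg(args, "rm");
  if (ret != 0) {
    ret = SKETCH_BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
  ret = add_arg(args, "-f");
  if (ret != 0) {
    ret = SKETCH_BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
  ret = add_arg(args, container_name);
  if (ret != 0) {
    ret = SKETCH_BUFFER_TOO_SMALL;
    goto free_and_exit;
  }
  return 0;

free_and_exit:
  free_arg_list(args);
  return ret;
}
```

A caller would start from a zero-initialized list (`arg_list args = {0};`), call `build_rm_args(&args, name)`, and release the list with `free_arg_list(&args)` once the command has been used. Here the strings are heap copies, so freeing each value is correct; that differs from the `remove_docker_container()` case raised in the review, where the values are not heap-allocated.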
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813627#comment-16813627 ]

Eric Yang commented on YARN-7848:

[~ebadger] [~Jim_Brennan] The failed unit test is not related to patch 003. Please review. Thanks.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812972#comment-16812972 ]

Hadoop QA commented on YARN-7848:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 9s | trunk passed |
| +1 | compile | 1m 3s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 30m 4s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 1m 4s | the patch passed |
| +1 | cc | 1m 4s | the patch passed |
| +1 | javac | 1m 4s | the patch passed |
| +1 | mvnsite | 0m 38s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 6s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 21m 19s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 67m 1s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7848 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965253/YARN-7848.003.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux c22e58a01bfe 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 69e3745 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/23917/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23917/testReport/ |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23917/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812834#comment-16812834 ]

Eric Yang commented on YARN-7848:

[~ebadger] [~Jim_Brennan] Thanks for the extra review to make sure that we didn't code this into a no-op. Patch 003 adds "-f" to both remove_docker_container and get_docker_rm_command.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812755#comment-16812755 ]

Eric Badger commented on YARN-7848:

Looked at the code with [~Jim_Brennan] and he's right. {{DockerRmCommand}} calls the container-executor with {{--remove-docker-container}}. That codepath goes through {{remove_docker_container}} in container-executor.c. To be consistent with the use of {{-f}} for docker rm, I believe we should change it in both places in the C code (i.e. in {{remove_docker_container}} and {{get_docker_rm_command}}).
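As a rough illustration of what adding {{-f}} on the {{remove_docker_container}} path amounts to, the sketch below builds a NULL-terminated argument vector for `docker rm -f`. It is an assumption-laden example, not the function from the patch: `build_docker_rm_argv` and its parameters are hypothetical names, and the real code obtains the docker binary and container id in its own way. Only the pointer array is heap-allocated here, which is why a single `free()` on the array is the matching cleanup.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not the function from the patch.
 * Returns a NULL-terminated argv for "<docker_binary> rm -f <container_id>",
 * or NULL on invalid input or allocation failure.
 *
 * Only the pointer array is heap-allocated. The strings are borrowed from
 * the caller, so the result is released with free(argv) alone and the
 * individual entries must not be freed.
 */
static char **build_docker_rm_argv(char *docker_binary, char *container_id) {
  if (docker_binary == NULL || container_id == NULL) {
    return NULL;
  }
  char **argv = calloc(5, sizeof(char *));
  if (argv == NULL) {
    return NULL;
  }
  argv[0] = docker_binary;
  argv[1] = (char *) "rm";
  argv[2] = (char *) "-f"; /* force removal, as agreed in this thread */
  argv[3] = container_id;
  /* argv[4] is already NULL because calloc zero-fills the array */
  return argv;
}
```

Such an array has the shape that `execvp()` expects for its second argument. If the process forks before exec, each process that keeps running after the fork holds its own copy of the array and is responsible for freeing it.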
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812744#comment-16812744 ]

Jim Brennan commented on YARN-7848:

[~eyang], [~ebadger], it looks to me like the code in patch 002 will not do what we want. We are adding the "force", "true" arguments to the DockerRmCommand() and handling them in the --run-docker C code. But DockerRmCommand() is actually passing the --remove-docker-container command, which is not modified. "force", "true" will be passed in to the docker rm command, and it won't know what to do with them.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812730#comment-16812730 ]

Eric Badger commented on YARN-7848:

bq. Changing this could have a dicey outcome and is not backward compatible. I am likely to have to use the cmd file serialization structure to pass the force option from Java to container-executor for safety reasons.

Yea that makes sense. I thought we were already using the .cmd file with remove-docker-container. Ok, given that reasoning I'm alright keeping the Java code as it is and making the code changes all in the container-executor.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812722#comment-16812722 ]

Eric Yang commented on YARN-7848:

{quote}Thanks for the update, Eric Yang! On the java side, it looks like DockerRmCommand is invoked in 2 different places. Could we maintain the original constructor with no force and then add an additional constructor with a force parameter? We could then change the 2 invocations to use the new constructor with setting force to true.{quote}

I can preserve the existing constructor without force. Passing an optional parameter to container-executor for removing a container is slightly dicey at this point. The accepted parameters are:

```container-executor --remove-docker-container [hierarchy] ```

What we want is:

```container-executor --remove-docker-container [hierarchy] [force] ```

Changing this could have a dicey outcome and is not backward compatible. I am likely to have to use the cmd file serialization structure to pass the force option from Java to container-executor for safety reasons.

```container-executor --run-docker ```

This will leave a lot of debris and dead code that others may pick up and use incorrectly. I am not sure passing this flag from Java is good for long-term maintenance. Keeping it simple is probably better. Thoughts [~ebadger] [~Jim_Brennan] [~shaneku...@gmail.com] [~billie.rinaldi]?
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812528#comment-16812528 ]

Eric Badger commented on YARN-7848:

Thanks for the update, [~eyang]! On the java side, it looks like {{DockerRmCommand}} is invoked in 2 different places. Could we maintain the original constructor with no force and then add an additional constructor with a {{force}} parameter? We could then change the 2 invocations to use the new constructor with setting {{force}} to {{true}}.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811375#comment-16811375 ]

Hadoop QA commented on YARN-7848:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 29s | trunk passed |
| +1 | compile | 1m 4s | trunk passed |
| +1 | checkstyle | 0m 26s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 11m 24s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 56s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 2 extant Findbugs warnings. |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | cc | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 20s | the patch passed |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 33s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 12s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 20m 34s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 68m 4s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.TestDockerRmCommand |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7848 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965038/YARN-7848.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc |
| uname | Linux 317b10b0565e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e9b859f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/23902/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| unit |
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811188#comment-16811188 ]

Eric Yang commented on YARN-7848:

Patch 002 adds the force option to the cmd file and invokes it from the Java side.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch, YARN-7848.002.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810951#comment-16810951 ]

Eric Badger commented on YARN-7848:

We talked about the use of {{-f}} in our most recent docker meeting and all agreed that it is appropriate for all removals. [~eyang]'s explanation above does a good job of summarizing why.

I do have one ask for the patch, though. The current patch changes {{get_docker_rm_command}} to always pass the {{-f}} flag. I think it would be beneficial to make this an option passed into the function, so that we don't accidentally reuse this function in the future and unknowingly use {{rm -f}} when we aren't expecting to.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
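The request in the comment above (make the force flag an explicit argument rather than a hard-coded default) can be illustrated with a small Python sketch. This is not the actual {{get_docker_rm_command}} from docker-util.c, which is C; the helper name `docker_rm_argv` and its signature are hypothetical and exist only for illustration.

```python
def docker_rm_argv(container_name, force=False):
    """Build the argument list for removing a single container.

    Hypothetical helper: `force` must be requested explicitly, so a
    caller that reuses this function never gets `rm -f` by accident.
    """
    argv = ["docker", "rm"]
    if force:
        # -f is the short form of docker rm's --force option.
        argv.append("-f")
    argv.append(container_name)
    return argv
```

With this shape, a plain call yields `docker rm <name>`, and only a caller that passes `force=True` gets `docker rm -f <name>`.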
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802833#comment-16802833 ]

Jim Brennan commented on YARN-7848:

I think I agree with [~eyang] on the use of {{-f}}. By the time we are trying to remove the container, we have already tried to kill the process and stop the container, so I don't think there is any danger in using the {{-f}} option, and it may succeed in cases where it otherwise doesn't now. I can't think of anything bad that would happen by using the force option every time in our use cases.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802178#comment-16802178 ]

Eric Yang commented on YARN-7848:

[~ebadger] Docker rm uses the stop signal and then removes the container workspace and metadata. Docker rm -f uses SIGKILL and then removes the container workspace and metadata.

In YARN ContainerCleanup, a stop signal is sent if the pid exists. If not, it runs docker kill, followed by docker rm. Therefore, the steps leading up to docker rm are identical to docker stop followed by docker rm -f. No process will be running after docker kill has run. The remaining task is to remove the metadata from the docker database. Therefore, distinguishing between a first docker rm and docker rm -f becomes unnecessary, because YARN has already performed the stop and docker kill operations.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
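One reading of the cleanup order described in the comment above can be written down as a short Python sketch. It only lists the docker commands in order and runs nothing. The function name `cleanup_commands` is hypothetical, and modelling the "stop signal" step as `docker stop` is an assumption; the real logic lives in YARN's ContainerCleanup and the container-executor, and may differ.

```python
def cleanup_commands(container, pid_exists):
    """Return the docker commands one cleanup pass would issue, in order.

    Assumption: a known pid leads to a graceful stop, otherwise the
    container is killed; either way the final step is a forced removal.
    """
    commands = []
    if pid_exists:
        commands.append(["docker", "stop", container])
    else:
        commands.append(["docker", "kill", container])
    # By this point the container's processes should already be gone, so
    # the forced removal only has to clear the workspace and metadata.
    commands.append(["docker", "rm", "-f", container])
    return commands
```

Under this reading, the removal step is always preceded by a stop or kill, which is the argument for why adding `-f` to the removal changes little in practice.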
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801943#comment-16801943 ]

Eric Badger commented on YARN-7848:

Thanks for the patch, [~eyang]! Correct me if I'm wrong, but the patch looks like it does a force removal on every removal. The idea of this JIRA is to only do the force removal if the first removal fails.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801892#comment-16801892 ]

Eric Yang commented on YARN-7848:

The failed unit test is not related to this patch.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801270#comment-16801270 ]

Hadoop QA commented on YARN-7848:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 31m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 49s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 41s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7848 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963667/YARN-7848.001.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux c48ac68a88d7 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3f6d6d2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/23807/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23807/testReport/ |
| Max. process+thread count | 311 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23807/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801229#comment-16801229 ]

Eric Yang commented on YARN-7848:

Backtracking to the original discussion: if a container is stuck in the Created state, using the -f flag is good enough to remove the faulty container. A background thread may keep the system cleaner, but not everyone is comfortable with that idea at this time. Therefore, the focus is on improving what is currently needed.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
> Attachments: YARN-7848.001.patch
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798427#comment-16798427 ]

Eric Yang commented on YARN-7848:

[~ebadger]
{quote}Additionally, how would this work with debug-delay? If I want my image to stick around for a while (or indefinitely) so that I can debug it, how will that co-exist with this periodic pruning?{quote}

I don't have a good answer for a system admin who allows an image dump to stick around forever. The cleanup thread is optional and configurable. Its scheduling can be based on debug-delay to ensure the image is kept for the debug-delay window. It will only delete containers stuck in the Created/Exited states after the debug-delay window has passed.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798387#comment-16798387 ]

Eric Badger commented on YARN-7848:

I don't like this idea. This is the NodeManager completely taking over docker. The NM should use docker, but I don't think it should be assumed that it is the only thing that can use docker. To me, this seems like something that should be handled at an ops level in a cron job, if they want to make sure all images are gone. I'm ok with having a more generalized case of pruning images periodically, but I'm not comfortable with the NM pruning containers that it didn't start.

Additionally, how would this work with debug-delay? If I want my image to stick around for a while (or indefinitely) so that I can debug it, how will that co-exist with this periodic pruning?

cc [~shaneku...@gmail.com]

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797677#comment-16797677 ]

Eric Yang commented on YARN-7848:

There are many scenarios in which containers may not be created or shut down properly due to problems with the images themselves. The application lifecycle doesn't cover all abnormal cases. It looks like we need a background thread to run the following every 10 minutes:

{code}
docker container prune --filter="until=15m" -a
{code}

This will ensure that dangling containers have a maximum lifetime of no more than 15 minutes.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
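The periodic pruning proposed in the comment above could look roughly like the Python sketch below. Everything here is hypothetical: `prune_loop`, its parameters, and the injectable `run` and `sleep` callables are illustration only, not NodeManager code. The command is also an assumption: it uses `--force` to skip the interactive confirmation and leaves out the `-a` from the comment, which may not be accepted by `docker container prune`.

```python
import subprocess
import time

# Assumed command line: prune stopped containers created more than 15 minutes ago.
PRUNE_ARGV = ["docker", "container", "prune", "--force", "--filter", "until=15m"]


def prune_loop(iterations, interval_s=600, run=subprocess.call, sleep=time.sleep):
    """Run the prune command `iterations` times, `interval_s` seconds apart.

    `run` and `sleep` are injectable so the loop can be exercised without
    docker installed. Returns the exit code of every run.
    """
    exit_codes = []
    for i in range(iterations):
        exit_codes.append(run(PRUNE_ARGV))
        if i < iterations - 1:
            # No need to wait after the final run.
            sleep(interval_s)
    return exit_codes
```

A real deployment would more likely be a cron job or a daemon thread with a configurable interval, and it would affect every stopped container on the host, including ones the NodeManager did not start.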
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751820#comment-16751820 ]

Eric Yang commented on YARN-7848:

[~ebadger] YARN-9074 does not force removal of the container. Hence, this is still a valid issue to be addressed.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750499#comment-16750499 ]

Eric Badger commented on YARN-7848:

Does YARN-9074 force the removal of the containers? The problem initially was that docker would sometimes get into a weird state where containers couldn't be deleted unless the {{-f}} flag was given to {{docker rm}}. So the original intention of this JIRA was to allow for a few retries without forcing removal, and then to do the forced removal if necessary.

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.
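The original intention described in the comment above (a few plain retries, then a forced removal as a last resort) can be sketched in Python as follows. The function `remove_container`, its `retries` default, and the injectable `run` callable are all hypothetical; this is not what any of the attached patches implement.

```python
import subprocess


def remove_container(name, retries=3, run=subprocess.call):
    """Try `docker rm` up to `retries` times, then fall back to `docker rm -f`.

    `run` takes an argv list and returns an exit status; it defaults to
    subprocess.call and can be replaced for testing. Returns True as soon
    as any attempt exits with status 0, False if even the forced attempt fails.
    """
    for _ in range(retries):
        if run(["docker", "rm", name]) == 0:
            return True
    # Every plain attempt failed, so force the removal once.
    return run(["docker", "rm", "-f", name]) == 0
```

A production version would probably wait between attempts and log each failure; both are omitted here to keep the sketch short.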
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750385#comment-16750385 ]

Eric Yang commented on YARN-7848:

Do we still need this JIRA if YARN-9074 moved the docker container deletion into ContainerCleanup?

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Zhaohui Xin
> Priority: Major
> Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain
> debug delay. However, this is a one-time effort. If the removal fails for
> whatever reason, the container will persist. We need to add a mechanism for a
> forced removal of those containers.