[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822151#comment-16822151 ]

Eric Yang commented on YARN-8587:
---------------------------------

[~BilwaST] 3.1.1 is an already released version. Sorry, I cannot make changes to 3.1.1. I merged this patch to branch-3.1 and branch-3.2. The fix will be available in 3.1.2 and 3.2.1.

> Delays are noticed to launch docker container
> ---------------------------------------------
>
> Key: YARN-8587
> URL: https://issues.apache.org/jira/browse/YARN-8587
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Yesha Vora
> Assignee: Charo Zhang
> Priority: Major
> Labels: Docker
> Fix For: 3.3.0
>
> Attachments: YARN-8587.patch
>
>
> Launch dshell application. Wait for application to go in RUNNING state.
> {code:java}
> yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find out container allocation. Run docker inspect command for docker containers launched by app.
> Sometimes, the container is allocated to NM but docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx "sudo su - -c \"docker ps -a | grep container_e02_1531189225093_0003_01_02\" root" failed after 0 retries
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821835#comment-16821835 ]

Bilwa S T commented on YARN-8587:
---------------------------------

Hi [~eyang], can we cherry-pick this Jira to the 3.1.1 version?
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662903#comment-16662903 ]

Hudson commented on YARN-8587:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #15314 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15314/])
YARN-8587. Added retries for fetching docker exit code. (eyang: rev c16c49b8c3b8e2e42c00e79a50e7ae029ebe98e2)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662838#comment-16662838 ]

Eric Yang commented on YARN-8587:
---------------------------------

Thank you [~Charo Zhang] for the patch. +1 to commit this to trunk.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662557#comment-16662557 ]

Hadoop QA commented on YARN-8587:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 19m 48s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 2s | trunk passed |
| +1 | compile | 0m 57s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 31m 3s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | cc | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 50s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | unit | 19m 21s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 85m 13s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8587 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944929/YARN-8587.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux 50e3d9e024ec 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbc6dcd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22328/testReport/ |
| Max. process+thread count | 402 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22328/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662454#comment-16662454 ]

Eric Yang commented on YARN-8587:
---------------------------------

This patch retries the docker inspect exit code fetch when the child process pid terminates. It looks like Docker needs a little time between the container completing and the exit code getting recorded. This patch improves the reliability of reading the exit code from docker. I think the unit test failure was caused by YARN-8922 and is not related to this patch. I triggered the pre-commit build again as a sanity test.
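The retry behaviour described in this comment can be sketched roughly as follows. This is an illustrative Python sketch, not the C code from container-executor.c: the function names `read_exit_code_with_retries` and `docker_exit_code`, the default of 3 attempts, and the 1 second delay are all assumptions made for the example. It relies only on `docker inspect --format '{{.State.ExitCode}}'`, and it treats any command failure or unparsable output as "exit code not recorded yet".

```python
import subprocess
import time
from typing import Callable, Optional


def read_exit_code_with_retries(
    fetch: Callable[[], Optional[int]],
    max_retries: int = 3,        # assumed default, not taken from the patch
    delay_seconds: float = 1.0,  # assumed pause between attempts
) -> Optional[int]:
    """Call fetch() until it returns an exit code or the attempts run out."""
    for attempt in range(max_retries):
        code = fetch()
        if code is not None:
            return code
        # Sleep only between attempts, not after the final one.
        if attempt + 1 < max_retries:
            time.sleep(delay_seconds)
    return None


def docker_exit_code(container_name: str) -> Optional[int]:
    """Ask `docker inspect` for the recorded exit code; None if unavailable."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.ExitCode}}", container_name],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # docker binary missing or not executable
    output = result.stdout.strip()
    if result.returncode != 0 or not output.lstrip("-").isdigit():
        return None  # inspect failed or printed something that is not an integer
    return int(output)
```

A caller would combine the two, for example `read_exit_code_with_retries(lambda: docker_exit_code("example_container"))`, where `example_container` is a placeholder name. Keeping the fetch step as a callable makes the retry loop easy to exercise without a Docker daemon.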
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658520#comment-16658520 ]

Hadoop QA commented on YARN-8587:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 45s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 31m 59s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | cc | 0m 57s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 45s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 19m 23s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 67m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8587 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944929/YARN-8587.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux aad77b916935 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a043dfa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/22271/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22271/testReport/ |
| Max. process+thread count | 435 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22271/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658474#comment-16658474 ]

Charo Zhang commented on YARN-8587:
-----------------------------------

[~eyang] The unit test failed with a timeout; I suspect that 10 retries is too long. If the failure is not caused by this patch, I will revert to get_max_retries() and fix the indentation.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658296#comment-16658296 ]

Eric Yang commented on YARN-8587:
---------------------------------

[~Charo Zhang] Thank you for the patch. max_retries is hard-coded to 3 instead of using get_max_retries(). Is there any reason for the retries to be 3? Indentation and spacing are not properly aligned. Hadoop uses 2 spaces for indentation.
{code}
if (pclose (inspect_exitcode_docker) != 0 || res <= 0) {
} else {
}
{code}
The rest of the patch looks good to me.
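One way to read the suggestion in this review, sketched in Python under stated assumptions: the retry count comes from configuration, and a constant serves only as a fallback. The key name `example.max.retries`, the dictionary-based config, and the validation rules below are hypothetical; the actual get_max_retries() in container-executor may behave differently.

```python
from typing import Mapping

# Fallback mirroring the value the review says was hard-coded in the patch.
DEFAULT_MAX_RETRIES = 3


def get_max_retries(
    config: Mapping[str, str],
    key: str = "example.max.retries",  # hypothetical key name
    default: int = DEFAULT_MAX_RETRIES,
) -> int:
    """Return the configured retry count, or the default if unset or invalid."""
    raw = config.get(key)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # non-numeric setting: fall back rather than fail
    # A zero or negative count would disable the loop entirely, so ignore it.
    return value if value > 0 else default
```

With this shape, an operator can raise or lower the retry count without a rebuild, and the loop that reads the exit code never depends on a literal embedded in the code.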
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658172#comment-16658172 ]

Hadoop QA commented on YARN-8587:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 24s | trunk passed |
| +1 | compile | 1m 4s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 33m 20s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | cc | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | mvnsite | 0m 37s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 41s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 18m 15s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 67m 42s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8587 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944889/YARN-8587.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux aed54c76b66c 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f069d38 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/22268/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22268/testReport/ |
| Max. process+thread count | 338 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22268/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657126#comment-16657126 ]

Hadoop QA commented on YARN-8587:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 28s | trunk passed |
| +1 | compile | 1m 6s | trunk passed |
| +1 | mvnsite | 0m 46s | trunk passed |
| +1 | shadedclient | 33m 55s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 1m 0s | the patch passed |
| +1 | cc | 1m 0s | the patch passed |
| +1 | javac | 1m 0s | the patch passed |
| +1 | mvnsite | 0m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 16s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 19m 3s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 69m 50s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8587 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944744/YARN-8587.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux 24e1c1125757 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b22651e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/22255/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22255/testReport/ |
| Max. process+thread count | 314 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22255/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656632#comment-16656632 ] Charo Zhang commented on YARN-8587: --- [~eyang] Of course, but I could not find the button to upload the patch after the status changed to "open".
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656112#comment-16656112 ] Charo Zhang commented on YARN-8587: --- [~aceric] Please change the status; I will upload the patch today. Yesterday I forgot to attach it.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655508#comment-16655508 ] Eric Yang commented on YARN-8587: - [~Charo Zhang] Would you like to contribute a patch for this issue?
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654760#comment-16654760 ] Charo Zhang commented on YARN-8587: --- add patch
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648769#comment-16648769 ] Charo Zhang commented on YARN-8587: --- [~eyang] We ran into this problem too, and it looks like a bug. We fixed it by disabling detach when "useEntryPoint" is not set. Non-entry-point mode is very common. !image-2018-10-13-14-12-14-339.png!
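The fix described in this comment can be pictured with a small sketch. It is not the attached patch: `docker_run_args` and its parameters are hypothetical names chosen for illustration. The only behavior shown is that the detach flag is added in entry-point mode and left out otherwise, so `docker run` is not detached in non-entry-point mode.

```python
from typing import List


def docker_run_args(image: str, name: str, use_entry_point: bool) -> List[str]:
    """Build a `docker run` argument list (hypothetical helper).

    Detach is requested only in entry-point mode; in non-entry-point
    mode the flag is omitted, mirroring "disable detach if not
    useEntryPoint" from the comment.
    """
    args = ["docker", "run", "--name", name]
    if use_entry_point:
        args.append("--detach")
    args.append(image)
    return args
```

For example, `docker_run_args("httpd:0.1", "c1", use_entry_point=False)` yields a command line without `--detach`.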
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567431#comment-16567431 ] Eric Yang commented on YARN-8587: - DistributedShell uses the YARN v1 API, which does not distinguish between container-executor running and docker running. If docker run fails because of invalid parameters supplied by distributed shell, it may take up to a minute to fail the container, because the status is reported to the AM and RM only at the heartbeat interval. The recommendation is to update the test case to use yarn container -list [appId]. This shortens the time needed to check the container's running status from the RM, but it does not completely eliminate possible network delay in the container status report.
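The check recommended above could be scripted roughly as follows. This is a minimal sketch, not something taken from the issue: the helper names are hypothetical, and it assumes only that each row printed by `yarn container -list` contains the container ID and one of the `ContainerState` names (NEW, RUNNING, COMPLETE) as whitespace-separated tokens. The exact column layout is deliberately not relied on.

```python
import subprocess
from typing import Optional

# State names from YARN's ContainerState enum.
CONTAINER_STATES = ("NEW", "RUNNING", "COMPLETE")


def state_from_listing(listing: str, container_id: str) -> Optional[str]:
    """Return the state token on the row that mentions container_id.

    Assumption: the row holds the ID and the state as separate
    whitespace-delimited tokens; column order is not assumed.
    """
    for line in listing.splitlines():
        tokens = line.split()
        if container_id in tokens:
            for token in tokens:
                if token in CONTAINER_STATES:
                    return token
    return None


def list_containers(identifier: str) -> str:
    """Run `yarn container -list` with the identifier named in the comment."""
    result = subprocess.run(
        ["yarn", "container", "-list", identifier],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A test case would call `state_from_listing(list_containers(identifier), container_id)` and proceed only when the result is `"RUNNING"`.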
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560026#comment-16560026 ] Eric Yang commented on YARN-8587: - [~yeshavora] The YARN state machine transitions from SCHEDULED to RUNNING and then runs container-executor with the docker run command. The SSH command tests run in parallel while container-executor launches the docker run command. As a result, the v1 API may report an incorrect result when container-executor and the docker run command take longer to start. We might want to query for the container sub-state RUNNING_BUT_NOT_READY from the YARN service REST API (or yarn app -status [appname]) to determine whether the docker run command has actually started. Only run docker ps -a after the container sub-state has changed to RUNNING.
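The ordering suggested above (wait for the sub-state, then run docker ps -a) might look like the sketch below. All function names are hypothetical. How the sub-state is fetched is left to a caller-supplied `get_state` callable, since the shape of the REST or `yarn app -status` response is not described in this thread; the state names are the ones used in the comment.

```python
import subprocess
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], wanted: str = "RUNNING",
                   timeout: float = 60.0, interval: float = 2.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `wanted` or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if get_state() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def docker_ps_matches(container_id: str) -> bool:
    """Run `docker ps -a` and report whether any row mentions container_id."""
    result = subprocess.run(
        ["docker", "ps", "-a"], capture_output=True, text=True, check=True
    )
    return any(container_id in line for line in result.stdout.splitlines())


def check_container(get_state: Callable[[], str], container_id: str) -> bool:
    """Look for the Docker container only once YARN reports RUNNING."""
    if not wait_for_state(get_state):
        return False
    return docker_ps_matches(container_id)
```

The clock and sleep functions are parameters only so the polling loop can be exercised without real delays.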
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558907#comment-16558907 ] Eric Yang commented on YARN-8587: - There is a backward-incompatibility concern with distributed shell, where we allow the user to specify multiple Unix commands and output redirection to a log file. Fixing this transient false positive will change the behavior of the logging mechanism: stderr and stdout will contain the command output, while stderr.txt and stdout.txt will contain more information, including the command launched and docker errors. Hence, this can only be fixed if we agree that the incompatible change is negligible.
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558720#comment-16558720 ] Eric Yang commented on YARN-8587: - This bug is the result of docker run in detached mode reporting exit_code 0 even though the process inside the container fails to run. For a brief period, the node manager reports that the container is in RUNNING state and then fails the container later. One possible solution is to make container-executor's non-entry-point mode more similar to entry_point mode: run docker run in the foreground, and have the parent process retry docker inspect a set number of times to obtain the PID. This removes the possible false-positive reporting of RUNNING state. The synthetic timeout approach may kill the container prematurely (or wait longer than necessary for a failing container) if the container takes more than 30 seconds (or the configured value) to start its first process.
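The "set of retries for docker inspect to obtain PID" idea can be illustrated outside container-executor with a short sketch. This is not the container-executor code (which is written in C); the function names, retry count, and delay are assumptions made for illustration. It relies on `docker inspect --format '{{.State.Pid}}'`, which prints 0 when the container has no running process.

```python
import subprocess
import time
from typing import Callable, Optional


def inspect_pid(container: str) -> Optional[int]:
    """Return the container's main PID via `docker inspect`, or None if not up."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Pid}}", container],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the container is not known to Docker (yet)
    try:
        pid = int(result.stdout.strip())
    except ValueError:
        return None
    return pid if pid > 0 else None  # Docker reports 0 when nothing is running


def wait_for_pid(container: str, retries: int = 10, delay: float = 1.0,
                 inspect: Callable[[str], Optional[int]] = inspect_pid,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Retry the inspect call a bounded number of times before giving up."""
    for attempt in range(retries):
        pid = inspect(container)
        if pid is not None:
            return pid
        if attempt < retries - 1:
            sleep(delay)  # no sleep after the final attempt
    return None
```

A caller would treat a `None` result as a failed launch rather than reporting RUNNING. Bounding the number of retries instead of using a single fixed timeout is a design choice of this sketch, not something the thread prescribes.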