[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562358#comment-16562358 ]

Wangda Tan commented on YARN-8545:
----------------------------------

I think it is important to get this backported to branch-3.1.1. I'm going to do this in a couple of hours; please let me know if you think differently.

cc: [~csingh], [~eyang]

> YARN native service should return container if launch failed
> ------------------------------------------------------------
>
>                 Key: YARN-8545
>                 URL: https://issues.apache.org/jira/browse/YARN-8545
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Assignee: Chandni Singh
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but the container will not be
> properly returned to the RM.
> This can happen when the AM tries to prepare the container launch context but
> fails without sending it to the NM (once the container launch context is sent
> to the NM, the NM will report the failed container to the RM).
> An example exception:
> {code:java}
> java.io.FileNotFoundException: File does not exist: hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Even after preparing the container launch context has failed, the AM still
> tries to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP presence", exception="java.io.IOException: primary-worker-0: IP is not available yet"
> ...
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
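
For readers following the thread, the failure mode in the issue description can be sketched roughly as below. This is a minimal, self-contained Java illustration and not the code from YARN-8545.001.patch: the names `RmClient`, `NmClient`, `LaunchContextBuilder`, and `launch` are hypothetical stand-ins, and the sketch deliberately avoids the Hadoop API. It only shows the idea that if building the launch context throws before anything reaches the NM, the AM itself has to hand the container back to the RM, because the NM never learns about it.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LaunchFailureSketch {

  /** Hypothetical stand-in for the AM's channel to the ResourceManager. */
  interface RmClient {
    void releaseContainer(String containerId);
  }

  /** Hypothetical stand-in for the AM's channel to the NodeManager. */
  interface NmClient {
    void start(String containerId, String launchContext);
  }

  /** Hypothetical stand-in for preparing a container launch context. */
  interface LaunchContextBuilder {
    String build(String containerId) throws IOException;
  }

  /**
   * Tries to launch a container. If preparing the launch context fails,
   * nothing has been sent to the NM yet, so the container is released
   * back to the RM here instead of being left allocated.
   */
  static boolean launch(String containerId, LaunchContextBuilder builder,
      NmClient nm, RmClient rm) {
    String context;
    try {
      context = builder.build(containerId);
    } catch (IOException e) {
      // The NM never saw this container, so it cannot report the failure.
      rm.releaseContainer(containerId);
      return false;
    }
    nm.start(containerId, context);
    return true;
  }

  public static void main(String[] args) {
    List<String> released = new ArrayList<>();
    List<String> started = new ArrayList<>();

    // Simulate the missing launch script from the stack trace in the issue.
    boolean launched = launch("container_01",
        id -> {
          throw new FileNotFoundException("File does not exist: run-PRIMARY_WORKER.sh");
        },
        (id, ctx) -> started.add(id),
        released::add);

    // prints: launched=false released=[container_01] started=[]
    System.out.println("launched=" + launched
        + " released=" + released + " started=" + started);
  }
}
```

In this sketch the `false` return value is also where a caller could stop scheduling readiness checks for the instance, which is the second symptom the description mentions. How the actual patch wires this into the service AM is not shown in this thread.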
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559026#comment-16559026 ]

Hudson commented on YARN-8545:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14649 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14649/])
YARN-8545. Return allocated resource to RM for failed container. (eyang: rev 40fad32824d2f8f960c779d78357e62103453da0)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstanceEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/containerlaunch/ContainerLaunchService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/MockServiceAM.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/TestComponent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/instance/TestComponentInstance.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558997#comment-16558997 ]

Eric Yang commented on YARN-8545:
---------------------------------

+1 looks good to me. Committing shortly.
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558835#comment-16558835 ]

Chandni Singh commented on YARN-8545:
-------------------------------------

[~billie.rinaldi] [~eyang] Do you have any comments on patch 1?
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556267#comment-16556267 ]

Gour Saha commented on YARN-8545:
---------------------------------

[~csingh] patch 001 looks good to me. +1.
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553500#comment-16553500 ]

genericqa commented on YARN-8545:
---------------------------------

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 28s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 46s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 29s | trunk passed |
| +1 | shadedclient | 10m 46s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 41s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 28s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| +1 | checkstyle | 0m 8s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 37s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 46s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 11m 26s | hadoop-yarn-services-core in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 63m 9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8545 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932779/YARN-8545.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 56cd137fb41c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 17e2616 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21348/testReport/ |
| Max. process+thread count | 755 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21348/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553422#comment-16553422 ]

Chandni Singh commented on YARN-8545:
-------------------------------------

[~gsaha] [~billie.rinaldi] could you please review the patch?