Wangda Tan created YARN-8545:
--------------------------------
Summary: YARN native service should return container if launch failed
Key: YARN-8545
URL: https://issues.apache.org/jira/browse/YARN-8545
Project: Hadoop YARN
Issue Type: Task
Reporter: Wangda Tan
In some cases, a container launch may fail without the container being properly
returned to the RM.
This can happen when the AM fails while preparing the container launch context,
before sending it to the NM. (Once the container launch context has been sent to
the NM, the NM reports the failed container to the RM.)
Example exception:
{code:java}
java.io.FileNotFoundException: File does not exist: hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
    at org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
    at org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
    at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
    at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
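
One possible shape for the fix, as a minimal sketch only: if building the launch context throws before anything reaches the NM, the AM hands the container back itself. Every name below (ReleaseOnLaunchFailure, LaunchContextBuilder, NodeManagerClient, ContainerReleaser, launchOrRelease) is a hypothetical stand-in and not the actual YARN service code. In a real AM the release step would presumably map to something like AMRMClientAsync#releaseAssignedContainer, but that mapping is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ReleaseOnLaunchFailure {

    /** Hypothetical: builds the launch context; may fail, e.g. on a missing HDFS file. */
    interface LaunchContextBuilder {
        String build(String containerId) throws IOException;
    }

    /** Hypothetical: sends the prepared launch context to the NM. */
    interface NodeManagerClient {
        void start(String containerId, String launchContext);
    }

    /** Hypothetical: returns an allocated container to the RM. */
    interface ContainerReleaser {
        void release(String containerId);
    }

    /**
     * Returns true if the launch context was built and sent to the NM.
     * If preparation fails, the NM never hears about the container, so the
     * AM releases it explicitly and returns false.
     */
    static boolean launchOrRelease(String containerId,
                                   LaunchContextBuilder builder,
                                   NodeManagerClient nm,
                                   ContainerReleaser releaser) {
        final String launchContext;
        try {
            launchContext = builder.build(containerId);
        } catch (IOException e) {
            // Nothing was sent to the NM, so nobody else will report this failure.
            releaser.release(containerId);
            return false;
        }
        // From here on the NM owns failure reporting for this container.
        nm.start(containerId, launchContext);
        return true;
    }

    public static void main(String[] args) {
        List<String> released = new ArrayList<>();
        boolean launched = launchOrRelease(
                "container_0001",
                id -> { throw new FileNotFoundException("File does not exist: run script for " + id); },
                (id, ctx) -> { },
                released::add);
        System.out.println("launched=" + launched + ", released=" + released);
    }
}
```

The release happens only on the preparation path; failures after the context is handed to the NM are left to the NM's own reporting, as described above.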