[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Wang updated YARN-6630:
----------------------------
    Description:
When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is NEVER_RETRY, the container's work dir is not saved in the NM state store.

{code:title=ContainerLaunch.java}
...
  private void recordContainerWorkDir(ContainerId containerId,
      String workDir) throws IOException{
    container.setWorkDir(workDir);
    if (container.isRetryContextSet()) {
      context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
    }
  }
{code}

After the NM restarts, container.workDir is therefore null, which may cause other exceptions.

{code:title=ContainerImpl.java}
  static class ResourceLocalizedWhileRunningTransition
      extends ContainerTransition {
...
      String linkFile = new Path(container.workDir, link).toString();
...
{code}

{code}
java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
        at org.apache.hadoop.fs.Path.<init>(Path.java:175)
        at org.apache.hadoop.fs.Path.<init>(Path.java:110)
        ... ...
{code}

  was:
When ContainerRetryPolicy is NEVER_RETRY, container worker dir will not be saved in NM state store. Then NM restarts, container.workDir is null, and may cause other exceptions.

{code:title=ContainerLaunch.java}
...
  private void recordContainerWorkDir(ContainerId containerId,
      String workDir) throws IOException{
    container.setWorkDir(workDir);
    if (container.isRetryContextSet()) {
      context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
    }
  }
{code}

{code:title=ContainerImpl.java}
  static class ResourceLocalizedWhileRunningTransition
      extends ContainerTransition {
...
      String linkFile = new Path(container.workDir, link).toString();
...
{code}

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
>                 Key: YARN-6630
>                 URL: https://issues.apache.org/jira/browse/YARN-6630
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>
> When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is
> NEVER_RETRY, the container's work dir is not saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException{
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> After the NM restarts, container.workDir is therefore null, which may cause
> other exceptions.
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>         ... ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
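
The failure mode described in this issue can be reproduced in isolation with a small, self-contained sketch. Everything below is hypothetical and for illustration only: InMemoryStateStore is a stand-in for the NM state store (it is not the real NMStateStoreService API), and recordAlways shows just one possible direction for a fix, namely persisting the work dir regardless of the retry context. It is not necessarily what the committed patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NM state store; only the data that is
// written here survives the simulated NodeManager restart.
class InMemoryStateStore {
    private final Map<String, String> workDirs = new HashMap<>();

    void storeContainerWorkDir(String containerId, String workDir) {
        workDirs.put(containerId, workDir);
    }

    // Returns null when nothing was stored for this container.
    String loadContainerWorkDir(String containerId) {
        return workDirs.get(containerId);
    }
}

class WorkDirRecoverySketch {
    // Mirrors the behaviour reported in the issue: the work dir is
    // persisted only when a retry context is set.
    static void recordConditionally(InMemoryStateStore store, String containerId,
                                    String workDir, boolean retryContextSet) {
        if (retryContextSet) {
            store.storeContainerWorkDir(containerId, workDir);
        }
    }

    // Assumed alternative: persist the work dir unconditionally so it
    // can be recovered after a restart.
    static void recordAlways(InMemoryStateStore store, String containerId,
                             String workDir) {
        store.storeContainerWorkDir(containerId, workDir);
    }

    // Simulates recovery: in-memory container state is gone, so the
    // work dir can only come from the state store.
    static String recoverAfterRestart(InMemoryStateStore store, String containerId) {
        return store.loadContainerWorkDir(containerId);
    }

    public static void main(String[] args) {
        InMemoryStateStore conditional = new InMemoryStateStore();
        recordConditionally(conditional, "container_1", "/tmp/work/container_1", false);
        System.out.println("conditional: " + recoverAfterRestart(conditional, "container_1"));

        InMemoryStateStore always = new InMemoryStateStore();
        recordAlways(always, "container_1", "/tmp/work/container_1");
        System.out.println("always: " + recoverAfterRestart(always, "container_1"));
    }
}
```

With the conditional variant and no retry context, the recovered value is null, which is the value that later reaches new Path(container.workDir, link) in the stack trace above. A null check at the point of use would be another option, but it would only hide the missing state rather than restore it.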