[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024097#comment-16024097 ]
Yang Wang commented on YARN-6630:
---------------------------------

Hi [~jianhe],

Could you help review the patch? We have already hit this problem: after an NM restart, sending a resource localization request while the container is running (YARN-1503) causes the NM to fail with the following exception. Any other code that uses *container.workDir* may also throw a NullPointerException.

{code}
java.lang.IllegalArgumentException: Can not create a Path from a null string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
  at org.apache.hadoop.fs.Path.<init>(Path.java:175)
  at org.apache.hadoop.fs.Path.<init>(Path.java:110)
  ... ...
{code}

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
>                 Key: YARN-6630
>                 URL: https://issues.apache.org/jira/browse/YARN-6630
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>         Attachments: YARN-6630.001.patch
>
>
> When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is
> NEVER_RETRY, the container work dir is not saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
> private void recordContainerWorkDir(ContainerId containerId,
>     String workDir) throws IOException{
>   container.setWorkDir(workDir);
>   if (container.isRetryContextSet()) {
>     context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>   }
> }
> {code}
> After the NM restarts, container.workDir is null, which may cause other exceptions.
> {code:title=ContainerImpl.java}
> static class ResourceLocalizedWhileRunningTransition
>     extends ContainerTransition {
> ...
>   String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>   ... ...
> {code}
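
For readers without the NodeManager source at hand, the failure mode described in the quoted issue can be sketched roughly as follows. This is a simplified illustration only: `StateStore`, `Container`, the `alwaysStore` flag, and the example path are hypothetical stand-ins, not the real NM classes, and storing the work dir unconditionally is just one possible fix that may or may not match YARN-6630.001.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; none of these are the real NodeManager classes.
class WorkDirRecoverySketch {

    /** In-memory stand-in for the NM state store; it survives the simulated restart. */
    static class StateStore {
        private final Map<String, String> workDirs = new HashMap<>();

        void storeContainerWorkDir(String containerId, String workDir) {
            workDirs.put(containerId, workDir);
        }

        String loadContainerWorkDir(String containerId) {
            return workDirs.get(containerId); // null if nothing was stored
        }
    }

    /** In-memory container state; it is lost when the NM restarts. */
    static class Container {
        final String id;
        final boolean retryContextSet; // assumed false for NEVER_RETRY, per the issue
        String workDir;

        Container(String id, boolean retryContextSet) {
            this.id = id;
            this.retryContextSet = retryContextSet;
        }
    }

    /**
     * Mirrors the shape of recordContainerWorkDir from the issue.
     * alwaysStore == false reproduces the reported behaviour;
     * alwaysStore == true is one possible fix (an assumption, not the actual patch).
     */
    static void recordContainerWorkDir(Container container, StateStore store,
                                       String workDir, boolean alwaysStore) {
        container.workDir = workDir;
        if (alwaysStore || container.retryContextSet) {
            store.storeContainerWorkDir(container.id, workDir);
        }
    }

    /** Launches a container, simulates an NM restart, and returns the recovered work dir. */
    static String workDirAfterRestart(boolean retryContextSet, boolean alwaysStore) {
        StateStore store = new StateStore();
        Container before = new Container("container_1", retryContextSet);
        recordContainerWorkDir(before, store, "/tmp/nm-local/container_1", alwaysStore);

        // Simulated restart: in-memory container state is rebuilt from the store only.
        Container recovered = new Container("container_1", retryContextSet);
        recovered.workDir = store.loadContainerWorkDir(recovered.id);
        return recovered.workDir;
    }

    public static void main(String[] args) {
        // No retry context, conditional store: the work dir is lost.
        System.out.println(workDirAfterRestart(false, false)); // prints "null"
        // No retry context, unconditional store: the work dir is recovered.
        System.out.println(workDirAfterRestart(false, true));  // prints "/tmp/nm-local/container_1"
    }
}
```

In the sketch, a null recovered work dir corresponds to the state in which `new Path(container.workDir, link)` would throw the IllegalArgumentException shown above.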