[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462267#comment-16462267 ]

genericqa commented on YARN-6630:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 6 new + 186 unchanged - 2 fixed = 192 total (was 188) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 46s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-6630 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921723/YARN-6630.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f7630068ff25 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 85381c7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20580/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20580/testReport/ |
| Max. process+thread count | 342 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172748#comment-16172748 ]

Hadoop QA commented on YARN-6630:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 2 new + 140 unchanged - 0 fixed = 142 total (was 140) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 12s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-6630 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12888010/YARN-6630.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux d52acf4a0966 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a9019e1 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/17532/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/17532/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/17532/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17532/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172698#comment-16172698 ]

Yang Wang commented on YARN-6630:
---------------------------------

Thanks for your comments, [~djp]. I have updated the patch and rebased it onto trunk.

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
>                 Key: YARN-6630
>                 URL: https://issues.apache.org/jira/browse/YARN-6630
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>         Attachments: YARN-6630.001.patch, YARN-6630.002.patch
>
>
> When ContainerRetryPolicy is NEVER_RETRY, the container work dir is not
> saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException {
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> When the NM then restarts, container.workDir cannot be recovered and is null,
> which may cause exceptions.
> We have already hit this problem: after an NM restart, sending a resource
> localization request while the container is running (YARN-1503) makes the NM
> fail with the exception shown below.
> So container.workDir always needs to be saved in the NM state store.
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>         ... ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
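The failure mode described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the attached patch: `StateStore`, `recordConditionally` and `recordAlways` are hypothetical stand-ins for the NM state store and for `recordContainerWorkDir`, and the container id and path are made up. It assumes, as the issue text implies, that `isRetryContextSet()` is false when the policy is NEVER_RETRY.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkDirRecoverySketch {

  /** Hypothetical in-memory stand-in for the NM state store (what survives a restart). */
  public static class StateStore {
    private final Map<String, String> workDirs = new HashMap<>();

    public void storeContainerWorkDir(String containerId, String workDir) {
      workDirs.put(containerId, workDir);
    }

    /** Returns null when no work dir was ever stored for this container. */
    public String recoverContainerWorkDir(String containerId) {
      return workDirs.get(containerId);
    }
  }

  /** Models the quoted logic: persist only when a retry context is set. */
  public static void recordConditionally(StateStore store, String containerId,
      String workDir, boolean retryContextSet) {
    if (retryContextSet) {
      store.storeContainerWorkDir(containerId, workDir);
    }
  }

  /** Models what the issue asks for: persist regardless of the retry policy. */
  public static void recordAlways(StateStore store, String containerId,
      String workDir) {
    store.storeContainerWorkDir(containerId, workDir);
  }

  public static void main(String[] args) {
    String id = "container_1";          // hypothetical container id
    String dir = "/tmp/nm/container_1"; // hypothetical work dir

    // Assumption: under NEVER_RETRY the retry context counts as "not set".
    StateStore conditional = new StateStore();
    recordConditionally(conditional, id, dir, false);
    System.out.println("conditional store, after restart: "
        + conditional.recoverContainerWorkDir(id));

    StateStore unconditional = new StateStore();
    recordAlways(unconditional, id, dir);
    System.out.println("unconditional store, after restart: "
        + unconditional.recoverContainerWorkDir(id));
  }
}
```

In this model the conditional variant recovers `null`, which is the value that later reaches `new Path(container.workDir, link)` in the quoted transition, while the unconditional variant recovers the stored directory.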
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163459#comment-16163459 ]

Hadoop QA commented on YARN-6630:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-6630 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6630 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869623/YARN-6630.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17418/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163455#comment-16163455 ]

Junping Du commented on YARN-6630:
----------------------------------

Sorry for coming late to this. It looks like the patch no longer applies to the latest trunk. [~fly_in_gis], could you please update the patch and rebase it onto the latest trunk?
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045900#comment-16045900 ]

Varun Vasudev commented on YARN-6630:
-------------------------------------

Assigned issue to [~fly_in_gis] and kicked off another Jenkins run because the earlier links aren't accessible anymore.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045899#comment-16045899 ]

Hadoop QA commented on YARN-6630:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s{color} | {color:red} YARN-6630 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6630 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869623/YARN-6630.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16180/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025973#comment-16025973 ]

Yang Wang commented on YARN-6630:
---------------------------------

When ContainerRetryPolicy is NEVER_RETRY, container.workDir also needs to be saved in the NM state store. Otherwise it cannot be recovered and is null after an NM restart.
{quote}
We already have a problem: after an NM restart, if we send a resource localization request while the container is running (YARN-1503), the NM fails with the following exception. In addition, any code that uses container.workDir may hit a NullPointerException.
{code}
java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
        at org.apache.hadoop.fs.Path.<init>(Path.java:175)
        at org.apache.hadoop.fs.Path.<init>(Path.java:110)
        ... ...
{code}
{quote}

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
>                 Key: YARN-6630
>                 URL: https://issues.apache.org/jira/browse/YARN-6630
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>         Attachments: YARN-6630.001.patch
>
>
> When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is
> NEVER_RETRY, the container work dir is not saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException {
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> When the NM then restarts, container.workDir is null, which may cause other exceptions.
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>         ... ...
> {code}
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025955#comment-16025955 ]

Feng Yuan commented on YARN-6630:
---------------------------------

Maybe I wasn't clear. What I mean is that the logic
{code}
if (container.isRetryContextSet()) {
  context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
}
{code}
is working as intended; what you should do is set ContainerRetryPolicy on the user (application) side.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025952#comment-16025952 ]

Yang Wang commented on YARN-6630:
---------------------------------

Yes, yarn.nodemanager.recovery.enabled=true and ContainerRetryPolicy=NEVER_RETRY do not conflict. My point is that container.workDir always needs to be saved in the NM state store, regardless of ContainerRetryPolicy.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025741#comment-16025741 ]

Feng Yuan commented on YARN-6630:
---------------------------------

Hi wy,
{code}ContainerRetryPolicy{code} is configurable. For example, if you are using the DistributedShell app, you can set it with the parameter *--container_retry_policy*.
IMO, {code}yarn.nodemanager.recovery.enabled=true{code} and {code}ContainerRetryPolicy=NEVER_RETRY{code} do not conflict. I think ContainerRetryPolicy was created to let the app control which containers should be retried and which should not.
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024097#comment-16024097 ]

Yang Wang commented on YARN-6630:
---------------------------------

Hi [~jianhe], could you help review the patch?
We have already hit this problem: after an NM restart, we sent a resource localization request while the container was running (YARN-1503), and the NM failed with the exception below. Any other code that uses *container.workDir* may also throw a NullPointerException.
{code}
java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
        at org.apache.hadoop.fs.Path.<init>(Path.java:175)
        at org.apache.hadoop.fs.Path.<init>(Path.java:110)
        ... ...
{code}

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
> Key: YARN-6630
> URL: https://issues.apache.org/jira/browse/YARN-6630
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
> Attachments: YARN-6630.001.patch
>
>
> When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is
> NEVER_RETRY, the container worker dir is not saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException{
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> After the NM restarts, container.workDir is null, which may cause other exceptions.
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>         ... ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
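The failure mode described in the issue can be reproduced in isolation. The sketch below is only a simulation under stated assumptions: `WorkDirRecoverySketch`, its in-memory map, and the `recordAlways` variant are hypothetical stand-ins, not the real `NMStateStoreService` API and not necessarily what YARN-6630.001.patch does. It contrasts the quoted conditional store with one possible alternative that persists the work dir unconditionally.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical simulation of the behavior described in YARN-6630.
 * The map stands in for the NM state store; it is NOT the real Hadoop API.
 */
public class WorkDirRecoverySketch {
    // Simulated persistent state: containerId -> workDir. Survives a "restart".
    private static final Map<String, String> STATE_STORE = new HashMap<>();

    /** Mirrors the quoted logic: persist only when a retry context is set. */
    public static void recordConditional(String containerId, String workDir,
                                         boolean retryContextSet) {
        if (retryContextSet) {
            STATE_STORE.put(containerId, workDir);
        }
    }

    /** Assumed alternative: persist the work dir regardless of retry policy. */
    public static void recordAlways(String containerId, String workDir) {
        STATE_STORE.put(containerId, workDir);
    }

    /** Simulates recovery after restart: only persisted state is available. */
    public static String recover(String containerId) {
        return STATE_STORE.get(containerId);
    }

    public static void main(String[] args) {
        // NEVER_RETRY corresponds to no retry context, so nothing is stored.
        recordConditional("container_1", "/tmp/nm/container_1", false);
        System.out.println("conditional: " + recover("container_1"));

        recordAlways("container_2", "/tmp/nm/container_2");
        System.out.println("always: " + recover("container_2"));
    }
}
```

With the conditional variant and no retry context, `recover` returns null, which matches the null `container.workDir` reported after restart.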
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022763#comment-16022763 ]

Hadoop QA commented on YARN-6630:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 54s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 5 new + 178 unchanged - 2 fixed = 183 total (was 180) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 37s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6630 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869623/YARN-6630.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 73d90fc9b861 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a62be38 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16006/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16006/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16006/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output |
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022445#comment-16022445 ]

Yang Wang commented on YARN-6630:
---------------------------------

When yarn.nodemanager.recovery.enabled is true, the NM does not clear any workdir. However, container.workDir is not recovered and remains null.

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
> Key: YARN-6630
> URL: https://issues.apache.org/jira/browse/YARN-6630
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
>
> When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is
> NEVER_RETRY, the container worker dir is not saved in the NM state store.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException{
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> After the NM restarts, container.workDir is null, which may cause other exceptions.
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}
> {code}
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:175)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:110)
>         ... ...
> {code}
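Separately from persisting the work dir, a call site that builds a path from it could fail with a more descriptive error when the value was never recovered. The following is a minimal sketch, assuming a hypothetical helper `linkPath` that uses plain string joining as a stand-in for `new Path(container.workDir, link).toString()`; it is not code from the patch or from Hadoop.

```java
/**
 * Hypothetical guard for call sites that depend on a recovered work dir.
 * String concatenation stands in for org.apache.hadoop.fs.Path here.
 */
public class WorkDirGuardSketch {
    /**
     * Builds the link file path, or fails with a message that names the
     * actual cause instead of "Can not create a Path from a null string".
     */
    public static String linkPath(String workDir, String link) {
        if (workDir == null) {
            throw new IllegalStateException(
                "Container work dir was not recovered; cannot resolve link " + link);
        }
        return workDir + "/" + link;
    }

    public static void main(String[] args) {
        System.out.println(linkPath("/tmp/nm/container_1", "lib.jar"));
        try {
            linkPath(null, "lib.jar");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

A guard like this only improves diagnostics; the container still cannot localize resources until the work dir itself is recovered.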
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1603#comment-1603 ]

Feng Yuan commented on YARN-6630:
---------------------------------

IMO, by default the NM clears all workdirs when it starts. Should we skip the workdirs of containers that are being recovered? Any ideas?

> Container worker dir could not recover when NM restart
> ------------------------------------------------------
>
> Key: YARN-6630
> URL: https://issues.apache.org/jira/browse/YARN-6630
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
>
> When ContainerRetryPolicy is NEVER_RETRY, the container worker dir is not
> saved in the NM state store. After the NM restarts, container.workDir is null,
> which may cause other exceptions.
> {code:title=ContainerLaunch.java}
> ...
>   private void recordContainerWorkDir(ContainerId containerId,
>       String workDir) throws IOException{
>     container.setWorkDir(workDir);
>     if (container.isRetryContextSet()) {
>       context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
>     }
>   }
> {code}
> {code:title=ContainerImpl.java}
>   static class ResourceLocalizedWhileRunningTransition
>       extends ContainerTransition {
> ...
>       String linkFile = new Path(container.workDir, link).toString();
> ...
> {code}