[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601466#comment-14601466 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2185 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2185/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601448#comment-14601448 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #237 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/237/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601412#comment-14601412 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #228 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/228/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601372#comment-14601372 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2167 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2167/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601073#comment-14601073 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Yarn-trunk #969 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/969/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601054#comment-14601054 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #239 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/239/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599723#comment-14599723 ] Hudson commented on YARN-3832: -- FAILURE: Integrated in Hadoop-trunk-Commit #8058 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8058/]) YARN-3832. Resource Localization fails on a cluster due to existing cache directories. Contributed by Brahma Reddy Battula (jlowe: rev 8d58512d6e6d9fe93784a9de2af0056bcc316d96) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599713#comment-14599713 ] Brahma Reddy Battula commented on YARN-3832: [~jlowe]thanks a lot for review and commit.. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599694#comment-14599694 ] Jason Lowe commented on YARN-3832: -- +1 lgtm. Committing this. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599103#comment-14599103 ] Hadoop QA commented on YARN-3832: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 7s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 44m 16s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741464/YARN-3832.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2ba6465 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8331/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8331/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8331/console | This message was automatically generated. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599025#comment-14599025 ] Brahma Reddy Battula commented on YARN-3832: [~jlowe] Attached the patch to address to cleanup full dir's also..And thanks for your pointer's..Checked manually,deletion is happening for full dir's.. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > Attachments: YARN-3832.patch > > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597873#comment-14597873 ] Jason Lowe commented on YARN-3832: -- Ah, I think that might be the clue as to what went wrong. If the NM recreated the state store on startup then ResourceLocalizationService will try to cleanup the localized resources to prevent them from getting out of sync with the state store. Unfortunately the code does this: {code} private void cleanUpLocalDirs(FileContext lfs, DeletionService del) { for (String localDir : dirsHandler.getLocalDirs()) { cleanUpLocalDir(lfs, del, localDir); } {code} It should be calling dirsHandler.getLocalDirsForCleanup, since getLocalDirs will not include any disks that are full. Since the disk was too full, it probably wasn't in the list of local dirs and therefore we avoided cleaning up the localized resources on the disk. Later when the disk became good it tried to use it, but at that point the state store and localized resources on that disk are out of sync and new localizations can collide with old ones. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597805#comment-14597805 ] Brahma Reddy Battula commented on YARN-3832: [~jlowe] Sorry for late reply...After look into logs,,Came to know that *disk declared bad ( since it's reached 90%) and nodes became unhealthy* {noformat} 2015-06-19 04:39:18,498 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /opt/hdfsdata/HA/nmlocal error, used space above threshold of 90.0%, removing from list of valid directories 2015-06-19 04:39:18,498 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /opt/hdfsdata/HA/nmlog error, used space above threshold of 90.0%, removing from list of valid directories 2015-06-19 04:39:18,498 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed: 1/1 local-dirs are bad: /opt/hdfsdata/HA/nmlocal; 1/1 log-dirs are bad: /opt/hdfsdata/HA/nmlog 2015-06-19 04:39:18,499 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs are bad: /opt/hdfsdata/HA/nmlocal; 1/1 log-dirs are bad: /opt/hdfsdata/HA/nmlog {noformat} On restart of NM, those disk turn to good.. 2015-06-19 04:47:18,765 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) turned good: 1/1 local-dirs are good: /opt/hdfsdata/HA/nmlocal; 1/1 log-dirs are good: /opt/hdfsdata/HA/nmlog.. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593541#comment-14593541 ] Jason Lowe commented on YARN-3832: -- bq. AFAIK statestore recreated after starting So if the state store was recreated on startup and the build has YARN-2624 fixed, any evidence why the local directories were not removed? Even if there was a deletion error on shutdown, the whole local dirs should have been cleaned up on startup if the state store had to be initialized from scratch. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593497#comment-14593497 ] Brahma Reddy Battula commented on YARN-3832: [~jlowe] Thanks for looking into this issue.. {quote}Can you look back in the NM logs to see when /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 was originally created{quote} I looked into the logs, it was created three days back.And NM was restarted today (days doent matter anyway,just for reference).And disk is not bad and not full. {noformat} 2015-06-16 07:12:05,886 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://hacluster/tmp/hadoop-yarn/staging/root/.staging/job_1434452428753_0004/libjars/netty-all-4.0.23.Final.jar(->/opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39/netty-all-4.0.23.Final.jar) transitioned from DOWNLOADING to LOCALIZED {noformat} While stopping the NM, it thrown the following Error..HADOOP-11878 raised for same.. {noformat} 2015-06-19 03:09:10,528 ERROR org.apache.hadoop.yarn.server.nodemanager.DeletionService: Exception during execution of task in DeletionService java.lang.NullPointerException at org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:274) at org.apache.hadoop.fs.FileContext.delete(FileContext.java:761) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.deleteAsUser(DefaultContainerExecutor.java:458) at org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:293) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} AFAIK statestore recreated after starting ..Hence old state store is out of sync or problem with deleting the cache entries > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593439#comment-14593439 ] Varun Saxena commented on YARN-3832: Sorry meant for [~singar.ranga] > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593432#comment-14593432 ] Varun Saxena commented on YARN-3832: [~ranga swamy], can you attach the logs ? > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593424#comment-14593424 ] Jason Lowe commented on YARN-3832: -- It looks like the state store became out-of-sync with the local filesystem state. Can you look back in the NM logs to see when /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 was originally created? Was the state store re-created or the disk declared bad/full in-between the creation of that directory and the error? Seems like something would have had to go wrong with either storing the state or deleting the cache entry on the local disk for this to occur. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-3832 > URL: https://issues.apache.org/jira/browse/YARN-3832 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Ranga Swamy >Assignee: Brahma Reddy Battula > > *We have found resource localization fails on a cluster with following > error.* > > Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624) > {noformat} > Application application_1434703279149_0057 failed 2 times due to AM Container > for appattempt_1434703279149_0057_02 exited with exitCode: -1000 > For more detailed output, check application tracking > page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, > click on links to logs of each attempt. > Diagnostics: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > java.io.IOException: Rename cannot overwrite non empty destination directory > /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244) > at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Failing this attempt. Failing the application. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)