[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238654#comment-15238654 ]

Mahens commented on YARN-2624:
------------------------------

The issue still persists in YARN 2.7.1 (HDP 2.3). We see it intermittently, with the same error as reported in this issue. Is there a Hadoop command to clear the cache directory?

> Resource Localization fails on a cluster due to existing cache directories
> --------------------------------------------------------------------------
>
>                 Key: YARN-2624
>                 URL: https://issues.apache.org/jira/browse/YARN-2624
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.5.1
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Blocker
>             Fix For: 2.6.0
>
>         Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the following error in certain cases.
> {noformat}
> INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27
>         at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>         at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>         at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>         at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
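The exception in the trace is raised when a rename targets a directory that already exists and is not empty. The Python sketch below reproduces the analogous behaviour on a local filesystem with os.rename. It is only an illustration under stated assumptions, not Hadoop's code path: the directory names "27_tmp", "27" and "28" and the helper names are made up for the example.

```python
import os
import tempfile


def rename_onto(src_dir, dst_dir):
    """Try to rename src_dir onto dst_dir.

    Returns None on success, or the OSError that was raised.
    """
    try:
        os.rename(src_dir, dst_dir)
        return None
    except OSError as err:
        return err


def demo_rename():
    """Return (result for a non-empty destination, result for a fresh destination)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: a freshly downloaded resource and a
        # leftover cache slot that still holds a file from an earlier run.
        download = os.path.join(root, "27_tmp")
        stale = os.path.join(root, "27")
        os.makedirs(download)
        os.makedirs(stale)
        with open(os.path.join(stale, "map.xml"), "w") as handle:
            handle.write("leftover from an earlier run")

        # Renaming onto the non-empty directory is refused by the OS.
        blocked = rename_onto(download, stale)
        # The source is untouched, so renaming it to an unused name works.
        fresh = rename_onto(download, os.path.join(root, "28"))
        return blocked, fresh
```

Calling demo_rename() yields an OSError for the occupied slot and None for the unused one, which mirrors why a leftover, non-empty cache directory blocks localization into the same numbered slot.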
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627681#comment-14627681 ]

Ha Son Hai commented on YARN-2624:
----------------------------------

Hi,

YARN 2.6.0.2.2 in HDP version 2.2.4.4-16 has the same problem.

Rename cannot overwrite non empty destination directory /hadoop/yarn/local/usercache/yasmina/filecache/12
java.io.IOException: Rename cannot overwrite non empty destination directory /hadoop/yarn/local/usercache/yasmina/filecache/12
        at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:728)
        at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:236)
        at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:671)
        at org.apache.hadoop.fs.FileContext.rename(FileContext.java:952)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390319#comment-14390319 ]

vishal.rajan commented on YARN-2624:
------------------------------------

Please verify and reopen the JIRA.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390305#comment-14390305 ]

vishal.rajan commented on YARN-2624:
------------------------------------

It seems this issue still persists in YARN 2.6.0 under certain conditions. Here is a dump of the log relating to it:

15/04/01 12:13:20 ERROR test.Job: Task error: Rename cannot overwrite non empty destination directory /grid/6/yarn/local/usercache/azkaban/filecache/344860
java.io.IOException: Rename cannot overwrite non empty destination directory /grid/6/yarn/local/usercache/azkaban/filecache/344860
        at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
        at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
        at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
        at org.apache.hadoop.fs.FileContext.rename(FileContext.java:909)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

YARN version: hadoop-2-2-0-0-2041-yarn 2.6.0.2.2.0.0-2041

This node was taken OOR for maintenance. When it was added back to the cluster, it seems the 344860 directory was not removed before that cache slot was assigned to the new container.
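The scenario described in this comment (a numbered cache directory left behind on a node) can be checked for by hand. The Python sketch below only lists numbered subdirectories that still contain entries; it deletes nothing and does not talk to YARN. It assumes a flat layout like the filecache/<number> paths quoted in this thread, and the function names and sample directory contents are hypothetical.

```python
import os
import tempfile


def nonempty_cache_dirs(filecache_root):
    """Return the names of numbered subdirectories of filecache_root
    that contain at least one entry, sorted numerically."""
    found = []
    for name in os.listdir(filecache_root):
        path = os.path.join(filecache_root, name)
        # Only numbered slots such as "27" or "344860" are of interest.
        if name.isdecimal() and os.path.isdir(path) and os.listdir(path):
            found.append(name)
    return sorted(found, key=int)


def demo_scan():
    """Build a throwaway directory tree and scan it."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical contents: two occupied slots, one empty slot,
        # and one directory whose name is not a number.
        layout = {"12": ["job.jar"], "344860": ["map.xml"], "7": [], "notes": ["x"]}
        for name, files in layout.items():
            os.makedirs(os.path.join(root, name))
            for filename in files:
                with open(os.path.join(root, name, filename), "w") as handle:
                    handle.write("placeholder")
        return nonempty_cache_dirs(root)
```

On the sample tree, demo_scan() reports the slots "12" and "344860". Whether a reported directory is actually stale still has to be judged by the operator, since a running NodeManager legitimately keeps non-empty cache directories.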
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157904#comment-14157904 ]

Hudson commented on YARN-2624:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #699 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/699/])
YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158017#comment-14158017 ]

Hudson commented on YARN-2624:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1890 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1890/])
YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158066#comment-14158066 ]

Hudson commented on YARN-2624:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1915/])
YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156805#comment-14156805 ]

Karthik Kambatla commented on YARN-2624:
----------------------------------------

The patch looks good to me. I would like input from someone more familiar with the NM restart code.

[~jlowe], [~djp] - can either of you take a look? We would like to get this committed soon.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156824#comment-14156824 ]

Jason Lowe commented on YARN-2624:
----------------------------------

Thanks for catching and fixing this, Anubhav! My apologies for missing this scenario in the original JIRA.

+1 lgtm. Committing this.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156835#comment-14156835 ]

Karthik Kambatla commented on YARN-2624:
----------------------------------------

Thanks for the super-quick turnaround, Jason.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156841#comment-14156841 ]

Hudson commented on YARN-2624:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #6178 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6178/])
YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156869#comment-14156869 ]

Anubhav Dhoot commented on YARN-2624:
-------------------------------------

Thanks [~jlowe]!
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155820#comment-14155820 ]

Hadoop QA commented on YARN-2624:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12672456/YARN-2624.001.patch
  against trunk revision 8dfe54f.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5221//console

This message is automatically generated.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155913#comment-14155913 ]

Hadoop QA commented on YARN-2624:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12672461/YARN-2624.001.patch
  against trunk revision 0708827.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

                  org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5224//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5224//console

This message is automatically generated.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156073#comment-14156073 ]

Anubhav Dhoot commented on YARN-2624:
-------------------------------------

The TestDirectoryCollection failure seems unrelated to the changes and does not reproduce locally.
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156079#comment-14156079 ]

Anubhav Dhoot commented on YARN-2624:
-------------------------------------

The fix addresses the scenario of moving from a node manager without recovery to one with node manager recovery turned on. As per YARN-1338, the directories are not cleaned up on startup in order to preserve running containers. But uniqueNumberGenerator will not know about preexisting directories, which were normally deleted on NM startup and are unknown to the recovery-enabled NM. In this case we still want directory cleanup to happen.
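The startup decision described in the comment above (comment 14156079) can be sketched roughly as follows. This is only an illustration of the reasoning, not the attached patch: the class name, method name, and both flags are hypothetical, and it assumes the NM can tell whether its recovery state store was just created, that is, whether this is the first start after recovery was turned on.

```java
/**
 * Hypothetical sketch: decide whether the NM should wipe its local cache
 * directories on startup. Names are illustrative, not taken from the patch.
 */
public final class StartupCleanupPolicy {
    private StartupCleanupPolicy() {}

    /**
     * @param recoveryEnabled        whether NM recovery is turned on
     * @param stateStoreNewlyCreated assumed flag: true when no recovery state
     *                               existed before this start
     * @return true if preexisting local directories should be deleted
     */
    public static boolean shouldCleanLocalDirs(boolean recoveryEnabled,
                                               boolean stateStoreNewlyCreated) {
        // Without recovery, directories are always cleaned on startup.
        // With recovery, they are preserved for running containers unless
        // there is no prior state that could possibly refer to them.
        return !recoveryEnabled || stateStoreNewlyCreated;
    }
}
```

Under these assumptions, directories are preserved only when recovery is enabled and prior state exists, which is the one case where running containers may still depend on them.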
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153749#comment-14153749 ]

Anubhav Dhoot commented on YARN-2624:
-------------------------------------

What we see is a bunch of preexisting local resource cache directories conflicting with the new resource download. The destination directory chosen via uniqueNumberGenerator is one of these, and without [HADOOP-9438|https://issues.apache.org/jira/browse/HADOOP-9438] we don't know until the rename fails. Resetting uniqueNumberGenerator based on recoverResource does not seem to be enough. We may need to check the state of the NM's cache directory and reset to the highest number in the directory.
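The idea in the comment above (comment 14153749), resetting the counter to the highest number already present in the cache directory, might look something like the sketch below. It is a minimal illustration under stated assumptions, not the NodeManager's code: the class and method names are hypothetical, it uses plain java.nio rather than Hadoop's FileContext, and it assumes a flat layout where each cached resource lives in a numerically named directory directly under the cache root (as in /data/yarn/nm/filecache/27 from the error).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; names are illustrative, not from the NM source. */
public final class CacheDirScan {
    private CacheDirScan() {}

    /**
     * Returns the highest purely numeric subdirectory name directly under
     * cacheRoot, or -1 if there is none (or cacheRoot is not a directory).
     */
    public static long highestNumericDir(Path cacheRoot) throws IOException {
        long highest = -1L;
        if (!Files.isDirectory(cacheRoot)) {
            return highest;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(cacheRoot)) {
            for (Path entry : entries) {
                if (!Files.isDirectory(entry)) {
                    continue; // only directories can collide with a rename target
                }
                String name = entry.getFileName().toString();
                // Up to 18 digits always fits in a long, so parseLong cannot overflow.
                if (name.matches("\\d{1,18}")) {
                    highest = Math.max(highest, Long.parseLong(name));
                }
            }
        }
        return highest;
    }

    /** Seed for the counter: never below what recovery or the disk has already used. */
    public static long seedFor(long recoveredMax, Path cacheRoot) throws IOException {
        return Math.max(recoveredMax, highestNumericDir(cacheRoot));
    }
}
```

If the counter is incremented before each use, seeding it with `seedFor(recoveredMax, cacheRoot)` would make the next destination strictly greater than any directory already on disk, so the rename would not land on a non-empty directory left over from before recovery was enabled.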