[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2015-04-01 Thread vishal.rajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishal.rajan updated YARN-2624:
---
 Target Version/s:   (was: 2.6.0)
Affects Version/s: 2.6.0

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0, 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following
 error in certain cases.
 {noformat}
 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27
   at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}
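To make the failure mode concrete: the trace shows FSDownload moving a downloaded resource into place through FileContext.rename, and AbstractFileSystem.renameInternal refusing because the destination directory already exists and is not empty. The sketch below reproduces the same rule with java.nio.file.Files.move and REPLACE_EXISTING. This is a stand-in, not the Hadoop API: Files.move with REPLACE_EXISTING replaces an absent or empty target but throws DirectoryNotEmptyException for a non-empty directory. The directory names (filecache/27, 27_tmp) and helper names are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameOverwriteDemo {

    // Moves workDir onto destDir, replacing destDir if it already exists.
    // REPLACE_EXISTING only succeeds when destDir is absent, a regular file,
    // or an empty directory; a non-empty directory makes the move fail.
    static void renameWithOverwrite(Path workDir, Path destDir) throws IOException {
        Files.move(workDir, destDir, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns true if renaming a fresh download onto a leftover, non-empty
    // cache directory fails with DirectoryNotEmptyException.
    static boolean failsOnLeftoverDir() {
        try {
            Path filecache = Files.createDirectories(
                    Files.createTempDirectory("nm-local").resolve("filecache"));
            // Directory left behind by an earlier run (hypothetical name "27").
            Path leftover = Files.createDirectories(filecache.resolve("27"));
            Files.writeString(leftover.resolve("map.xml"), "stale");
            // Freshly downloaded resource in a temporary work directory (hypothetical name).
            Path work = Files.createDirectories(filecache.resolve("27_tmp"));
            Files.writeString(work.resolve("map.xml"), "fresh");
            try {
                renameWithOverwrite(work, leftover);
                return false;
            } catch (DirectoryNotEmptyException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rename failed: " + failsOnLeftoverDir());
    }
}
```

Deleting or emptying the leftover directory before the move lets the same call succeed, which is the direction the attached patch takes.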



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2624:
---
 Priority: Blocker  (was: Major)
 Target Version/s: 2.6.0
Affects Version/s: 2.5.1

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker



[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2624:

Attachment: YARN-2624.001.patch

Attaching a patch that cleans up the local resource cache directories when the
state store is first created. In most cases, this removes directories left over
from a previous run when moving from non-work-preserving to work-preserving NM
restart. Failures are still possible if the NM goes down after the state is
created but before the cleanup runs.
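A minimal sketch of the cleanup idea described above, under stated assumptions: the hypothetical method cleanupIfNewStateStore receives a flag saying whether the NM recovery state store was just created, and if so deletes everything under the filecache directory, since no recorded state can yet refer to those entries. This is not the code in YARN-2624.001.patch; the method names, the flag, and the directory layout are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class CacheCleanupSketch {

    // Hypothetical hook: if the state store is brand new, nothing recorded in it
    // can point at existing cache entries, so all of them are treated as leftovers.
    // A crash after the store is created but before this runs is NOT handled.
    static void cleanupIfNewStateStore(Path filecache, boolean stateStoreIsNew) {
        if (!stateStoreIsNew || !Files.isDirectory(filecache)) {
            return; // recovered state may still reference these directories
        }
        try (Stream<Path> walk = Files.walk(filecache)) {
            // Reverse order puts children before their parents, so each
            // directory is already empty when it is deleted; keep the root.
            walk.sorted(Comparator.reverseOrder())
                .filter(p -> !p.equals(filecache))
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds filecache/27/map.xml in a temp dir, runs the cleanup, and
    // reports whether the leftover directory is still there.
    static boolean leftoverSurvives(boolean stateStoreIsNew) {
        try {
            Path filecache = Files.createDirectories(
                    Files.createTempDirectory("nm-local").resolve("filecache"));
            Path leftover = Files.createDirectories(filecache.resolve("27"));
            Files.writeString(leftover.resolve("map.xml"), "stale");
            cleanupIfNewStateStore(filecache, stateStoreIsNew);
            return Files.exists(leftover);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new store, leftover survives: " + leftoverSurvives(true));
        System.out.println("recovered store, leftover survives: " + leftoverSurvives(false));
    }
}
```

Keying the cleanup on "state store was just created" keeps recovered entries intact on later restarts, at the cost of the crash window the comment mentions.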

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch




[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2624:

Attachment: YARN-2624.001.patch

No apparent failure in the Jenkins output. Uploading it again.

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch, YARN-2624.001.patch




[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-09-30 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2624:

Description: 
We have found that resource localization fails on a cluster with the following
error in certain cases.

{noformat}
INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27
  at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
  at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
  at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
  at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
{noformat}

  was:
We have found that resource localization fails on a secure cluster with the
following error in certain cases. This happens at some indeterminate point,
after which it keeps failing until the NM is restarted.

{noformat}
INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27
  at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
  at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
  at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
  at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
{noformat}

Summary: Resource Localization fails on a cluster due to existing cache 
directories  (was: Resource Localization fails on a secure cluster until nm are 
restarted)
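The new summary attributes the failure to existing cache directories. One plausible way such a clash arises, offered as an assumption rather than a reading of the NodeManager source, is that cache subdirectory names come from an in-memory counter that starts over after a restart while directories from the previous run remain on disk. The hypothetical DirNameAllocator below simulates two runs sharing one local directory and reports which names the second run would reuse.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class CacheIdCollisionSketch {

    // Hypothetical allocator: names cache subdirectories from an in-memory
    // counter that is not persisted across restarts.
    static final class DirNameAllocator {
        private final AtomicLong next;

        DirNameAllocator(long start) {
            next = new AtomicLong(start);
        }

        String nextName() {
            return Long.toString(next.getAndIncrement());
        }
    }

    // Simulates a first run that creates firstRun directories, then a restart
    // where the directories stay on disk but the counter begins again at start.
    // Returns the names the second run allocates that already exist on disk.
    static Set<String> collisions(long start, int firstRun, int secondRun) {
        Set<String> onDisk = new HashSet<>();
        DirNameAllocator run1 = new DirNameAllocator(start);
        for (int i = 0; i < firstRun; i++) {
            onDisk.add(run1.nextName());
        }
        DirNameAllocator run2 = new DirNameAllocator(start);
        Set<String> clashes = new HashSet<>();
        for (int i = 0; i < secondRun; i++) {
            String name = run2.nextName();
            if (!onDisk.add(name)) {
                clashes.add(name); // a later rename onto this directory would fail
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        System.out.println("reused names: " + collisions(20, 10, 8));
    }
}
```

For example, collisions(20, 10, 8) models a first run that leaves directories 20 through 29 behind and a second run that allocates eight names, 20 through 27, every one of which already exists.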

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
