[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Knut O. Hellan updated YARN-527:
--------------------------------
    Attachment: yarn-site.xml

> Local filecache mkdir fails
> ---------------------------
>
>                 Key: YARN-527
>                 URL: https://issues.apache.org/jira/browse/YARN-527
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.0-alpha
>         Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes.
>            Reporter: Knut O. Hellan
>            Priority: Minor
>         Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_000000_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
>         at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
>         at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>         at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>         at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
>         at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least several nodes in the cluster.
> The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
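
The exception in the report says only that the mkdir "failed", and the report leaves open whether the number of cached files played a role. As a rough diagnostic sketch (not part of Hadoop or YARN; the class name FilecacheProbe and its methods are hypothetical), the following standalone Java program counts the entries in a filecache directory and tries to create a probe subdirectory through java.nio.file, which reports the underlying file system error instead of a bare failure. The default path is the one from the stack trace and is only an assumption about where the cache lives on a given node. It needs Java 7 or later and should be run as the user that owns the directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of Hadoop or YARN.
public class FilecacheProbe {

    /** Counts the direct children (files and directories) of dir. */
    public static long countEntries(Path dir) throws IOException {
        long n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                n++;
            }
        }
        return n;
    }

    /**
     * Tries to create and then delete a probe subdirectory of dir.
     * Returns null on success, otherwise the text of the I/O error.
     */
    public static String probeMkdir(Path dir) {
        Path probe = dir.resolve("probe-" + System.nanoTime());
        try {
            Files.createDirectory(probe);
            Files.delete(probe);
            return null;
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the stack trace in the report; pass another path as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/disk3/yarn/local/filecache");
        System.out.println(dir + ": " + countEntries(dir) + " entries");
        String err = probeMkdir(dir);
        System.out.println(err == null ? "mkdir probe succeeded" : "mkdir probe failed: " + err);
    }
}
```

Running it against each /disk?/yarn/local/filecache directory on an affected node, before removing and recreating the directories, would record the entry count and any file system error for the issue.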