Hi Stan,

I'd check the NN audit logs for the file /user/apache/.staging/job_201211150255_237458/job.xml to see when it was deleted and by whom; that may give more insight.
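If it helps, here is a rough sketch of that search in Python. It assumes the audit lines use the usual key=value fields (ugi=, ip=, cmd=, src=, dst=); your log4j pattern may differ, and the function name and the log file name in the usage comment are my own placeholders. It also reports a delete or rename of any parent directory, since removing the staging directory itself would take job.xml with it.

```python
import re

# Assumption: audit entries carry whitespace-separated key=value fields
# such as ugi=..., ip=..., cmd=..., src=..., dst=...
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def find_removals(lines, target):
    """Return audit entries where `target` or one of its parent
    directories was the source of a delete or rename."""
    hits = []
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if fields.get("cmd") not in ("delete", "rename"):
            continue
        src = fields.get("src", "")
        if not src:
            continue
        # Trailing slash so job_..._23745 does not match job_..._237458.
        parent = src.rstrip("/") + "/"
        if target == src or target.startswith(parent):
            hits.append({
                # Assumes each line starts with a date and a time.
                "time": " ".join(line.split()[:2]),
                "ugi": fields.get("ugi"),
                "ip": fields.get("ip"),
                "cmd": fields["cmd"],
                "src": src,
            })
    return hits


# Usage (the log file name is hypothetical):
# with open("hdfs-audit.log") as f:
#     target = "/user/apache/.staging/job_201211150255_237458/job.xml"
#     for hit in find_removals(f, target):
#         print(hit)
```

Each hit gives the timestamp, the user (ugi) and the client IP, which should tell you whether it was the JobTracker's own cleanup or something else.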

On Sat, Jan 5, 2013 at 2:32 AM, Stan Rosenberg <[email protected]> wrote:
> Hi,
>
> Any ideas why a staging directory would suddenly become unavailable
> after the completion of the map phase but before the start of the
> reduce phase? We noticed a sporadic failure yesterday wherein all the
> map tasks completed successfully and all the reduce tasks failed. Upon
> examining task tracker logs, the following exception stack trace was
> revealed:
>
> 2013-01-03 02:28:17,072 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201211150255_237458_r_000108_1:
> java.io.FileNotFoundException: File does not exist: hdfs://59.bm-hadoop.prod.nym2:54310/user/apache/.staging/job_201211150255_237458/job.xml
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:562)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1434)
>     at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1318)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
>     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
>
> This problem doesn't seem relevant to only a specific distribution,
> but for completeness we are running CDH3u3.
>
> Thanks!
>
> stan

--
Harsh J
