Hi all,

I have run into a weird problem: I have an MR job that always fails when there is a large number of input files (e.g. 400 input files), but always succeeds when there are only a few (e.g. 20 input files).

In this job, the map phase reads all the input files and interprets each of them as a set of records. The intermediate output of the mapper is <record.type, record>, and the reducer simply writes all records of the same type to the same file using a MultipleSequenceFileOutputFormat.
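For reference, the job is wired up roughly like the sketch below. This is only a simplified sketch of mine, not the actual code: the class names, the line-based input, and the "type is the first tab-separated field" assumption are placeholders; what matches the real job is the <record.type, record> map output and the per-type sequence files produced through MultipleSequenceFileOutputFormat.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat;

public class LocalClusteringSketch {

    // Mapper: parse each input record and emit <record.type, record>.
    public static class RecordMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text record,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            // Placeholder: assume the type is the first tab-separated field.
            String type = record.toString().split("\t", 2)[0];
            out.collect(new Text(type), record);
        }
    }

    // Reducer: pass every record of a type through; the output format below
    // routes all records sharing a key into the same sequence file.
    public static class RecordReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text type, Iterator<Text> records,
                           OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            while (records.hasNext()) {
                out.collect(type, records.next());
            }
        }
    }

    // Name each output sequence file after the record type (the key).
    public static class TypeOutputFormat
            extends MultipleSequenceFileOutputFormat<Text, Text> {
        protected String generateFileNameForKeyValue(Text key, Text value,
                                                     String name) {
            return key.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(LocalClusteringSketch.class);
        conf.setJobName("localClustering");
        conf.setMapperClass(RecordMapper.class);
        conf.setReducerClass(RecordReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setOutputFormat(TypeOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

The real mapper parses whole files into records rather than single lines, but the output side (one sequence file per record type) is set up this way.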
According to the running status attached below, all of the reducers failed, and the error is an EOFException, which confuses me even more. Is there any suggestion on how to fix this?

-----
Hadoop job_201209191629_0013 on node10
User: root
Job Name: localClustering.jar
Job File: hdfs://node10:9000/mnt/md5/mapred/system/job_201209191629_0013/job.xml
Job Setup: Successful
Status: Failed
Started at: Thu Sep 20 21:55:11 CST 2012
Failed at: Thu Sep 20 22:03:51 CST 2012
Failed in: 8mins, 40sec
Job Cleanup: Successful
------------------------------
Kind     % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map      100.00%     400        0        0        400       0       0 / 8
reduce   100.00%     7          0        0        0         7       19 / 7

Counter                     Map          Reduce  Total
Job Counters
  Launched reduce tasks     0            0       26
  Rack-local map tasks      0            0       1
  Launched map tasks        0            0       408
  Data-local map tasks      0            0       407
  Failed reduce tasks       0            0       1
FileSystemCounters
  HDFS_BYTES_READ           899,202,342  0       899,202,342
  FILE_BYTES_WRITTEN        742,195,952  0       742,195,952
  HDFS_BYTES_WRITTEN        1,038,960    0       1,038,960
Map-Reduce Framework
  Combine output records    0            0       0
  Map input records         400          0       400
  Spilled Records           992,124      0       992,124
  Map output bytes          738,140,256  0       738,140,256
  Map input bytes           567,520,400  0       567,520,400
  Map output records        992,124      0       992,124
  Combine input records     0            0       0

Failed task: task_201209191629_0013_r_000000 on node6
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

syslog logs:
ReduceTask: Read 267038 bytes from map-output for attempt_201209191629_0013_m_000392_0
2012-09-21 05:57:25,249 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201209191629_0013_m_000392_0 -> (15, 729) from node6
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2012-09-21 05:57:25,436 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1 files left.
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 72 files left.
2012-09-21 05:57:25,446 INFO org.apache.hadoop.mapred.Merger: Merging 72 sorted segments
2012-09-21 05:57:25,447 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 42 segments left of total size: 22125176 bytes
2012-09-21 05:57:25,755 INFO org.apache.hadoop.mapred.ReduceTask: Merged 72 segments, 22125236 bytes to disk to satisfy reduce memory limit
2012-09-21 05:57:25,757 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 108299192 bytes from disk
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2012-09-21 05:57:25,764 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 108299184 bytes
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2683295125469062550_13791
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2048430611271251978_13803
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4739785392963375165_13815
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6981138506714889098_13819
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6981138506714889098_13819 bad datanode[0] nodes == null
2012-09-21 05:57:53,754 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/work/icc/intermediateoutput/lc/RR/_temporary/_attempt_201209191629_0013_r_000000_0/RR-LC-022033-1" - Aborting...
2012-09-21 05:57:54,539 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:54,542 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

--
YANG, Lin
