Hi All. Things werid that I found records are spilled into the disk [1] while the map input split bytes is samll[2]. Also the value of the mapreduce.task.io.sort.mb is 100mb by default.
I do think that there should not be any records spilled to the disks.And the memory is sufficient to contain the outputs of maps. So how does the spilled could happen? Any help will be appretiated. Thanks. The version : CDH4.1.2 with MPv2 & Yarn. 2013-04-01 20:13:50,289 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1265)) - Running job: job_local_0001 2013-04-01 20:13:50,294 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(320)) - OutputCommitter set in config null 2013-04-01 20:13:50,304 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(338)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter 2013-04-01 20:13:50,322 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(386)) - Waiting for map tasks 2013-04-01 20:13:50,324 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000000_0 2013-04-01 20:13:50,375 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@4d480ea 2013-04-01 20:13:50,784 INFO mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584) 2013-04-01 20:13:50,785 INFO mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100 2013-04-01 20:13:50,785 INFO mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080 2013-04-01 20:13:50,785 INFO mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600 2013-04-01 20:13:50,785 INFO mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600 2013-04-01 20:13:50,952 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 2013-04-01 20:13:50,955 INFO mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output 2013-04-01 20:13:50,955 INFO mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output 2013-04-01 20:13:50,955 INFO mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 4449; bufvoid = 104857600 2013-04-01 20:13:50,955 INFO mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214276(104857104); length = 121/6553600 2013-04-01 20:13:50,988 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0 2013-04-01 20:13:50,992 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000000_0 is done. And is in the process of committing 2013-04-01 20:13:51,016 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map 2013-04-01 20:13:51,016 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000000_0' done. 2013-04-01 20:13:51,017 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000000_0 2013-04-01 20:13:51,017 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete. 2013-04-01 20:13:51,032 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@40363068 2013-04-01 20:13:51,045 INFO mapred.Merger (Merger.java:merge(549)) - Merging 1 sorted segments 2013-04-01 20:13:51,085 INFO mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 1 segments left of total size: 4531 bytes 2013-04-01 20:13:51,085 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 2013-04-01 20:13:51,109 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords 2013-04-01 20:13:51,265 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing 2013-04-01 20:13:51,268 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 2013-04-01 20:13:51,268 INFO mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now 2013-04-01 20:13:51,276 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://Hadoop01:8040/user/hadoop/d/d2/_temporary/0/task_local_0001_r_000000 2013-04-01 20:13:51,277 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce 2013-04-01 20:13:51,277 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done. 2013-04-01 20:13:51,293 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber mode : false 2013-04-01 20:13:51,294 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1293)) - map 100% reduce 100% 2013-04-01 20:13:52,301 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed successfully 2013-04-01 20:13:52,317 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1311)) - Counters: 32 File System Counters FILE: Number of bytes read=56420 FILE: Number of bytes written=379066 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=8728 HDFS: Number of bytes written=4426 HDFS: Number of read operations=13 HDFS: Number of large read operations=0 HDFS: Number of write operations=4 Map-Reduce Framework Map input records=31 Map output records=31 Map output bytes=4449 Map output materialized bytes=4540 Input split bytes=131 [1] Combine input records=0 Combine output records=0 Reduce input groups=1 Reduce shuffle bytes=0 Reduce input records=31 Reduce output records=31 Spilled Records=62 [2] Shuffled Maps =0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=0 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=366346240 File Input Format Counters Bytes Read=4364 File Output Format Counters Bytes Written=4426
