Hi Rahul,

What does the log of the failed reduce container say? Please paste it and we can look for the reason.
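To work out which containers' logs are worth pasting, you can tally the FAILED attempts in the job-client output quoted below per task. The following is a minimal Python sketch. It is not part of Hadoop, and the names `failed_attempts` and `SAMPLE` are hypothetical. It assumes the client output is available as plain text and tolerates email line wrapping and `>` quote markers. When log aggregation (`yarn.log-aggregation-enable`) is turned on, the logs for the application's containers can then be fetched with `yarn logs -applicationId <application ID>`.

```python
import re
from collections import defaultdict

# Matches the job client's "Task Id : attempt_..., Status : FAILED" entries.
# [\s>]* tolerates line wrapping and "> " quote markers from an email reply.
FAILED_RE = re.compile(
    r"attempt_(\d+_\d+)_([mr])_(\d+)_(\d+),[\s>]*Status[\s>]*:[\s>]*FAILED"
)


def failed_attempts(log_text):
    """Return {task ID: sorted failed attempt numbers}, tasks in log order."""
    failures = defaultdict(list)
    for job, kind, task, attempt in FAILED_RE.findall(log_text):
        # attempt_<job>_<m|r>_<task>_<n> belongs to task_<job>_<m|r>_<task>
        failures[f"task_{job}_{kind}_{task}"].append(int(attempt))
    return {task_id: sorted(nums) for task_id, nums in failures.items()}


# FAILED entries copied from the failed run (job_1397454060494_0003)
# in the quoted log.
SAMPLE = """\
14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_0, Status : FAILED
14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000001_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000007_0, Status : FAILED
14/04/14 11:58:09 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_1, Status : FAILED
14/04/14 11:58:11 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_1, Status : FAILED
14/04/14 11:58:24 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_2, Status : FAILED
14/04/14 11:58:26 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_2, Status : FAILED
"""

if __name__ == "__main__":
    for task_id, attempts in failed_attempts(SAMPLE).items():
        print(task_id, "failed attempts:", attempts)
```

On the failed run, this shows that reducers `r_000001`, `r_000003`, `r_000005` and `r_000007` all had failed attempts. `r_000003` and `r_000005` each failed three times in the printed output, so the container logs of those attempts are the ones to paste.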
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <[email protected]> wrote:
> Hi,
>
> I am running a job (the wordcount example) on a 3-node cluster (1 master and 2 slaves). Sometimes the job passes, but sometimes it fails because a reduce task fails (the input data is only a few KB).
> I am not able to nail down the reason for this inconsistency.
>
> Failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job: http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>     File System Counters
>         FILE: Number of bytes read=80
>         FILE: Number of bytes written=596766
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>         HDFS: Number of bytes read=175
>         HDFS: Number of bytes written=28
>         HDFS: Number of read operations=21
>         HDFS: Number of large read operations=0
>         HDFS: Number of write operations=12
>     Job Counters
>         Failed reduce tasks=9
>         Killed reduce tasks=1
>         Launched map tasks=1
>         Launched reduce tasks=16
>         Data-local map tasks=1
>         Total time spent by all maps in occupied slots (ms)=3477
>         Total time spent by all reduces in occupied slots (ms)=148867
>         Total time spent by all map tasks (ms)=3477
>         Total time spent by all reduce tasks (ms)=148867
>         Total vcore-seconds taken by all map tasks=3477
>         Total vcore-seconds taken by all reduce tasks=148867
>         Total megabyte-seconds taken by all map tasks=3560448
>         Total megabyte-seconds taken by all reduce tasks=152439808
>     Map-Reduce Framework
>         Map input records=3
>         Map output records=13
>         Map output bytes=110
>         Map output materialized bytes=112
>         Input split bytes=117
>         Combine input records=13
>         Combine output records=6
>         Reduce input groups=4
>         Reduce shuffle bytes=80
>         Reduce input records=4
>         Reduce output records=4
>         Spilled Records=10
>         Shuffled Maps =6
>         Failed Shuffles=0
>         Merged Map outputs=6
>         GC time elapsed (ms)=142
>         CPU time spent (ms)=6420
>         Physical memory (bytes) snapshot=1100853248
>         Virtual memory (bytes) snapshot=4468314112
>         Total committed heap usage (bytes)=1406992384
>     Shuffle Errors
>         BAD_ID=0
>         CONNECTION=0
>         IO_ERROR=0
>         WRONG_LENGTH=0
>         WRONG_MAP=0
>         WRONG_REDUCE=0
>     File Input Format Counters
>         Bytes Read=58
>     File Output Format Counters
>         Bytes Written=28
>
> Passing log:
>
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job: http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>     File System Counters
>         FILE: Number of bytes read=112
>         FILE: Number of bytes written=767175
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>         HDFS: Number of bytes read=175
>         HDFS: Number of bytes written=40
>         HDFS: Number of read operations=27
>         HDFS: Number of large read operations=0
>         HDFS: Number of write operations=16
>     Job Counters
>         Failed map tasks=3
>         Failed reduce tasks=5
>         Launched map tasks=4
>         Launched reduce tasks=13
>         Other local map tasks=3
>         Data-local map tasks=1
>         Total time spent by all maps in occupied slots (ms)=41629
>         Total time spent by all reduces in occupied slots (ms)=104530
>         Total time spent by all map tasks (ms)=41629
>         Total time spent by all reduce tasks (ms)=104530
>         Total vcore-seconds taken by all map tasks=41629
>         Total vcore-seconds taken by all reduce tasks=104530
>         Total megabyte-seconds taken by all map tasks=42628096
>         Total megabyte-seconds taken by all reduce tasks=107038720
>     Map-Reduce Framework
>         Map input records=3
>         Map output records=13
>         Map output bytes=110
>         Map output materialized bytes=112
>         Input split bytes=117
>         Combine input records=13
>         Combine output records=6
>         Reduce input groups=6
>         Reduce shuffle bytes=112
>         Reduce input records=6
>         Reduce output records=6
>         Spilled Records=12
>         Shuffled Maps =8
>         Failed Shuffles=0
>         Merged Map outputs=8
>         GC time elapsed (ms)=186
>         CPU time spent (ms)=8890
>         Physical memory (bytes) snapshot=1408913408
>         Virtual memory (bytes) snapshot=5727019008
>         Total committed heap usage (bytes)=1808990208
>     Shuffle Errors
>         BAD_ID=0
>         CONNECTION=0
>         IO_ERROR=0
>         WRONG_LENGTH=0
>         WRONG_MAP=0
>         WRONG_REDUCE=0
>     File Input Format Counters
>         Bytes Read=58
>     File Output Format Counters
>         Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh

--
Regards
Gordon Wang
