Hi, I am running a job (the wordcount example) on a 3-node cluster (1 master and 2 slaves). Sometimes the job passes, but sometimes it fails because a reduce task fails, even though the input data is only a few KB. I have not been able to nail down the reason for this inconsistency.
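To compare the two runs, the "Status : FAILED" lines in the client output below can be tallied per task, which shows which task keeps failing. This is only a minimal sketch: the function name `count_failed_attempts` is made up for illustration, and the sample lines are copied from the failed run's output.

```python
import re
from collections import Counter

# Matches attempt IDs such as attempt_1397454060494_0003_r_000003_0
# that the job client reports with "Status : FAILED".
ATTEMPT_RE = re.compile(
    r"attempt_(\d+_\d+)_([mr])_(\d+)_(\d+),\s*Status\s*:\s*FAILED"
)

def count_failed_attempts(log_text):
    """Map (task type, task number) to the number of failed attempts reported.

    Hypothetical helper, not part of Hadoop. Task type is "m" (map) or "r" (reduce).
    """
    failures = Counter()
    for _job, kind, task, _attempt in ATTEMPT_RE.findall(log_text):
        failures[(kind, int(task))] += 1
    return failures

# A few lines taken from the failed run's client output.
sample = (
    "14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_0, Status : FAILED\n"
    "14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000001_0, Status : FAILED\n"
    "14/04/14 11:58:09 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_1, Status : FAILED\n"
    "14/04/14 11:58:24 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_2, Status : FAILED\n"
)

failures = count_failed_attempts(sample)
for (kind, task), n in sorted(failures.items()):
    print(f"{kind}_{task:06d}: {n} failed attempt(s)")
```

The pattern does not depend on line breaks, so it also works on client output that has been pasted as one long line.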
Failed log:

14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /20.0.1.206:8032
14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process : 1
14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397454060494_0003
14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application application_1397454060494_0003
14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job: http://20.0.1.206:8088/proxy/application_1397454060494_0003/
14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running in uber mode : false
14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_0, Status : FAILED
14/04/14 11:57:54 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000001_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000007_0, Status : FAILED
14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
14/04/14 11:58:09 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_1, Status : FAILED
14/04/14 11:58:11 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_1, Status : FAILED
14/04/14 11:58:24 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000003_2, Status : FAILED
14/04/14 11:58:26 INFO mapreduce.Job: Task Id : attempt_1397454060494_0003_r_000005_2, Status : FAILED
14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed with state FAILED due to: Task failed task_1397454060494_0003_r_000003
Job failed as tasks failed. failedMaps:0 failedReduces:1
14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
    File System Counters
        FILE: Number of bytes read=80
        FILE: Number of bytes written=596766
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=175
        HDFS: Number of bytes written=28
        HDFS: Number of read operations=21
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Failed reduce tasks=9
        Killed reduce tasks=1
        Launched map tasks=1
        Launched reduce tasks=16
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3477
        Total time spent by all reduces in occupied slots (ms)=148867
        Total time spent by all map tasks (ms)=3477
        Total time spent by all reduce tasks (ms)=148867
        Total vcore-seconds taken by all map tasks=3477
        Total vcore-seconds taken by all reduce tasks=148867
        Total megabyte-seconds taken by all map tasks=3560448
        Total megabyte-seconds taken by all reduce tasks=152439808
    Map-Reduce Framework
        Map input records=3
        Map output records=13
        Map output bytes=110
        Map output materialized bytes=112
        Input split bytes=117
        Combine input records=13
        Combine output records=6
        Reduce input groups=4
        Reduce shuffle bytes=80
        Reduce input records=4
        Reduce output records=4
        Spilled Records=10
        Shuffled Maps =6
        Failed Shuffles=0
        Merged Map outputs=6
        GC time elapsed (ms)=142
        CPU time spent (ms)=6420
        Physical memory (bytes) snapshot=1100853248
        Virtual memory (bytes) snapshot=4468314112
        Total committed heap usage (bytes)=1406992384
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=58
    File Output Format Counters
        Bytes Written=28

Job passing logs:

hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /user/hduser/input /user/hduser/output_wordcount9
14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /20.0.1.206:8032
14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process : 1
14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397454060494_0002
14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application application_1397454060494_0002
14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job: http://20.0.1.206:8088/proxy/application_1397454060494_0002/
14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running in uber mode : false
14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
14/04/14 11:47:50 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_0, Status : FAILED
14/04/14 11:48:05 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_1, Status : FAILED
14/04/14 11:48:20 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_m_000000_2, Status : FAILED
14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
14/04/14 11:48:41 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000001_0, Status : FAILED
14/04/14 11:48:42 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000003_0, Status : FAILED
14/04/14 11:48:43 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000005_0, Status : FAILED
14/04/14 11:48:44 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000007_0, Status : FAILED
14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
14/04/14 11:48:58 INFO mapreduce.Job: Task Id : attempt_1397454060494_0002_r_000005_1, Status : FAILED
14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed successfully
14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=112
        FILE: Number of bytes written=767175
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=175
        HDFS: Number of bytes written=40
        HDFS: Number of read operations=27
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=16
    Job Counters
        Failed map tasks=3
        Failed reduce tasks=5
        Launched map tasks=4
        Launched reduce tasks=13
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=41629
        Total time spent by all reduces in occupied slots (ms)=104530
        Total time spent by all map tasks (ms)=41629
        Total time spent by all reduce tasks (ms)=104530
        Total vcore-seconds taken by all map tasks=41629
        Total vcore-seconds taken by all reduce tasks=104530
        Total megabyte-seconds taken by all map tasks=42628096
        Total megabyte-seconds taken by all reduce tasks=107038720
    Map-Reduce Framework
        Map input records=3
        Map output records=13
        Map output bytes=110
        Map output materialized bytes=112
        Input split bytes=117
        Combine input records=13
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=112
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =8
        Failed Shuffles=0
        Merged Map outputs=8
        GC time elapsed (ms)=186
        CPU time spent (ms)=8890
        Physical memory (bytes) snapshot=1408913408
        Virtual memory (bytes) snapshot=5727019008
        Total committed heap usage (bytes)=1808990208
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=58
    File Output Format Counters
        Bytes Written=40

Thanks and Regards,
-Rahul Singh
