Hi,

We have a complex query with several left outer joins that compiles to 8 M/R jobs in Hive. During execution of one of the stages (after three M/R jobs have run), the job fails because a few reduce tasks are killed for inactivity.
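For context, the query is shaped roughly like the sketch below (the table and column names are made up for illustration only, not our actual schema); several more left outer joins of the same form follow:

SELECT a.id, b.x, c.y, d.z
FROM fact a
LEFT OUTER JOIN dim_b b ON (a.b_key = b.b_key)
LEFT OUTER JOIN dim_c c ON (a.c_key = c.c_key)
LEFT OUTER JOIN dim_d d ON (a.d_key = d.d_key)
-- ...additional left outer joins on other keys...
;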
Most of the reduce tasks complete quickly (within 3 minutes), but the last one stays stuck for a long time (> 1 hour) and, after several attempts, is finally killed with "failed to report status for 600 seconds. Killing!"

What may be causing this issue? Would hive.script.auto.progress help in this case? Since we are not able to get much information from the log files, how should we approach resolving this? Would tweaking any specific M/R parameters help (see the settings sketched in the P.S. below)?

The task attempt log shows several lines like these before exiting:

2012-08-23 19:17:23,848 INFO ExecReducer: ExecReducer: processing 219000000 rows: used memory = 408582240
2012-08-23 19:17:30,189 INFO ExecReducer: ExecReducer: processing 220000000 rows: used memory = 346110400
2012-08-23 19:17:37,510 INFO ExecReducer: ExecReducer: processing 221000000 rows: used memory = 583913576
2012-08-23 19:17:44,829 INFO ExecReducer: ExecReducer: processing 222000000 rows: used memory = 513071504
2012-08-23 19:17:47,923 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1

Here are the reduce task counters (Map-Reduce Framework group):

Combine input records: 0
Combine output records: 0
Reduce input groups: 222,480,335
Reduce shuffle bytes: 7,726,141,897
Reduce input records: 222,480,335
Reduce output records: 0
Spilled Records: 355,827,191
CPU time spent (ms): 2,152,160
Physical memory (bytes) snapshot: 1,182,490,624
Virtual memory (bytes) snapshot: 1,694,531,584
Total committed heap usage (bytes): 990,052,352

The tasktracker log gives a thread dump at that time but no exception:

2012-08-23 20:05:49,319 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
69 active threads

---------------------------
Thanks & Regards
Himanish
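P.S. For reference, these are the kind of settings we were thinking of tweaking before the query; the values below are only illustrative, not something we have tested:

-- raise the task timeout from the default 600 seconds (value is in milliseconds)
set mapred.task.timeout=1200000;
-- have Hive send periodic progress updates for script/transform operators
set hive.script.auto.progress=true;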