Thanks, Chalcy! But the Hadoop cluster should not hang under any circumstances. Is that a bug?

On Wed, Mar 6, 2013 at 12:33 PM, Chalcy Raja <[email protected]> wrote:

> You could try breaking up the hive query to return smaller datasets. I
> have noticed this behavior when the hive query has ‘in’ in the where clause.
>
> Thanks,
> Chalcy
>
> From: Daning Wang [mailto:[email protected]]
> Sent: Wednesday, March 06, 2013 3:08 PM
> To: [email protected]
> Subject: Hadoop cluster hangs on big hive job
>
> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
> running big hive jobs (hive-0.8.1). Basically all the nodes are dead; from
> the tasktracker's log it looks like it went into some kind of endless loop.
>
> All the log entries look like this when the problem happened.
>
> Any idea how to debug the issue?
>
> Thanks in advance.
>
> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000004_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000043_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:27,505 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,464 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,553 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000043_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,561 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,659 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,519 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,644 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,741 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:31,369 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000004_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:31,675 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:31,875 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:32,372 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:32,893 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
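
As a possible first debugging step, the quoted progress lines can be grouped by attempt to see which reducers never advance their copy counter. This is only a rough sketch: the regular expression is inferred from the sample lines in this thread, and the names `LINE_RE` and `stalled_attempts` are made up for illustration. It says nothing about why the shuffle stalls, only which attempts report the same count every time.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred only from the log excerpt quoted in this thread.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\S+) (?P<pct>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def stalled_attempts(lines, min_reports=2):
    """Return attempt IDs whose copied-map count is identical in every report.

    An attempt is only considered if it was reported at least
    `min_reports` times, so a single line is never flagged as stalled.
    """
    counts_by_attempt = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts_by_attempt[match.group("attempt")].append(int(match.group("done")))
    return sorted(
        attempt
        for attempt, counts in counts_by_attempt.items()
        if len(counts) >= min_reports and len(set(counts)) == 1
    )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python stalled.py tasktracker.log
    with open(sys.argv[1]) as log_file:
        for attempt in stalled_attempts(log_file):
            print(attempt)
```

Because the pattern is not anchored to the start of the line, it also matches lines that carry a `> ` quote prefix, as in the excerpt above.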
