When you changed to using replicated joins, did you put the small relation last in your list? The order is important...
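The fragment-replicate join being discussed can be sketched in a few lines. This is a rough illustration of the idea, not Pig's actual implementation: the relation listed last in `JOIN big BY id, small BY id USING 'replicated'` is loaded into an in-memory hash table on every map task, and the big relation streams past it with no reduce phase at all. That is why the small relation must fit in memory, and why the order matters.

```python
# Sketch of a fragment-replicate ("replicated") join. Assumption: relations
# are iterables of (key, value) pairs; this is illustrative, not Pig's code.
from collections import defaultdict

def replicated_join(big, small):
    # Build an in-memory index of the SMALL relation, keyed by join key.
    # In Pig/Hadoop this happens on every map task via the distributed cache.
    index = defaultdict(list)
    for key, val in small:
        index[key].append(val)
    # Stream the big relation past the index; no shuffle or reduce needed.
    for key, val in big:
        for sval in index.get(key, []):
            yield (key, val, sval)

big = [(1, 'a'), (2, 'b'), (1, 'c')]
small = [(1, 'x'), (3, 'y')]
print(list(replicated_join(big, small)))  # [(1, 'a', 'x'), (1, 'c', 'x')]
```

If the relations are swapped by mistake, the mappers try to hold the big relation in memory, which is a classic source of heap errors.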
-D

On Fri, Oct 8, 2010 at 8:47 AM, Vincent <[email protected]> wrote:
> I've tried a mapred.child.java.opts value of -Xmx2048m. Now the error is a
> timeout; it seems the system is so loaded that it becomes unresponsive.
>
> Here is the output of the job tracker for Hadoop job_201010081840_0010
> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>,
> failures on prog7 <http://prog7.lan:50030/jobtracker.jsp>:
>
>   Attempt: attempt_201010081840_0010_r_000001_0
>   Task:    task_201010081840_0010_r_000001
>   Machine: prog7    State: FAILED
>   Error:   Task attempt_201010081840_0010_r_000001_0 failed to report
>            status for 601 seconds. Killing!
>
> Task logs for 'attempt_201010081840_0010_r_000001_0': stdout and stderr
> are empty; syslog follows (the repetitive Scheduled / Shuffling / Read
> lines for the 24 map outputs are abridged):
>
> 2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
> 2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
> 2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging on-disk files
> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in memory files
> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0 is already in progress
> 2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map Completion Events
> 2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18158862 bytes (18158866 raw bytes) into RAM from attempt_201010081840_0010_m_000002_0
> 2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read 20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
> [... the reducer fetches all 24 map outputs, roughly 9-21 MB each, from
>  prog7 and hermitage in the same Shuffling-into-RAM / Read pattern,
>  ending with attempt_201010081840_0010_m_000023_1 at 19:12:49 ...]
> 2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 24 files left.
> 2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24 sorted segments
> 2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 24 segments left of total size: 436372203 bytes
> 2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged 24 segments, 436372203 bytes to disk to satisfy reduce memory limit
> 2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 436372161 bytes from disk
> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
> 2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 436372157 bytes
> 2010-10-08 19:13:18,879 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 24
> 2010-10-08 19:13:18,879 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input paths to process : 24
> 2010-10-08 19:16:14,354 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 32309248(31552K) used = 803560952(784727K) committed = 1069678592(1044608K) max = 1431699456(1398144K)
>
> On 10/08/2010 06:44 PM, Vincent wrote:
>> Yep, I did restart the cluster (dfs and mapred stop/start).
>>
>> Increasing the amount of memory, I can see that the reduce task gets
>> further (the percentage is greater), but then progress starts to fall
>> back as attempts fail with memory errors.
>>
>> On 10/08/2010 06:41 PM, Jeff Zhang wrote:
>>> Did you restart the cluster after reconfiguration?
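A side note on the `ShuffleRamManager: MemoryLimit=1336252800` line in the syslog above: in Hadoop 0.20 the reducer sizes its in-memory shuffle buffer as the JVM's reported max heap times `mapred.job.shuffle.input.buffer.percent`, whose default is 0.70. The back-of-the-envelope check below assumes that default is in effect on this cluster:

```python
# Assumption: mapred.job.shuffle.input.buffer.percent is at its 0.70 default.
# Hadoop 0.20 computes: shuffle buffer = Runtime.maxMemory() * percent.
def shuffle_ram_limit(jvm_max_bytes, buffer_percent=0.70):
    return int(jvm_max_bytes * buffer_percent)

# Back-solving from the logged MemoryLimit=1336252800 implies a max heap of
# about 1.78 GB -- plausible for -Xmx2048m, since Runtime.maxMemory() reports
# somewhat less than the -Xmx value.
implied_heap = 1336252800 / 0.70
print(round(implied_heap / 1024**3, 2))  # 1.78
```

So the 2 GB child heap is actually being picked up; the timeout comes later, once Pig's SpillableMemoryManager starts firing and the machine is presumably swapping.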
>>>
>>> On Fri, Oct 8, 2010 at 9:59 PM, Vincent<[email protected]> wrote:
>>>> I've tried mapred.child.java.opts with the values:
>>>>   -Xmx512m  --> still memory errors in the reduce phase
>>>>   -Xmx1024m --> still memory errors in the reduce phase
>>>> I am now trying -Xmx1536m, but I'm afraid my nodes will start to
>>>> swap memory...
>>>>
>>>> Should I continue in this direction? Or is it already too much, and
>>>> should I look for the problem somewhere else?
>>>>
>>>> Thanks
>>>>
>>>> -Vincent
>>>>
>>>> On 10/08/2010 03:04 PM, Jeff Zhang wrote:
>>>>> Try increasing the heap size of the tasks by setting
>>>>> mapred.child.java.opts in mapred-site.xml. The default value is
>>>>> -Xmx200m in mapred-default.xml, which may be too small for you.
>>>>>
>>>>> On Fri, Oct 8, 2010 at 6:55 PM, Vincent<[email protected]> wrote:
>>>>>> Thanks to Dmitriy and Jeff, I've set
>>>>>>
>>>>>>   set default_parallel 20;
>>>>>>
>>>>>> at the beginning of my script, and updated 8 JOINs to the form:
>>>>>>
>>>>>>   JOIN big BY id, small BY id USING 'replicated';
>>>>>>
>>>>>> Unfortunately this didn't improve the script's speed (at least it
>>>>>> has been running for more than one hour now).
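For reference, the setting Jeff describes goes in mapred-site.xml on each tasktracker (the -Xmx value below is only an example, not a recommendation for this cluster, and the tasktrackers must be restarted for it to take effect):

```xml
<!-- mapred-site.xml: example value only -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```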
>>>>>>
>>>>>> But looking in the jobtracker at one of the jobs that reduces, I can
>>>>>> see this for the map:
>>>>>>
>>>>>> Hadoop map task list for job_201010081314_0010
>>>>>> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
>>>>>> on prog7 <http://prog7.lan:50030/jobtracker.jsp>
>>>>>>
>>>>>> Task                             Complete  Start - Finish                                   Errors
>>>>>> task_201010081314_0010_m_000000  100.00%   8-Oct-2010 14:07:44 - 14:23:11 (15mins, 27sec)   Too many fetch-failures (x2)
>>>>>>
>>>>>> And I can see this for the reduce:
>>>>>>
>>>>>> Hadoop reduce task list for job_201010081314_0010 on prog7
>>>>>>
>>>>>> Task                             Complete  Status                                Start                Errors
>>>>>> task_201010081314_0010_r_000000  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:14:49  Error: GC overhead limit exceeded
>>>>>> task_201010081314_0010_r_000001  0.00%                                           8-Oct-2010 14:14:52  Error: Java heap space
>>>>>> task_201010081314_0010_r_000002  0.00%                                           8-Oct-2010 14:15:58  java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>>>>> task_201010081314_0010_r_000003  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:16:58
>>>>>> task_201010081314_0010_r_000004  0.00%                                           8-Oct-2010 14:18:11  Error: GC overhead limit exceeded
>>>>>> task_201010081314_0010_r_000005  0.00%                                           8-Oct-2010 14:18:56  Error: GC overhead limit exceeded
>>>>>>
>>>>>> It seems like it runs out of memory... Which parameter should be
>>>>>> increased?
>>>>>>
>>>>>> -Vincent
>>>>>>
>>>>>> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>>>>>> BTW, you can look at the job tracker web UI to see which part of
>>>>>>> the job costs the most time.
>>>>>>>
>>>>>>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<[email protected]> wrote:
>>>>>>>> No, I mean whether your mapreduce job's reduce task number is 1.
>>>>>>>>
>>>>>>>> And could you share your Pig script, so that others can really
>>>>>>>> understand your problem?
>>>>>>>>
>>>>>>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent<[email protected]> wrote:
>>>>>>>>> You are right, I didn't change this parameter, so the default is
>>>>>>>>> used from src/mapred/mapred-default.xml:
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>   <name>mapred.reduce.tasks</name>
>>>>>>>>>   <value>1</value>
>>>>>>>>>   <description>The default number of reduce tasks per job.
>>>>>>>>>   Typically set to 99% of the cluster's reduce capacity, so that
>>>>>>>>>   if a node fails the reduces can still be executed in a single
>>>>>>>>>   wave. Ignored when mapred.job.tracker is "local".
>>>>>>>>>   </description>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> It's not clear to me what the reduce capacity of my cluster is :)
>>>>>>>>>
>>>>>>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>>>>>> I guess maybe your reduce number is 1, which makes the reduce
>>>>>>>>>> phase very slow.
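On the "reduce capacity" question: it is simply the number of tasktracker nodes times the reduce slots per node (`mapred.tasktracker.reduce.tasks.maximum`, which defaults to 2 in Hadoop 0.20). The slots-per-node value below is that assumed default, not something read from this cluster's config:

```python
# Assumption: mapred.tasktracker.reduce.tasks.maximum = 2 (Hadoop 0.20 default).
def reduce_capacity(nodes, reduce_slots_per_node=2):
    # Total number of reduce tasks the cluster can run simultaneously.
    return nodes * reduce_slots_per_node

def suggested_reducers(nodes, reduce_slots_per_node=2, factor=0.95):
    # mapred-default.xml suggests ~99% of capacity so a failed node's
    # reduces can rerun in one wave; just under capacity is the usual rule.
    return max(1, int(reduce_capacity(nodes, reduce_slots_per_node) * factor))

print(reduce_capacity(2))     # 4
print(suggested_reducers(2))  # 3
```

If those defaults apply here, a `default_parallel` of 20 on a 4-slot cluster means the reducers run in several waves, which is harmless but adds scheduling overhead.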
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent<[email protected]> wrote:
>>>>>>>>>>> Well, I can see from the job tracker that all the jobs finish
>>>>>>>>>>> quite quickly except 2, for which the reduce phase goes really,
>>>>>>>>>>> really slowly.
>>>>>>>>>>>
>>>>>>>>>>> But how can I match a job in the Hadoop job tracker (example:
>>>>>>>>>>> job_201010072150_0045) to the part of the Pig script it
>>>>>>>>>>> executes?
>>>>>>>>>>>
>>>>>>>>>>> And what is most efficient: several small Pig scripts, or one
>>>>>>>>>>> big Pig script? I wrote one big script to avoid loading the same
>>>>>>>>>>> logs several times in different scripts. Maybe that is not such
>>>>>>>>>>> a good design...
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> - Vincent
>>>>>>>>>>>
>>>>>>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>>>>>>
>>>>>>>>>>>> As for the script, well, it's more than 500 lines; I'm not sure
>>>>>>>>>>>> that anybody would read it to the end if I posted it here :-)
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent<[email protected]> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will
>>>>>>>>>>>>>> make you laugh.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wrote a Pig script that handles 1.5GB of logs in less than
>>>>>>>>>>>>>> one hour in Pig local mode on an Intel Core 2 Duo with 3GB
>>>>>>>>>>>>>> of RAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I tried this script on a simple 2-node cluster. These
>>>>>>>>>>>>>> 2 nodes are not servers but ordinary computers:
>>>>>>>>>>>>>> - Intel Core 2 Duo with 3GB of RAM
>>>>>>>>>>>>>> - Intel Quad with 4GB of RAM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, I was aware that Hadoop has overhead and that the job
>>>>>>>>>>>>>> wouldn't be done in half an hour (the local time divided by
>>>>>>>>>>>>>> the number of nodes). But I was surprised to see this morning
>>>>>>>>>>>>>> that it took 7 hours to complete!!!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My question is simple: is this normal?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vincent
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
