When you changed to using replicated joins, did you put the small relation
last in your list? The order is important...
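For example (relation and path names here are made up), the small, in-memory relation goes last:

```pig
-- In a fragment-replicated join, the last relation listed is the one
-- loaded into memory on each mapper, so it must be the small one.
big    = LOAD 'big_logs'  AS (id:chararray, value:int);
small  = LOAD 'small_ref' AS (id:chararray, label:chararray);
joined = JOIN big BY id, small BY id USING 'replicated';
```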

-D

On Fri, Oct 8, 2010 at 8:47 AM, Vincent <[email protected]> wrote:

>
>
>
>  *I've tried mapred.child.java.opts with the value 2048m*. Now the error is
>  a timeout. It seems the system is so loaded that it becomes unresponsive...
>
> Here are the outputs of the job tracker:
>
>
>  Hadoop job_201010081840_0010
>  <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
>  failures on prog7 <http://prog7.lan:50030/jobtracker.jsp>
>
> Attempt         Task    Machine         State   Error   Logs
> attempt_201010081840_0010_r_000001_0    task_201010081840_0010_r_000001 <
> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001>
>      prog7 <http://prog7:50060>      FAILED
>
> Task attempt_201010081840_0010_r_000001_0 failed to report status for 601
> seconds. Killing!
>
>        Last 4KB <
> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
> >
> Last 8KB <
> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
> >
> All <
> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>
>
>
>
>  Task Logs: 'attempt_201010081840_0010_r_000001_0'
>
>
>
> *_stdout logs_*
> ------------------------------------------------------------------------
>
>
> *_stderr logs_*
> ------------------------------------------------------------------------
>
>
> *_syslog logs_*
>
> 2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=SHUFFLE, sessionId=
> 2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask:
> ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
> 2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging
> on-disk files
> 2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging
> on-disk files
> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in
> memory files
> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0
> is already in progress
> 2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map
> Completion Events
> 2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
> 2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed
> len: 18158862
> 2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18158862 bytes (18158866 raw bytes) into RAM from
> attempt_201010081840_0010_m_000002_0
> 2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed
> len: 20624283
> 2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 20624283 bytes (20624287 raw bytes) into RAM from
> attempt_201010081840_0010_m_000000_0
> 2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
> 2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
> 2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000000_0 ->  (105, 265) from prog7
> 2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and9
> dup hosts)
> 2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed
> len: 18485336
> 2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18485336 bytes (18485340 raw bytes) into RAM from
> attempt_201010081840_0010_m_000001_0
> 2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
> 2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000002_0 ->  (177, 148) from hermitage
> 2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed
> len: 18075616
> 2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18075616 bytes (18075620 raw bytes) into RAM from
> attempt_201010081840_0010_m_000003_0
> 2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
> 2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000001_0 ->  (241, 162) from prog7
> 2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
> 2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000003_0 ->  (189, 187) from hermitage
> 2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed
> len: 18255979
> 2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18255979 bytes (18255983 raw bytes) into RAM from
> attempt_201010081840_0010_m_000006_0
> 2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
> 2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed
> len: 18874525
> 2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18874525 bytes (18874529 raw bytes) into RAM from
> attempt_201010081840_0010_m_000004_0
> 2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and7
> dup hosts)
> 2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
> 2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000004_0 ->  (105, 133) from prog7
> 2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
> 2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000006_0 ->  (105, 178) from hermitage
> 2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed
> len: 18358508
> 2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18358508 bytes (18358512 raw bytes) into RAM from
> attempt_201010081840_0010_m_000007_0
> 2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
> 2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000007_0 ->  (105, 166) from hermitage
> 2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed
> len: 18092003
> 2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18092003 bytes (18092007 raw bytes) into RAM from
> attempt_201010081840_0010_m_000008_0
> 2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
> 2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000008_0 ->  (293, 232) from hermitage
> 2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed
> len: 17941905
> 2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 17941905 bytes (17941909 raw bytes) into RAM from
> attempt_201010081840_0010_m_000009_0
> 2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
> 2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and6
> dup hosts)
> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000009_0 ->  (105, 197) from hermitage
> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed
> len: 18405138
> 2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18405138 bytes (18405142 raw bytes) into RAM from
> attempt_201010081840_0010_m_000010_0
> 2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed
> len: 18009092
> 2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18009092 bytes (18009096 raw bytes) into RAM from
> attempt_201010081840_0010_m_000005_0
> 2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
> 2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000005_0 ->  (105, 206) from prog7
> 2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
> 2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000010_0 ->  (137, 175) from hermitage
> 2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed
> len: 20002821
> 2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 20002821 bytes (20002825 raw bytes) into RAM from
> attempt_201010081840_0010_m_000011_0
> 2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
> 2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000011_0 ->  (105, 143) from hermitage
> 2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed
> len: 18135955
> 2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18135955 bytes (18135959 raw bytes) into RAM from
> attempt_201010081840_0010_m_000012_0
> 2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
> 2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000012_0 ->  (137, 149) from hermitage
> 2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed
> len: 18440782
> 2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18440782 bytes (18440786 raw bytes) into RAM from
> attempt_201010081840_0010_m_000013_0
> 2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
> 2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000013_0 ->  (137, 142) from hermitage
> 2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed
> len: 18205881
> 2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18205881 bytes (18205885 raw bytes) into RAM from
> attempt_201010081840_0010_m_000014_0
> 2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
> 2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000014_0 ->  (253, 159) from hermitage
> 2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed
> len: 17476258
> 2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 17476258 bytes (17476262 raw bytes) into RAM from
> attempt_201010081840_0010_m_000015_0
> 2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
> 2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
> 2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000015_0 ->  (105, 158) from hermitage
> 2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed
> len: 18542226
> 2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18542226 bytes (18542230 raw bytes) into RAM from
> attempt_201010081840_0010_m_000017_0
> 2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
> 2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000017_0 ->  (257, 151) from hermitage
> 2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
> 2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed
> len: 18737336
> 2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18737336 bytes (18737340 raw bytes) into RAM from
> attempt_201010081840_0010_m_000018_0
> 2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
> 2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000018_0 ->  (253, 175) from hermitage
> 2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
> 2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
> 2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed
> len: 17710254
> 2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 17710254 bytes (17710258 raw bytes) into RAM from
> attempt_201010081840_0010_m_000020_0
> 2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed
> len: 20708572
> 2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 20708572 bytes (20708576 raw bytes) into RAM from
> attempt_201010081840_0010_m_000016_0
> 2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
> 2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000016_0 ->  (297, 318) from prog7
> 2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and1
> dup hosts)
> 2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed
> len: 18984483
> 2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18984483 bytes (18984487 raw bytes) into RAM from
> attempt_201010081840_0010_m_000019_0
> 2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
> 2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000019_0 ->  (285, 160) from prog7
> 2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
> 2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000020_0 ->  (105, 127) from hermitage
> 2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed
> len: 18803709
> 2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18803709 bytes (18803713 raw bytes) into RAM from
> attempt_201010081840_0010_m_000021_0
> 2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
> 2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000021_0 ->  (137, 164) from hermitage
> 2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
> 2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed
> len: 18143864
> 2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 18143864 bytes (18143868 raw bytes) into RAM from
> attempt_201010081840_0010_m_000022_1
> 2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
> 2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000022_1 ->  (105, 133) from hermitage
> 2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header:
> attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed
> len: 9198815
> 2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
> 9198815 bytes (9198819 raw bytes) into RAM from
> attempt_201010081840_0010_m_000023_1
> 2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
> 2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
> 2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
> from attempt_201010081840_0010_m_000023_1 ->  (137, 166) from hermitage
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
> GetMapEventsThread exiting
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
> getMapsEventsThread joined.
> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed
> ram manager
> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
> Interleaved on-disk merge complete: 0 files left.
> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
> merge complete: 24 files left.
> 2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24
> sorted segments
> 2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 24 segments left of total size: 436372203 bytes
> 2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged 24
> segments, 436372203 bytes to disk to satisfy reduce memory limit
> 2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1
> files, 436372161 bytes from disk
> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0
> segments, 0 bytes from memory into reduce
> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1
> sorted segments
> 2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 1 segments left of total size: 436372157 bytes
> 2010-10-08 19:13:18,879 INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
> process : 24
> 2010-10-08 19:13:18,879 INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input
> paths to process : 24
> 2010-10-08 19:16:14,354 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called
> (Collection threshold exceeded) init = 32309248(31552K) used =
> 803560952(784727K) committed = 1069678592(1044608K) max =
> 1431699456(1398144K)
>
>
>
>
> On 10/08/2010 06:44 PM, Vincent wrote:
>
>>  Yep, I did restart cluster (dfs and mapred stop/start).
>>
>> Increasing the amount of memory, I can see that the reduce task goes
>> further (the percentage is greater), but then it starts to decrease again
>> with memory failures.
>>
>> On 10/08/2010 06:41 PM, Jeff Zhang wrote:
>>
>>> Did you restart cluster after reconfiguration ?
>>>
>>>
>>> On Fri, Oct 8, 2010 at 9:59 PM, Vincent<[email protected]>
>>>  wrote:
>>>
>>>>  I've tried with mapred.child.java.opts value:
>>>> -Xmx512m -->  still memory errors in reduce phase
>>>> -Xmx1024m -->  still memory errors in reduce phase
>>>> I am now trying with -Xmx1536m, but I'm afraid that my nodes will start
>>>> to swap memory...
>>>>
>>>> Should I continue in this direction? Or is it already too much, and I
>>>> should look for the problem somewhere else?
>>>>
>>>> Thanks
>>>>
>>>> -Vincent
>>>>
>>>>
>>>> On 10/08/2010 03:04 PM, Jeff Zhang wrote:
>>>>
>>>>> Try to increase the heap size of the tasks by setting
>>>>> mapred.child.java.opts in mapred-site.xml. The default value is
>>>>> -Xmx200m in mapred-default.xml, which may be too small for you.
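A sketch of what that setting could look like in mapred-site.xml (the -Xmx value below is only an illustration, not a recommendation):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Per-task JVM options; the default heap is -Xmx200m -->
  <value>-Xmx512m</value>
</property>
```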
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 8, 2010 at 6:55 PM, Vincent<[email protected]>
>>>>>  wrote:
>>>>>
>>>>>>
>>>>>>  Thanks to Dmitriy and Jeff, I've set:
>>>>>>
>>>>>> set default_parallel 20; at the beginning of my script.
>>>>>>
>>>>>> Updated 8 JOINs to behave like:
>>>>>>
>>>>>> JOIN big BY id, small BY id USING 'replicated';
>>>>>>
>>>>>> Unfortunately this didn't improve the script speed (at least, it has
>>>>>> been running for more than one hour now).
>>>>>>
>>>>>> But looking in the jobtracker at one of the jobs that reduces, I can
>>>>>> see this for the map:
>>>>>>
>>>>>>
>>>>>>  Hadoop map task list for job_201010081314_0010
>>>>>> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
>>>>>>  on
>>>>>>  prog7<http://prog7.lan:50030/jobtracker.jsp>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>   All Tasks
>>>>>>
>>>>>> Task    Complete        Status  Start Time      Finish Time     Errors
>>>>>>  Counters
>>>>>> task_201010081314_0010_m_000000
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
>>>>>>
>>>>>>      100.00%
>>>>>>
>>>>>>
>>>>>>        8-Oct-2010 14:07:44
>>>>>>        8-Oct-2010 14:23:11 (15mins, 27sec)
>>>>>>
>>>>>>
>>>>>> Too many fetch-failures
>>>>>> Too many fetch-failures
>>>>>>
>>>>>>
>>>>>>        8
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
>>>>>>
>>>>>>
>>>>>>
>>>>>> And I can see this for the reduce
>>>>>>
>>>>>>
>>>>>>  Hadoop reduce task list for job_201010081314_0010
>>>>>> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
>>>>>>  on
>>>>>>  prog7<http://prog7.lan:50030/jobtracker.jsp>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>   All Tasks
>>>>>>
>>>>>> Task    Complete        Status  Start Time      Finish Time     Errors
>>>>>>  Counters
>>>>>> task_201010081314_0010_r_000000
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>
>>>>>>
>>>>>>      9.72%
>>>>>>
>>>>>>
>>>>>>
>>>>>>        reduce>    copy (7 of 24 at 0.01 MB/s)>
>>>>>>        8-Oct-2010 14:14:49
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error: GC overhead limit exceeded
>>>>>>
>>>>>>
>>>>>>        7
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>
>>>>>>
>>>>>> task_201010081314_0010_r_000001
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>
>>>>>>
>>>>>>      0.00%
>>>>>>
>>>>>>
>>>>>>        8-Oct-2010 14:14:52
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error: Java heap space
>>>>>>
>>>>>>
>>>>>>        0
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>
>>>>>>
>>>>>> task_201010081314_0010_r_000002
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>
>>>>>>
>>>>>>      0.00%
>>>>>>
>>>>>>
>>>>>>        8-Oct-2010 14:15:58
>>>>>>
>>>>>>
>>>>>>
>>>>>> java.io.IOException: Task process exit with nonzero status of 1.
>>>>>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>>>>>
>>>>>>
>>>>>>
>>>>>>        0
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>
>>>>>>
>>>>>> task_201010081314_0010_r_000003
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>
>>>>>>
>>>>>>      9.72%
>>>>>>
>>>>>>
>>>>>>
>>>>>>        reduce>    copy (7 of 24 at 0.01 MB/s)>
>>>>>>        8-Oct-2010 14:16:58
>>>>>>
>>>>>>
>>>>>>        7
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>
>>>>>>
>>>>>> task_201010081314_0010_r_000004
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>
>>>>>>
>>>>>>      0.00%
>>>>>>
>>>>>>
>>>>>>        8-Oct-2010 14:18:11
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error: GC overhead limit exceeded
>>>>>>
>>>>>>
>>>>>>        0
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>
>>>>>>
>>>>>> task_201010081314_0010_r_000005
>>>>>>
>>>>>> <
>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000005>
>>>>>>
>>>>>>      0.00%
>>>>>>
>>>>>>
>>>>>>        8-Oct-2010 14:18:56
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error: GC overhead limit exceeded
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Seems like it runs out of memory... Which parameter should be
>>>>>> increased?
>>>>>>
>>>>>> -Vincent
>>>>>>
>>>>>>
>>>>>> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>>>>>
>>>>>>> BTW, you can look at the job tracker web ui to see which part of the
>>>>>>> job cost the most of the time
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<[email protected]>
>>>>>>>  wrote:
>>>>>>>
>>>>>>>> No I mean whether your mapreduce job's reduce task number is 1.
>>>>>>>>
>>>>>>>> And could you share your pig script, then others can really
>>>>>>>> understand
>>>>>>>> your problem.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent<[email protected]>
>>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>>  You are right, I didn't change this parameter, therefore the
>>>>>>>>> default is used from src/mapred/mapred-default.xml:
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>   <name>mapred.reduce.tasks</name>
>>>>>>>>>   <value>1</value>
>>>>>>>>>   <description>The default number of reduce tasks per job.
>>>>>>>>>   Typically set to 99% of the cluster's reduce capacity, so that
>>>>>>>>>   if a node fails the reduces can still be executed in a single
>>>>>>>>>   wave. Ignored when mapred.job.tracker is "local".
>>>>>>>>>   </description>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> It's not clear to me what the reduce capacity of my cluster is :)
>>>>>>>>>
>>>>>>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>>>>>
>>>>>>>>>> I guess maybe your reduce number is 1, which makes the reduce
>>>>>>>>>> phase very slow.
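As a sketch, the reducer count can be raised in Pig either globally or per operator (the value 20 and the relation/field names are just examples):

```pig
-- Globally, at the top of the script:
SET default_parallel 20;

-- Or per operator, with the PARALLEL clause:
grouped = GROUP logs BY user PARALLEL 20;
```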
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent<
>>>>>>>>>> [email protected]>
>>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>>  Well, I can see from the job tracker that all the jobs are done
>>>>>>>>>>> quite quickly except 2, for which the reduce phase goes really
>>>>>>>>>>> slowly.
>>>>>>>>>>>
>>>>>>>>>>> But how can I draw the parallel between a job in the Hadoop job
>>>>>>>>>>> tracker (example: job_201010072150_0045) and the Pig script
>>>>>>>>>>> execution?
>>>>>>>>>>>
>>>>>>>>>>> And what is the most efficient: several small Pig scripts, or one
>>>>>>>>>>> big Pig script? I did one big one to avoid loading the same logs
>>>>>>>>>>> several times in different scripts. Maybe it is not such a good
>>>>>>>>>>> design...
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> - Vincent
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>>>>>>
>>>>>>>>>>>>  For the script, well, it's more than 500 lines; I'm not sure
>>>>>>>>>>>> that if I post it here anybody will read it till the end :-)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM,
>>>>>>>>>>>>> Vincent<[email protected]>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm quite new to Pig/Hadoop. So maybe my cluster size will
>>>>>>>>>>>>>> make
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> laugh.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wrote a script in Pig handling 1.5GB of logs in less than
>>>>>>>>>>>>>> one hour in Pig local mode on an Intel Core 2 Duo with 3GB of
>>>>>>>>>>>>>> RAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I tried this script on a simple 2 nodes cluster. These 2
>>>>>>>>>>>>>> nodes
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>> servers but simple computers:
>>>>>>>>>>>>>> - Intel core 2 duo with 3GB of RAM.
>>>>>>>>>>>>>> - Intel Quad with 4GB of RAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, I was aware that Hadoop has overhead and that it
>>>>>>>>>>>>>> wouldn't be done in half an hour (the local-mode time divided
>>>>>>>>>>>>>> by the number of nodes). But I was surprised to see this
>>>>>>>>>>>>>> morning that it took 7 hours to complete!!!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My question is simple: Is it normal?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vincent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
