Sorry, just saw that you said you explicitly did that.

Ok, basically what's happening with the GC is that when you do something
like "group x by id", Pig will try to load *all the tuples of x with a
given id* into memory in the reducer, unless the Algebraic or Accumulator
optimizations were applied (this depends on what exactly you are doing
with the results of the grouping). The same goes for joins. A common
pitfall is grouping by a tuple: group foo by (a, b), and then flattening
out the group manually: foreach grouped_data generate group.a as a,
group.b as b;  Instead, you should use FLATTEN so that the optimizations
can kick in: foreach grouped_data generate FLATTEN(group) as (a, b);
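
To make the FLATTEN point concrete, here is a minimal sketch (the relation
and field names are made up for illustration):

```pig
-- Assume foo has fields a and b plus some payload.
grouped_data = GROUP foo BY (a, b);

-- Manual flattening: pulling fields out of group by hand can keep Pig
-- from applying the Algebraic/Accumulator optimizations, so each key's
-- entire bag gets materialized in reducer memory.
bad  = FOREACH grouped_data GENERATE group.a AS a, group.b AS b,
       COUNT(foo) AS cnt;

-- FLATTEN version: lets the optimizations kick in, so COUNT is computed
-- incrementally (partly in the combiner) instead of on an in-memory bag.
good = FOREACH grouped_data GENERATE FLATTEN(group) AS (a, b),
       COUNT(foo) AS cnt;
```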

Doing things like "group all_http_requests by status_code" is bad, because
it puts a very large number of records into very few buckets. That's ok if
all you are doing is counting the results (COUNT is algebraic), but bad if
you want to do something like sort them by timestamp.
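
For example, assuming all_http_requests has status_code and timestamp
fields, the first statement below is safe and the second is the kind of
thing that blows up the reducer:

```pig
grpd = GROUP all_http_requests BY status_code;

-- OK: COUNT is algebraic, so the per-status bags never need to fit in
-- memory all at once.
counts = FOREACH grpd GENERATE group AS status_code,
         COUNT(all_http_requests) AS n;

-- Risky: the nested ORDER needs the whole bag for a status code
-- materialized in the reducer, which is exactly where the GC errors come from.
sorted = FOREACH grpd {
    ordered = ORDER all_http_requests BY timestamp;
    GENERATE group, ordered;
};
```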



On Fri, Oct 8, 2010 at 10:40 AM, Dmitriy Ryaboy <[email protected]> wrote:

> When you changed to using replicated joins, did you put the small relation
> last in your list? The order is important...
>
> -D
>
>
> On Fri, Oct 8, 2010 at 8:47 AM, Vincent <[email protected]>wrote:
>
>>
>>
>>
>>  *I've tried mapred.child.java.opts value 2048m*. Now the error is a
>>  timeout. Seems like the system is so loaded that it becomes unresponsive...
>>
>> Here are the outputs of the job tracker:
>>
>>
>>  Hadoop job_201010081840_0010
>>  <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
>>  failures on prog7 <http://prog7.lan:50030/jobtracker.jsp>
>>
>> Attempt         Task    Machine         State   Error   Logs
>> attempt_201010081840_0010_r_000001_0    task_201010081840_0010_r_000001 <
>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001>
>>      prog7 <http://prog7:50060>      FAILED
>>
>> Task attempt_201010081840_0010_r_000001_0 failed to report status for 601
>> seconds. Killing!
>>
>>        Last 4KB <
>> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
>> >
>> Last 8KB <
>> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
>> >
>> All <
>> http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>
>>
>>
>>
>>  Task Logs: 'attempt_201010081840_0010_r_000001_0'
>>
>>
>>
>> *_stdout logs_*
>> ------------------------------------------------------------------------
>>
>>
>> *_stderr logs_*
>> ------------------------------------------------------------------------
>>
>>
>> *_syslog logs_*
>>
>> 2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=SHUFFLE, sessionId=
>> 2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask:
>> ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
>> 2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging
>> on-disk files
>> 2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging
>> on-disk files
>> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in
>> memory files
>> 2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0
>> is already in progress
>> 2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map
>> Completion Events
>> 2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
>> 2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed
>> len: 18158862
>> 2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18158862 bytes (18158866 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000002_0
>> 2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed
>> len: 20624283
>> 2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 20624283 bytes (20624287 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000000_0
>> 2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
>> 2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
>> 2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000000_0 ->  (105, 265) from prog7
>> 2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 9
>> dup hosts)
>> 2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed
>> len: 18485336
>> 2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18485336 bytes (18485340 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000001_0
>> 2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
>> 2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000002_0 ->  (177, 148) from hermitage
>> 2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed
>> len: 18075616
>> 2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18075616 bytes (18075620 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000003_0
>> 2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
>> 2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000001_0 ->  (241, 162) from prog7
>> 2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
>> 2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000003_0 ->  (189, 187) from hermitage
>> 2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed
>> len: 18255979
>> 2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18255979 bytes (18255983 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000006_0
>> 2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
>> 2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed
>> len: 18874525
>> 2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18874525 bytes (18874529 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000004_0
>> 2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 7
>> dup hosts)
>> 2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
>> 2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000004_0 ->  (105, 133) from prog7
>> 2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
>> 2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000006_0 ->  (105, 178) from hermitage
>> 2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed
>> len: 18358508
>> 2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18358508 bytes (18358512 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000007_0
>> 2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
>> 2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000007_0 ->  (105, 166) from hermitage
>> 2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed
>> len: 18092003
>> 2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18092003 bytes (18092007 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000008_0
>> 2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
>> 2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000008_0 ->  (293, 232) from hermitage
>> 2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed
>> len: 17941905
>> 2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 17941905 bytes (17941909 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000009_0
>> 2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
>> 2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 6
>> dup hosts)
>> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
>> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000009_0 ->  (105, 197) from hermitage
>> 2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed
>> len: 18405138
>> 2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18405138 bytes (18405142 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000010_0
>> 2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed
>> len: 18009092
>> 2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18009092 bytes (18009096 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000005_0
>> 2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
>> 2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000005_0 ->  (105, 206) from prog7
>> 2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
>> 2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000010_0 ->  (137, 175) from hermitage
>> 2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed
>> len: 20002821
>> 2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 20002821 bytes (20002825 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000011_0
>> 2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
>> 2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000011_0 ->  (105, 143) from hermitage
>> 2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed
>> len: 18135955
>> 2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18135955 bytes (18135959 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000012_0
>> 2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
>> 2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000012_0 ->  (137, 149) from hermitage
>> 2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed
>> len: 18440782
>> 2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18440782 bytes (18440786 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000013_0
>> 2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
>> 2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000013_0 ->  (137, 142) from hermitage
>> 2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed
>> len: 18205881
>> 2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18205881 bytes (18205885 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000014_0
>> 2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
>> 2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000014_0 ->  (253, 159) from hermitage
>> 2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed
>> len: 17476258
>> 2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 17476258 bytes (17476262 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000015_0
>> 2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
>> 2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
>> 2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000015_0 ->  (105, 158) from hermitage
>> 2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed
>> len: 18542226
>> 2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18542226 bytes (18542230 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000017_0
>> 2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
>> 2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000017_0 ->  (257, 151) from hermitage
>> 2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
>> 2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed
>> len: 18737336
>> 2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18737336 bytes (18737340 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000018_0
>> 2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
>> 2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000018_0 ->  (253, 175) from hermitage
>> 2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
>> 2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
>> 2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed
>> len: 17710254
>> 2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 17710254 bytes (17710258 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000020_0
>> 2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed
>> len: 20708572
>> 2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 20708572 bytes (20708576 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000016_0
>> 2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
>> 2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000016_0 ->  (297, 318) from prog7
>> 2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 1
>> dup hosts)
>> 2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed
>> len: 18984483
>> 2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18984483 bytes (18984487 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000019_0
>> 2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
>> 2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000019_0 ->  (285, 160) from prog7
>> 2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
>> 2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000020_0 ->  (105, 127) from hermitage
>> 2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed
>> len: 18803709
>> 2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18803709 bytes (18803713 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000021_0
>> 2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
>> 2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000021_0 ->  (137, 164) from hermitage
>> 2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
>> 2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed
>> len: 18143864
>> 2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 18143864 bytes (18143868 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000022_1
>> 2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
>> 2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000022_1 ->  (105, 133) from hermitage
>> 2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and 0
>> dup hosts)
>> 2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header:
>> attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed
>> len: 9198815
>> 2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 9198815 bytes (9198819 raw bytes) into RAM from
>> attempt_201010081840_0010_m_000023_1
>> 2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
>> obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
>> 2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> 9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
>> 2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
>> from attempt_201010081840_0010_m_000023_1 ->  (137, 166) from hermitage
>> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
>> GetMapEventsThread exiting
>> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
>> getMapsEventsThread joined.
>> 2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed
>> ram manager
>> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
>> Interleaved on-disk merge complete: 0 files left.
>> 2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
>> In-memory merge complete: 24 files left.
>> 2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24
>> sorted segments
>> 2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the
>> last merge-pass, with 24 segments left of total size: 436372203 bytes
>> 2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged
>> 24 segments, 436372203 bytes to disk to satisfy reduce memory limit
>> 2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging
>> 1 files, 436372161 bytes from disk
>> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging
>> 0 segments, 0 bytes from memory into reduce
>> 2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1
>> sorted segments
>> 2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the
>> last merge-pass, with 1 segments left of total size: 436372157 bytes
>> 2010-10-08 19:13:18,879 INFO
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
>> process : 24
>> 2010-10-08 19:13:18,879 INFO
>> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input
>> paths to process : 24
>> 2010-10-08 19:16:14,354 INFO
>> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called
>> (Collection threshold exceeded) init = 32309248(31552K) used =
>> 803560952(784727K) committed = 1069678592(1044608K) max =
>> 1431699456(1398144K)
>>
>>
>>
>>
>> On 10/08/2010 06:44 PM, Vincent wrote:
>>
>>>  Yep, I did restart cluster (dfs and mapred stop/start).
>>>
>>> Increasing the amount of memory, I can see that the reduce task goes
>>> further (the percentage is greater), but then it starts to decrease with
>>> memory failures.
>>>
>>> On 10/08/2010 06:41 PM, Jeff Zhang wrote:
>>>
>>>> Did you restart cluster after reconfiguration ?
>>>>
>>>>
>>>> On Fri, Oct 8, 2010 at 9:59 PM, Vincent<[email protected]>
>>>>  wrote:
>>>>
>>>>>  I've tried with mapred.child.java.opts value:
>>>>> -Xmx512m -->  still memory errors in reduce phase
>>>>> -Xmx1024m -->  still memory errors in reduce phase
>>>>> I am now trying with -Xmx1536m but I'm afraid that my nodes will start
>>>>> to
>>>>> swap memory...
>>>>>
>>>>> Should I continue in this direction? Or is it already too much and I
>>>>> should look for the problem somewhere else?
>>>>>
>>>>> Thanks
>>>>>
>>>>> -Vincent
>>>>>
>>>>>
>>>>> On 10/08/2010 03:04 PM, Jeff Zhang wrote:
>>>>>
>>>>>> Try to increase the heap size of each task by setting
>>>>>> mapred.child.java.opts in mapred-site.xml. The default value is
>>>>>> -Xmx200m in mapred-default.xml which may be too small for you.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 8, 2010 at 6:55 PM, Vincent<[email protected]>
>>>>>>  wrote:
>>>>>>
>>>>>>>
>>>>>>>  Thanks to Dmitriy and Jeff, I've set :
>>>>>>>
>>>>>>> set default_parallel 20; at the beginning of my script.
>>>>>>>
>>>>>>> Updated 8 JOINs to behave like:
>>>>>>>
>>>>>>> JOIN big BY id, small BY id USING 'replicated';
>>>>>>>
>>>>>>> Unfortunately this didn't improve the script speed (at least it runs
>>>>>>> for
>>>>>>> more than one hour now).
>>>>>>>
>>>>>>> But looking in the jobtracker at one of the jobs that reduces, I can
>>>>>>> see this for the map:
>>>>>>>
>>>>>>>
>>>>>>>  Hadoop map task list for job_201010081314_0010
>>>>>>> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
>>>>>>>    on
>>>>>>>  prog7<http://prog7.lan:50030/jobtracker.jsp>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   All Tasks
>>>>>>>
>>>>>>> Task    Complete        Status  Start Time      Finish Time
>>>>>>> Errors
>>>>>>>  Counters
>>>>>>> task_201010081314_0010_m_000000
>>>>>>>
>>>>>>> <
>>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
>>>>>>>
>>>>>>>      100.00%
>>>>>>>
>>>>>>>
>>>>>>>        8-Oct-2010 14:07:44
>>>>>>>        8-Oct-2010 14:23:11 (15mins, 27sec)
>>>>>>>
>>>>>>>
>>>>>>> Too many fetch-failures
>>>>>>> Too many fetch-failures
>>>>>>>
>>>>>>>
>>>>>>>        8
>>>>>>>
>>>>>>> <
>>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And I can see this for the reduce
>>>>>>>
>>>>>>>
>>>>>>>  Hadoop reduce task list for job_201010081314_0010
>>>>>>> <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
>>>>>>>    on
>>>>>>>  prog7<http://prog7.lan:50030/jobtracker.jsp>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   All Tasks
>>>>>>>
>>>>>>> Task    Complete        Status  Start Time      Finish Time
>>>>>>> Errors
>>>>>>>  Counters
>>>>>>> task_201010081314_0010_r_000000
>>>>>>>
>>>>>>> <
>>>>>>> http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>
>>>>>>>
>>>>>>>      9.72%
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>        reduce>    copy (7 of 24 at 0.01 MB/s)>
>>>>>>>        8-Oct-2010 14:14:49
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Error: GC overhead limit exceeded
>>>>>>>
>>>>>>>
>>>>>>>        7
>>>>>>>
>>>>>>> <
>>>>>>> http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>
>>>>>>>
>>>>>>> task_201010081314_0010_r_000001
>>>>>>>
>>>>>>> Reduce tasks of job_201010081314_0010:
>>>>>>>
>>>>>>> task_201010081314_0010_r_000001   0.00%   8-Oct-2010 14:14:52   counters: 0
>>>>>>>   Error: Java heap space
>>>>>>>   http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001
>>>>>>>   http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001
>>>>>>>
>>>>>>> task_201010081314_0010_r_000002   0.00%   8-Oct-2010 14:15:58   counters: 0
>>>>>>>   java.io.IOException: Task process exit with nonzero status of 1.
>>>>>>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>>>>>>   http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002
>>>>>>>   http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002
>>>>>>>
>>>>>>> task_201010081314_0010_r_000003   9.72%   8-Oct-2010 14:16:58   counters: 7
>>>>>>>   reduce > copy (7 of 24 at 0.01 MB/s)
>>>>>>>   http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003
>>>>>>>   http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003
>>>>>>>
>>>>>>> task_201010081314_0010_r_000004   0.00%   8-Oct-2010 14:18:11   counters: 0
>>>>>>>   Error: GC overhead limit exceeded
>>>>>>>   http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004
>>>>>>>   http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004
>>>>>>>
>>>>>>> task_201010081314_0010_r_000005   0.00%   8-Oct-2010 14:18:56
>>>>>>>   Error: GC overhead limit exceeded
>>>>>>>   http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000005
>>>>>>>
>>>>>>> Seems like it runs out of memory... Which parameter should be
>>>>>>> increased?
>>>>>>>
>>>>>>> -Vincent
>>>>>>>
>>>>>>>
>>>>>>> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>>>>>>
>>>>>>>> BTW, you can look at the job tracker web UI to see which part of
>>>>>>>> the job costs the most time.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<[email protected]>
>>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>> No I mean whether your mapreduce job's reduce task number is 1.
>>>>>>>>>
>>>>>>>>> And could you share your pig script, then others can really
>>>>>>>>> understand
>>>>>>>>> your problem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent<[email protected]
>>>>>>>>> >
>>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>> You are right, I didn't change this parameter, so the default
>>>>>>>>>> from src/mapred/mapred-default.xml is used:
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>   <name>mapred.reduce.tasks</name>
>>>>>>>>>>   <value>1</value>
>>>>>>>>>>   <description>The default number of reduce tasks per job.
>>>>>>>>>>   Typically set to 99% of the cluster's reduce capacity, so that
>>>>>>>>>>   if a node fails the reduces can still be executed in a single
>>>>>>>>>>   wave. Ignored when mapred.job.tracker is "local".
>>>>>>>>>>   </description>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>> It's not clear to me what the reduce capacity of my cluster is :)
>>>>>>>>>>
>>>>>>>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>>>>>>
>>>>>>>>>>> I guess your reduce task number is 1, which makes the reduce
>>>>>>>>>>> phase very slow.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent<
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Well, I can see from the job tracker that all the jobs finish
>>>>>>>>>>>> quite quickly except 2, for which the reduce phase goes really,
>>>>>>>>>>>> really slowly.
>>>>>>>>>>>>
>>>>>>>>>>>> But how can I map a job in the Hadoop job tracker (example:
>>>>>>>>>>>> job_201010072150_0045) back to the Pig script execution?
>>>>>>>>>>>>
>>>>>>>>>>>> And what is more efficient: several small Pig scripts, or one
>>>>>>>>>>>> big Pig script? I wrote one big script to avoid loading the
>>>>>>>>>>>> same logs several times in different scripts. Maybe it is not
>>>>>>>>>>>> such a good design...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> - Vincent
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>  I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for the script, well, it's more than 500 lines; I'm not
>>>>>>>>>>>>> sure anybody would read it to the end if I posted it here :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM,
>>>>>>>>>>>>>> Vincent<[email protected]>
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm quite new to Pig/Hadoop. So maybe my cluster size will
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> laugh.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I wrote a Pig script handling 1.5GB of logs in less than
>>>>>>>>>>>>>>> one hour in Pig local mode on an Intel Core 2 Duo with 3GB
>>>>>>>>>>>>>>> of RAM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I tried this script on a simple 2-node cluster. These
>>>>>>>>>>>>>>> 2 nodes are not servers but ordinary computers:
>>>>>>>>>>>>>>> - an Intel Core 2 Duo with 3GB of RAM
>>>>>>>>>>>>>>> - an Intel Quad with 4GB of RAM
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, I was aware that Hadoop has overhead and that it
>>>>>>>>>>>>>>> wouldn't be done in half an hour (local time divided by the
>>>>>>>>>>>>>>> number of nodes). But I was surprised to see this morning
>>>>>>>>>>>>>>> that it took 7 hours to complete!!!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My question is simple: Is it normal?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vincent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  --
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> Jeff Zhang
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
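The parallelism fix discussed above can be sketched in Pig 0.7 terms. This is only an illustration, not Vincent's actual script: the relation names, input path, and schema are made up, but the two techniques are the ones from the thread. The PARALLEL clause raises the reducer count per operator (since mapred.reduce.tasks defaults to 1), and FLATTEN(group) rather than a manual group.a/group.b projection keeps the algebraic/accumulator optimizations available:

```
-- Illustrative Pig 0.7 sketch; names and path are hypothetical.
logs = LOAD 'httpd_logs' USING PigStorage('\t')
       AS (ts:chararray, id:chararray, status:int);

-- PARALLEL 8 spreads the group onto 8 reducers instead of the default 1.
grouped = GROUP logs BY (id, status) PARALLEL 8;

-- Use FLATTEN(group), not "group.id as id, group.status as status",
-- so the COUNT can be computed algebraically (partially in the combiner)
-- instead of materializing each whole bag in reducer memory.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (id, status),
         COUNT(logs) AS n;

STORE counts INTO 'log_counts';
```

Grouping on the composite (id, status) key also avoids funneling everything into a handful of huge buckets the way "group by status_code" alone would.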
