Oops! I suppose you've pointed out exactly my mistakes.
I am doing a lot of the bad operations you've mentioned:

- group foo by (a, b), and then flattening out the group manually: foreach grouped_data generate group.a as a, group.b as b;
- group all_http_requests by status_code.

One more question and I will leave you for the weekend :-)

Running the script locally with Pig, I load many small (< 3MB) log files as inputs (288 logs per input, to be precise). When running the script on Hadoop, before copying the logs to HDFS, I concatenate them all by input type, since I've read that HDFS doesn't like small files. As a result, one input is more than 1GB and a couple of others are several hundred MB.

Is that ok? I've left the HDFS block size at the default (64MB).

And many thanks for your help to both of you, Jeff and Dmitriy! I've learned a lot today thanks to you.

-Vincent

On 10/08/2010 09:47 PM, Dmitriy Ryaboy wrote:
Sorry, just saw that you said you explicitly did that.

Ok, basically what's happening with the GC is that when you do something
like "group x by id", Pig will try to load *all the tuples of x with a given
id* into memory in the reducer, unless algebraic or accumulative
optimizations were applied (this depends on what exactly you are doing with
the results of grouping). The same goes for joins. A common pitfall is grouping by a
tuple: group foo by (a, b), and then flattening out the group manually:
foreach grouped_data generate group.a as a, group.b as b;  Instead, you
should use FLATTEN to get the optimizations to kick in: foreach grouped_data
generate FLATTEN(group) as (a, b);
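To make the contrast concrete, here is a minimal sketch (relation and field names are just placeholders):

```pig
foo = LOAD 'input' AS (a:chararray, b:chararray, c:int);
grouped_data = GROUP foo BY (a, b);

-- Anti-pattern: pulling the group fields out one by one blocks the optimizations.
bad = FOREACH grouped_data GENERATE group.a AS a, group.b AS b;

-- Preferred: FLATTEN the whole group tuple so the optimizer can kick in.
good = FOREACH grouped_data GENERATE FLATTEN(group) AS (a, b);
```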

Doing things like "group all_http_requests by status_code" is bad, because
that puts a very large number of records into very few buckets. It's ok if
all you are doing is counting the results, but bad if you want to do
something like sort them by timestamp.
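For instance (field names here are made up), counting stays cheap because COUNT is algebraic, while per-group sorting would pull whole bags into reducer memory:

```pig
reqs = LOAD 'requests' AS (status_code:int, ts:long);
by_status = GROUP reqs BY status_code;

-- Fine: COUNT is algebraic, so combiners do most of the work map-side.
counts = FOREACH by_status GENERATE group AS status_code, COUNT(reqs);

-- Risky: ordering each (possibly huge) bag materializes it in the reducer.
-- sorted = FOREACH by_status { s = ORDER reqs BY ts; GENERATE group, s; }
```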



On Fri, Oct 8, 2010 at 10:40 AM, Dmitriy Ryaboy <[email protected]> wrote:

When you changed to using replicated joins, did you put the small relation
last in your list? The order is important...
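In other words, something like this (relation names are placeholders); everything after the first relation is loaded into memory on each mapper, so the big one must come first and the small one last:

```pig
big   = LOAD 'big_input'   AS (id:chararray, payload:chararray);
small = LOAD 'small_input' AS (id:chararray, label:chararray);

-- The last (small) relation is the one replicated to every map task.
joined = JOIN big BY id, small BY id USING 'replicated';
```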

-D


On Fri, Oct 8, 2010 at 8:47 AM, Vincent <[email protected]> wrote:



  *I've tried mapred.child.java.opts value 2048m*. Now the error is a
  timeout. It seems like the system is so loaded that it becomes unresponsive...

Here are the outputs of the job tracker:


  Hadoop job_201010081840_0010
  <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
  failures on prog7<http://prog7.lan:50030/jobtracker.jsp>

Attempt         Task    Machine         State   Error   Logs
attempt_201010081840_0010_r_000001_0    task_201010081840_0010_r_000001<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001>
      prog7<http://prog7:50060>       FAILED

Task attempt_201010081840_0010_r_000001_0 failed to report status for 601
seconds. Killing!

        Last 4KB<
http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
Last 8KB<
http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
All<
http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>



  Task Logs: 'attempt_201010081840_0010_r_000001_0'



*_stdout logs_*
------------------------------------------------------------------------


*_stderr logs_*
------------------------------------------------------------------------


*_syslog logs_*

2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging
on-disk files
2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging
on-disk files
2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in
memory files
2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0
is already in progress
2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map
Completion Events
2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed
len: 18158862
2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18158862 bytes (18158866 raw bytes) into RAM from
attempt_201010081840_0010_m_000002_0
2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed
len: 20624283
2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 20624283 bytes (20624287 raw bytes) into RAM from
attempt_201010081840_0010_m_000000_0
2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read
20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000000_0 ->   (105, 265) from prog7
2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and9
dup hosts)
2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed
len: 18485336
2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18485336 bytes (18485340 raw bytes) into RAM from
attempt_201010081840_0010_m_000001_0
2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read
18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000002_0 ->   (177, 148) from hermitage
2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed
len: 18075616
2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18075616 bytes (18075620 raw bytes) into RAM from
attempt_201010081840_0010_m_000003_0
2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read
18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000001_0 ->   (241, 162) from prog7
2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read
18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000003_0 ->   (189, 187) from hermitage
2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed
len: 18255979
2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18255979 bytes (18255983 raw bytes) into RAM from
attempt_201010081840_0010_m_000006_0
2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed
len: 18874525
2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18874525 bytes (18874529 raw bytes) into RAM from
attempt_201010081840_0010_m_000004_0
2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and7
dup hosts)
2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read
18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000004_0 ->   (105, 133) from prog7
2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000006_0 ->   (105, 178) from hermitage
2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed
len: 18358508
2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18358508 bytes (18358512 raw bytes) into RAM from
attempt_201010081840_0010_m_000007_0
2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read
18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000007_0 ->   (105, 166) from hermitage
2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed
len: 18092003
2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18092003 bytes (18092007 raw bytes) into RAM from
attempt_201010081840_0010_m_000008_0
2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read
18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000008_0 ->   (293, 232) from hermitage
2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed
len: 17941905
2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 17941905 bytes (17941909 raw bytes) into RAM from
attempt_201010081840_0010_m_000009_0
2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and6
dup hosts)
2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read
17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000009_0 ->   (105, 197) from hermitage
2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed
len: 18405138
2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18405138 bytes (18405142 raw bytes) into RAM from
attempt_201010081840_0010_m_000010_0
2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed
len: 18009092
2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18009092 bytes (18009096 raw bytes) into RAM from
attempt_201010081840_0010_m_000005_0
2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read
18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000005_0 ->   (105, 206) from prog7
2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read
18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000010_0 ->   (137, 175) from hermitage
2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed
len: 20002821
2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 20002821 bytes (20002825 raw bytes) into RAM from
attempt_201010081840_0010_m_000011_0
2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read
20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000011_0 ->   (105, 143) from hermitage
2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed
len: 18135955
2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18135955 bytes (18135959 raw bytes) into RAM from
attempt_201010081840_0010_m_000012_0
2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read
18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000012_0 ->   (137, 149) from hermitage
2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed
len: 18440782
2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18440782 bytes (18440786 raw bytes) into RAM from
attempt_201010081840_0010_m_000013_0
2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read
18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000013_0 ->   (137, 142) from hermitage
2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed
len: 18205881
2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18205881 bytes (18205885 raw bytes) into RAM from
attempt_201010081840_0010_m_000014_0
2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read
18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000014_0 ->   (253, 159) from hermitage
2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed
len: 17476258
2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 17476258 bytes (17476262 raw bytes) into RAM from
attempt_201010081840_0010_m_000015_0
2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read
17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000015_0 ->   (105, 158) from hermitage
2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed
len: 18542226
2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18542226 bytes (18542230 raw bytes) into RAM from
attempt_201010081840_0010_m_000017_0
2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read
18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000017_0 ->   (257, 151) from hermitage
2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed
len: 18737336
2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18737336 bytes (18737340 raw bytes) into RAM from
attempt_201010081840_0010_m_000018_0
2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read
18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000018_0 ->   (253, 175) from hermitage
2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed
len: 17710254
2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 17710254 bytes (17710258 raw bytes) into RAM from
attempt_201010081840_0010_m_000020_0
2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed
len: 20708572
2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 20708572 bytes (20708576 raw bytes) into RAM from
attempt_201010081840_0010_m_000016_0
2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read
20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000016_0 ->   (297, 318) from prog7
2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and1
dup hosts)
2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed
len: 18984483
2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18984483 bytes (18984487 raw bytes) into RAM from
attempt_201010081840_0010_m_000019_0
2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read
18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000019_0 ->   (285, 160) from prog7
2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read
17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000020_0 ->   (105, 127) from hermitage
2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed
len: 18803709
2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18803709 bytes (18803713 raw bytes) into RAM from
attempt_201010081840_0010_m_000021_0
2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read
18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000021_0 ->   (137, 164) from hermitage
2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed
len: 18143864
2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 18143864 bytes (18143868 raw bytes) into RAM from
attempt_201010081840_0010_m_000022_1
2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read
18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000022_1 ->   (105, 133) from hermitage
2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header:
attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed
len: 9198815
2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 9198815 bytes (9198819 raw bytes) into RAM from
attempt_201010081840_0010_m_000023_1
2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read
9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_201010081840_0010_m_000023_1 ->   (137, 166) from hermitage
2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
GetMapEventsThread exiting
2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
getMapsEventsThread joined.
2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed
ram manager
2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
Interleaved on-disk merge complete: 0 files left.
2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
In-memory merge complete: 24 files left.
2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24
sorted segments
2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 24 segments left of total size: 436372203 bytes
2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged
24 segments, 436372203 bytes to disk to satisfy reduce memory limit
2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging
1 files, 436372161 bytes from disk
2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging
0 segments, 0 bytes from memory into reduce
2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 436372157 bytes
2010-10-08 19:13:18,879 INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
process : 24
2010-10-08 19:13:18,879 INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input
paths to process : 24
2010-10-08 19:16:14,354 INFO
org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called
(Collection threshold exceeded) init = 32309248(31552K) used =
803560952(784727K) committed = 1069678592(1044608K) max =
1431699456(1398144K)




On 10/08/2010 06:44 PM, Vincent wrote:

  Yep, I did restart the cluster (dfs and mapred stop/start).

Increasing the amount of memory, I can see that the reduce task goes
further (the percentage is greater), but then it starts to decrease with memory
failures.

On 10/08/2010 06:41 PM, Jeff Zhang wrote:

Did you restart cluster after reconfiguration ?


On Fri, Oct 8, 2010 at 9:59 PM, Vincent <[email protected]> wrote:

  I've tried with mapred.child.java.opts values:
-Xmx512m --> still memory errors in the reduce phase
-Xmx1024m --> still memory errors in the reduce phase
I am now trying with -Xmx1536m, but I'm afraid that my nodes will start to
swap memory...

Should I continue in this direction? Or is it already too much, and should I
look for the problem somewhere else?

Thanks

-Vincent


On 10/08/2010 03:04 PM, Jeff Zhang wrote:

Try to increase the heap size of the tasks by setting
mapred.child.java.opts in mapred-site.xml. The default value is
-Xmx200m in mapred-default.xml, which may be too small for you.
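For reference, the mapred-site.xml entry would look something like this (the -Xmx value here is just an example to tune for your nodes):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```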



On Fri, Oct 8, 2010 at 6:55 PM, Vincent <[email protected]> wrote:

  Thanks to Dmitriy and Jeff, I've set:

set default_parallel 20; at the beginning of my script.

Updated 8 JOINs to behave like:

JOIN big BY id, small BY id USING 'replicated';

Unfortunately this didn't improve the script speed (at least, it has been
running for more than one hour now).

But looking in the jobtracker at one of the jobs that reduces, I can see
this for the map:


  Hadoop map task list for job_201010081314_0010
<http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
    on
  prog7<http://prog7.lan:50030/jobtracker.jsp>

------------------------------------------------------------------------



   All Tasks

Task    Complete        Status  Start Time      Finish Time
Errors
  Counters
task_201010081314_0010_m_000000

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>

      100.00%


        8-Oct-2010 14:07:44
        8-Oct-2010 14:23:11 (15mins, 27sec)


Too many fetch-failures
Too many fetch-failures


        8

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>



And I can see this for the reduce


  Hadoop reduce task list for job_201010081314_0010
<http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
    on
  prog7<http://prog7.lan:50030/jobtracker.jsp>

------------------------------------------------------------------------



   All Tasks

Task    Complete        Status  Start Time      Finish Time
Errors
  Counters
task_201010081314_0010_r_000000

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>

      9.72%



        reduce>     copy (7 of 24 at 0.01 MB/s)>
        8-Oct-2010 14:14:49



Error: GC overhead limit exceeded


        7

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>

task_201010081314_0010_r_000001

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>

      0.00%


        8-Oct-2010 14:14:52



Error: Java heap space


        0

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>

task_201010081314_0010_r_000002

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>

      0.00%


        8-Oct-2010 14:15:58



java.io.IOException: Task process exit with nonzero status of 1.
        at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)



        0

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>

task_201010081314_0010_r_000003

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>

      9.72%



        reduce>     copy (7 of 24 at 0.01 MB/s)>
        8-Oct-2010 14:16:58


        7

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>

task_201010081314_0010_r_000004

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>

      0.00%


        8-Oct-2010 14:18:11



Error: GC overhead limit exceeded


        0

<
http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>

task_201010081314_0010_r_000005

<
http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000005>

      0.00%


        8-Oct-2010 14:18:56



Error: GC overhead limit exceeded






Seems like it runs out of memory... Which parameter should be
increased?

-Vincent


On 10/08/2010 01:12 PM, Jeff Zhang wrote:

BTW, you can look at the job tracker web UI to see which part of the
job costs the most time.



On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:

No, I mean whether your MapReduce job's reduce task number is 1.

And could you share your Pig script, so that others can really
understand your problem?



On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]> wrote:

  You are right, I didn't change this parameter, so the default from
src/mapred/mapred-default.xml is used:

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

It's not clear to me what the reduce capacity of my cluster is :)

On 10/08/2010 01:00 PM, Jeff Zhang wrote:

I guess maybe your reduce number is 1, which makes the reduce phase
very slow.



On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]> wrote:

  Well, I can see from the job tracker that all the jobs are done quite
quickly except 2, for which the reduce phase goes really, really slowly.

But how can I draw the parallel between a job in the Hadoop job tracker
(example: job_201010072150_0045) and the Pig script execution?

And what is more efficient: several small Pig scripts, or one big Pig
script? I did one big one to avoid loading the same logs several times in
different scripts. Maybe it is not such a good design...

Thanks for your help.

- Vincent


On 10/08/2010 11:31 AM, Vincent wrote:

  I'm using pig-0.7.0 on hadoop-0.20.2.

As for the script, well, it's more than 500 lines; I'm not sure anybody
would read it to the end if I posted it here :-)


On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:

What version of Pig, and what does your script look like?

On Thu, Oct 7, 2010 at 11:48 PM, Vincent <[email protected]> wrote:

   Hi All,
I'm quite new to Pig/Hadoop, so maybe my cluster size will make you laugh.

I wrote a script in Pig handling 1.5GB of logs in less than one hour in Pig
local mode on an Intel Core 2 Duo with 3GB of RAM.

Then I tried this script on a simple 2-node cluster. These 2 nodes are not
servers but simple computers:
- an Intel Core 2 Duo with 3GB of RAM.
- an Intel Quad with 4GB of RAM.

Well, I was aware that Hadoop has overhead and that it wouldn't be done in
half an hour (the local time divided by the number of nodes). But I was
surprised to see this morning that it took 7 hours to complete!!!

My configuration was made according to this link:






http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

My question is simple: is this normal?

Cheers


Vincent


  --
Best Regards

Jeff Zhang



