Hello,

I'm having an out of memory problem that seems rather weird to me.
Perhaps you can help me.

Here's what I do:

dump = LOAD '/user/accounting/dump_2012-01-05.lst' AS (
ts:chararray,
duid:chararray,
owner:chararray,
hidden:chararray,
lgroup:chararray,
nbfiles:long,
length:long,
replicas:long,
provenance:chararray,
state:chararray,
campaign:chararray,
rlength:long,
rnbfiles:long,
rowner:chararray,
rgroup:chararray,
rarchived:chararray,
rsuspicious:chararray,
name:chararray,
ami:chararray,
site:chararray
);

wset = FOREACH dump GENERATE site, duid, replicas, nbfiles, rnbfiles, length, rlength;

bySite = GROUP wset BY site;

report = FOREACH bySite {
  duids = DISTINCT wset.duid;
GENERATE group, COUNT(duids), SUM(wset.replicas), SUM(wset.nbfiles), SUM(wset.rnbfiles), SUM(wset.length), SUM(wset.rlength);
};

STORE report INTO 'testfile.out';




So far, nothing special. The dump file has about 5GB with ~500 million lines. The whole STORE process takes about 2 minutes until it ends up at the last reducer,
which dies like this:

2012-01-18 22:45:42,461 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
the native-hadoop library
2012-01-18 22:45:42,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2012-01-18 22:45:42,976 INFO org.apache.hadoop.mapred.ReduceTask: 
ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging on-disk 
files
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread waiting: Thread for merging on-disk 
files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging in 
memory files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Need another 89 map output(s) where 0 is 
already in progress
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for polling Map 
Completion Events
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:45:47,986 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 5 outputs (0 slow hosts and0 dup 
hosts)
.....
2012-01-18 22:45:42,461 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
the native-hadoop library
2012-01-18 22:45:42,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2012-01-18 22:45:42,976 INFO org.apache.hadoop.mapred.ReduceTask: 
ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging on-disk 
files
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread waiting: Thread for merging on-disk 
files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging in 
memory files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Need another 89 map output(s) where 0 is 
already in progress
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Thread started: Thread for polling Map 
Completion Events
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:45:47,986 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 5 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:45:48,091 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 
dup hosts)
2012-01-18 22:45:48,294 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and14 
dup hosts)
2012-01-18 22:45:48,336 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 
dup hosts)
2012-01-18 22:45:48,368 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 
dup hosts)
2012-01-18 22:45:48,592 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 
dup hosts)
2012-01-18 22:45:48,636 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and12 
dup hosts)
2012-01-18 22:45:48,774 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 
dup hosts)
2012-01-18 22:45:48,796 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and11 
dup hosts)
2012-01-18 22:45:48,827 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup 
hosts)
2012-01-18 22:45:48,848 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup 
hosts)
2012-01-18 22:45:48,874 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup 
hosts)
2012-01-18 22:45:49,041 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup 
hosts)
2012-01-18 22:45:49,129 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:49,250 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:49,461 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:49,466 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup 
hosts)
2012-01-18 22:45:49,668 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup 
hosts)
2012-01-18 22:45:49,801 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup 
hosts)
2012-01-18 22:45:49,940 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup 
hosts)
2012-01-18 22:45:50,100 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup 
hosts)
2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup 
hosts)
2012-01-18 22:45:50,125 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup 
hosts)
2012-01-18 22:45:50,345 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup 
hosts)
2012-01-18 22:45:50,388 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup 
hosts)
2012-01-18 22:45:50,649 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup 
hosts)
2012-01-18 22:45:50,671 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:45:55,890 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 10 outputs (0 slow hosts and0 
dup hosts)
2012-01-18 22:45:56,119 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29 
dup hosts)
2012-01-18 22:45:56,262 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 2 outputs (0 slow hosts and28 
dup hosts)
2012-01-18 22:45:56,266 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and33 
dup hosts)
2012-01-18 22:45:56,296 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and31 
dup hosts)
2012-01-18 22:45:56,335 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29 
dup hosts)
2012-01-18 22:45:56,363 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and26 
dup hosts)
2012-01-18 22:45:56,461 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25 
dup hosts)
2012-01-18 22:45:56,465 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and27 
dup hosts)
2012-01-18 22:45:56,643 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 
dup hosts)
2012-01-18 22:45:56,662 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 
dup hosts)
2012-01-18 22:45:56,671 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25 
dup hosts)
2012-01-18 22:45:56,696 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 
dup hosts)
2012-01-18 22:45:56,874 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19 
dup hosts)
2012-01-18 22:45:57,016 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 
dup hosts)
2012-01-18 22:45:57,043 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 
dup hosts)
2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 
dup hosts)
2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 
dup hosts)
2012-01-18 22:45:57,129 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19 
dup hosts)
2012-01-18 22:45:57,207 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17 
dup hosts)
2012-01-18 22:45:57,321 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and16 
dup hosts)
2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 
dup hosts)
2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17 
dup hosts)
2012-01-18 22:45:57,561 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 
dup hosts)
2012-01-18 22:45:57,580 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 
dup hosts)
2012-01-18 22:45:57,584 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 
dup hosts)
2012-01-18 22:45:57,588 INFO org.apache.hadoop.mapred.ReduceTask: Initiating 
in-memory merge with 58 segments...
2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Merging 58 sorted 
segments
2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Down to the last 
merge-pass, with 58 segments left of total size: 443794431 bytes
2012-01-18 22:45:57,594 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 
dup hosts)
2012-01-18 22:45:57,636 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10 
dup hosts)
2012-01-18 22:45:57,906 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
first memory handler call- Usage threshold init = 263258112(257088K) used = 
513227712(501198K) committed = 626393088(611712K) max = 715849728(699072K)
2012-01-18 22:45:57,956 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup 
hosts)
2012-01-18 22:45:58,034 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10 
dup hosts)
2012-01-18 22:45:58,036 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and7 dup 
hosts)
2012-01-18 22:45:58,125 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup 
hosts)
2012-01-18 22:45:58,179 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup 
hosts)
2012-01-18 22:45:58,505 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
first memory handler call - Collection threshold init = 263258112(257088K) used 
= 599026296(584986K) committed = 715849728(699072K) max = 715849728(699072K)
2012-01-18 22:45:58,628 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup 
hosts)
2012-01-18 22:45:58,640 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup 
hosts)
2012-01-18 22:45:58,715 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup 
hosts)
2012-01-18 22:45:58,780 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:58,893 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup 
hosts)
2012-01-18 22:45:58,945 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup 
hosts)
2012-01-18 22:45:59,022 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup 
hosts)
2012-01-18 22:45:59,295 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:46:03,457 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 89947448 bytes from 2 objects. init = 263258112(257088K) 
used = 497067424(485417K) committed = 715849728(699072K) max = 
715849728(699072K)
2012-01-18 22:46:12,065 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 93882208 bytes from 2 objects. init = 263258112(257088K) 
used = 524713768(512415K) committed = 715849728(699072K) max = 
715849728(699072K)
2012-01-18 22:46:20,631 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 307912016 bytes from 6 objects. init = 
263258112(257088K) used = 537442552(524846K) committed = 715849728(699072K) max 
= 715849728(699072K)
2012-01-18 22:46:24,488 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:46:26,405 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 308408644 bytes from 7 objects. init = 
263258112(257088K) used = 548074616(535229K) committed = 715849728(699072K) max 
= 715849728(699072K)
2012-01-18 22:46:29,331 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:46:30,569 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 226846412 bytes from 5 objects. init = 
263258112(257088K) used = 507240048(495351K) committed = 715849728(699072K) max 
= 715849728(699072K)
2012-01-18 22:46:35,576 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201201170946_0084_r_000000_0 Scheduled 4 outputs (0 slow hosts and0 dup 
hosts)
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: 
GetMapEventsThread exiting
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: 
getMapsEventsThread joined.
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram 
manager
2012-01-18 22:46:36,802 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved 
on-disk merge complete: 0 files left.
2012-01-18 22:46:38,463 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
Spilled an estimate of 278231340 bytes from 7 objects. init = 
263258112(257088K) used = 542068736(529364K) committed = 715849728(699072K) max 
= 715849728(699072K)
2012-01-18 22:46:53,367 FATAL org.apache.hadoop.mapred.Task: 
attempt_201201170946_0084_r_000000_0 : Failed to merge in 
memoryjava.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:454)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:523)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:57)
        at 
org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:187)
        at 
org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1050)
        at 
org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1371)
        at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:200)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at 
org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1392)
        at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2691)
        at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2629)

2012-01-18 22:46:53,372 INFO org.apache.hadoop.mapred.ReduceTask: In-memory 
merge complete: 31 files left.


If I omit the COUNT(DISTINCT), it works brilliantly and fast. With the COUNT(DISTINCT) it dies like this.

Now, I don't know where to go from here. I'm running Hadoop and Pig with default settings, except I've increased child.opts to -Xmx1024M (24GB machines) so it would be great if you could tell me what to do,
because I'm stuck.

Thanks,
Mario

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to