Hello, I'm having an out-of-memory problem that seems rather weird to me. Perhaps you can help me.
Here's what I do:

dump = LOAD '/user/accounting/dump_2012-01-05.lst' AS (
    ts:chararray, duid:chararray, owner:chararray, hidden:chararray,
    lgroup:chararray, nbfiles:long, length:long, replicas:long,
    provenance:chararray, state:chararray, campaign:chararray,
    rlength:long, rnbfiles:long, rowner:chararray, rgroup:chararray,
    rarchived:chararray, rsuspicious:chararray, name:chararray,
    ami:chararray, site:chararray );
wset = FOREACH dump GENERATE site, duid, replicas, nbfiles, rnbfiles, length, rlength;
bySite = GROUP wset BY site;
report = FOREACH bySite {
    duids = DISTINCT wset.duid;
    GENERATE group, COUNT(duids), SUM(wset.replicas), SUM(wset.nbfiles),
             SUM(wset.rnbfiles), SUM(wset.length), SUM(wset.rlength);
};
STORE report INTO 'testfile.out';

So far, nothing special. The dump file is about 5GB with ~500 million lines. The whole STORE process takes about 2 minutes until it reaches the last reducer, which dies like this:
2012-01-18 22:45:42,461 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded
the native-hadoop library
2012-01-18 22:45:42,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2012-01-18 22:45:42,976 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging on-disk
files
2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Thread waiting: Thread for merging on-disk
files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging in
memory files
2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Need another 89 map output(s) where 0 is
already in progress
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Thread started: Thread for polling Map
Completion Events
2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:45:47,986 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 5 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:45:48,091 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18
dup hosts)
2012-01-18 22:45:48,294 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and14
dup hosts)
2012-01-18 22:45:48,336 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15
dup hosts)
2012-01-18 22:45:48,368 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13
dup hosts)
2012-01-18 22:45:48,592 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15
dup hosts)
2012-01-18 22:45:48,636 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and12
dup hosts)
2012-01-18 22:45:48,774 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13
dup hosts)
2012-01-18 22:45:48,796 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and11
dup hosts)
2012-01-18 22:45:48,827 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup
hosts)
2012-01-18 22:45:48,848 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup
hosts)
2012-01-18 22:45:48,874 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup
hosts)
2012-01-18 22:45:49,041 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup
hosts)
2012-01-18 22:45:49,129 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:49,250 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:49,461 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:49,466 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup
hosts)
2012-01-18 22:45:49,668 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup
hosts)
2012-01-18 22:45:49,801 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup
hosts)
2012-01-18 22:45:49,940 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup
hosts)
2012-01-18 22:45:50,100 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup
hosts)
2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup
hosts)
2012-01-18 22:45:50,125 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup
hosts)
2012-01-18 22:45:50,345 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup
hosts)
2012-01-18 22:45:50,388 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup
hosts)
2012-01-18 22:45:50,649 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup
hosts)
2012-01-18 22:45:50,671 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:45:55,890 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 10 outputs (0 slow hosts and0
dup hosts)
2012-01-18 22:45:56,119 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29
dup hosts)
2012-01-18 22:45:56,262 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 2 outputs (0 slow hosts and28
dup hosts)
2012-01-18 22:45:56,266 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and33
dup hosts)
2012-01-18 22:45:56,296 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and31
dup hosts)
2012-01-18 22:45:56,335 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29
dup hosts)
2012-01-18 22:45:56,363 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and26
dup hosts)
2012-01-18 22:45:56,461 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25
dup hosts)
2012-01-18 22:45:56,465 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and27
dup hosts)
2012-01-18 22:45:56,643 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23
dup hosts)
2012-01-18 22:45:56,662 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23
dup hosts)
2012-01-18 22:45:56,671 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25
dup hosts)
2012-01-18 22:45:56,696 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23
dup hosts)
2012-01-18 22:45:56,874 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19
dup hosts)
2012-01-18 22:45:57,016 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21
dup hosts)
2012-01-18 22:45:57,043 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18
dup hosts)
2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21
dup hosts)
2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21
dup hosts)
2012-01-18 22:45:57,129 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19
dup hosts)
2012-01-18 22:45:57,207 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17
dup hosts)
2012-01-18 22:45:57,321 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and16
dup hosts)
2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18
dup hosts)
2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17
dup hosts)
2012-01-18 22:45:57,561 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13
dup hosts)
2012-01-18 22:45:57,580 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15
dup hosts)
2012-01-18 22:45:57,584 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13
dup hosts)
2012-01-18 22:45:57,588 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 58 segments...
2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Merging 58 sorted
segments
2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 58 segments left of total size: 443794431 bytes
2012-01-18 22:45:57,594 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13
dup hosts)
2012-01-18 22:45:57,636 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10
dup hosts)
2012-01-18 22:45:57,906 INFO org.apache.pig.impl.util.SpillableMemoryManager:
first memory handler call- Usage threshold init = 263258112(257088K) used =
513227712(501198K) committed = 626393088(611712K) max = 715849728(699072K)
2012-01-18 22:45:57,956 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup
hosts)
2012-01-18 22:45:58,034 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10
dup hosts)
2012-01-18 22:45:58,036 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and7 dup
hosts)
2012-01-18 22:45:58,125 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup
hosts)
2012-01-18 22:45:58,179 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup
hosts)
2012-01-18 22:45:58,505 INFO org.apache.pig.impl.util.SpillableMemoryManager:
first memory handler call - Collection threshold init = 263258112(257088K) used
= 599026296(584986K) committed = 715849728(699072K) max = 715849728(699072K)
2012-01-18 22:45:58,628 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup
hosts)
2012-01-18 22:45:58,640 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup
hosts)
2012-01-18 22:45:58,715 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup
hosts)
2012-01-18 22:45:58,780 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:58,893 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup
hosts)
2012-01-18 22:45:58,945 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup
hosts)
2012-01-18 22:45:59,022 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup
hosts)
2012-01-18 22:45:59,295 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:46:03,457 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 89947448 bytes from 2 objects. init = 263258112(257088K)
used = 497067424(485417K) committed = 715849728(699072K) max =
715849728(699072K)
2012-01-18 22:46:12,065 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 93882208 bytes from 2 objects. init = 263258112(257088K)
used = 524713768(512415K) committed = 715849728(699072K) max =
715849728(699072K)
2012-01-18 22:46:20,631 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 307912016 bytes from 6 objects. init =
263258112(257088K) used = 537442552(524846K) committed = 715849728(699072K) max
= 715849728(699072K)
2012-01-18 22:46:24,488 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:46:26,405 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 308408644 bytes from 7 objects. init =
263258112(257088K) used = 548074616(535229K) committed = 715849728(699072K) max
= 715849728(699072K)
2012-01-18 22:46:29,331 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:46:30,569 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 226846412 bytes from 5 objects. init =
263258112(257088K) used = 507240048(495351K) committed = 715849728(699072K) max
= 715849728(699072K)
2012-01-18 22:46:35,576 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201201170946_0084_r_000000_0 Scheduled 4 outputs (0 slow hosts and0 dup
hosts)
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask:
GetMapEventsThread exiting
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask:
getMapsEventsThread joined.
2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram
manager
2012-01-18 22:46:36,802 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved
on-disk merge complete: 0 files left.
2012-01-18 22:46:38,463 INFO org.apache.pig.impl.util.SpillableMemoryManager:
Spilled an estimate of 278231340 bytes from 7 objects. init =
263258112(257088K) used = 542068736(529364K) committed = 715849728(699072K) max
= 715849728(699072K)
2012-01-18 22:46:53,367 FATAL org.apache.hadoop.mapred.Task:
attempt_201201170946_0084_r_000000_0 : Failed to merge in memory
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:454)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
    at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:523)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
    at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:57)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:187)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1050)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1371)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:200)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1392)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2691)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2629)
2012-01-18 22:46:53,372 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
merge complete: 31 files left.
If I omit the COUNT(DISTINCT), it works brilliantly and fast. With the
COUNT(DISTINCT) it dies like this.
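
For comparison, the variant that runs through is presumably the same script with the nested DISTINCT and its COUNT dropped. A minimal sketch, assuming nothing else changes (the alias report_nodistinct and the output path are made up here for illustration):

```pig
-- Same per-site aggregation as in the script above, minus the distinct duid count.
-- Assumes the bySite relation defined earlier; the alias and output path are hypothetical.
report_nodistinct = FOREACH bySite GENERATE
    group,
    SUM(wset.replicas), SUM(wset.nbfiles), SUM(wset.rnbfiles),
    SUM(wset.length), SUM(wset.rlength);
STORE report_nodistinct INTO 'testfile_nodistinct.out';
```

The only difference from the failing script is the missing duids bag, which is why the COUNT(DISTINCT) looks like the part that exhausts the heap.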
Now I don't know where to go from here. I'm running Hadoop and Pig with default settings, except that I've increased child.opts to -Xmx1024M (the machines have 24GB). It would be great if you could tell me what to do, because I'm stuck.

Thanks, Mario
