Boom! SET pig.exec.nocombiner true;
Now everything takes 5 minutes and finishes successfully. Thanks a lot, William! I'll also try the double group by approach tomorrow, perhaps I can squeeze out a few more minutes. Thanks again and good night, Mario On 12-1-18 23:14 , [email protected] wrote:
Nested DISTINCT is a killer. See https://mail-archives.apache.org/mod_mbox/pig-user/201201.mbox/%3ccakne9z5unw03bk2qbyxnxbsevmlcbvy1se8_7tsdmdmedhk...@mail.gmail.com%3E for a discussion of a simple workaround that worked for me. William F Dowling Senior Technologist Thomson Reuters -----Original Message----- From: Mario Lassnig [mailto:[email protected]] Sent: Wednesday, January 18, 2012 5:07 PM To: [email protected] Subject: 0.9.1 out of memory problem Hello, I'm having an out of memory problem that seems rather weird to me. Perhaps you can help me. Here's what I do: dump = LOAD '/user/accounting/dump_2012-01-05.lst' AS ( ts:chararray, duid:chararray, owner:chararray, hidden:chararray, lgroup:chararray, nbfiles:long, length:long, replicas:long, provenance:chararray, state:chararray, campaign:chararray, rlength:long, rnbfiles:long, rowner:chararray, rgroup:chararray, rarchived:chararray, rsuspicious:chararray, name:chararray, ami:chararray, site:chararray ); wset = FOREACH dump GENERATE site, duid, replicas, nbfiles, rnbfiles, length, rlength; bySite = GROUP wset BY site; report = FOREACH bySite { duids = DISTINCT wset.duid; GENERATE group, COUNT(duids), SUM(wset.replicas), SUM(wset.nbfiles), SUM(wset.rnbfiles), SUM(wset.length), SUM(wset.rlength); }; STORE report INTO 'testfile.out'; So far, nothing special. The dump file has about 5GB with ~500 million lines. The whole STORE process takes about 2 minutes until it ends up at the last reducer, which dies like this: 2012-01-18 22:45:42,461 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2012-01-18 22:45:42,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId= 2012-01-18 22:45:42,976 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600 2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging on-disk files 2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread waiting: Thread for merging on-disk files 2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging in memory files 2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Need another 89 map output(s) where 0 is already in progress 2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for polling Map Completion Events 2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:45:47,986 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 5 outputs (0 slow hosts and0 dup hosts) ..... 2012-01-18 22:45:42,461 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2012-01-18 22:45:42,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId= 2012-01-18 22:45:42,976 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600 2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging on-disk files 2012-01-18 22:45:42,982 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread waiting: Thread for merging on-disk files 2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for merging in memory files 2012-01-18 22:45:42,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Need another 89 map output(s) where 0 is already in progress 2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Thread started: Thread for polling Map Completion Events 2012-01-18 22:45:42,984 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:45:47,986 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 5 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:45:48,091 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 dup hosts) 2012-01-18 22:45:48,294 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and14 dup hosts) 2012-01-18 22:45:48,336 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 dup hosts) 2012-01-18 22:45:48,368 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 dup hosts) 2012-01-18 22:45:48,592 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 dup hosts) 2012-01-18 22:45:48,636 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and12 dup hosts) 2012-01-18 22:45:48,774 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 dup hosts) 2012-01-18 22:45:48,796 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and11 dup hosts) 2012-01-18 22:45:48,827 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup hosts) 2012-01-18 22:45:48,848 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup hosts) 2012-01-18 22:45:48,874 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup hosts) 2012-01-18 22:45:49,041 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup hosts) 2012-01-18 22:45:49,129 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:49,250 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:49,461 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:49,466 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup hosts) 2012-01-18 22:45:49,668 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup hosts) 2012-01-18 22:45:49,801 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts) 2012-01-18 22:45:49,940 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts) 2012-01-18 22:45:50,100 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts) 2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:50,101 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup hosts) 2012-01-18 22:45:50,125 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup hosts) 2012-01-18 22:45:50,345 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup hosts) 2012-01-18 22:45:50,388 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup hosts) 2012-01-18 22:45:50,649 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts) 2012-01-18 22:45:50,671 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:45:55,890 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 10 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:45:56,119 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29 dup hosts) 2012-01-18 22:45:56,262 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 2 outputs (0 slow hosts and28 dup hosts) 2012-01-18 22:45:56,266 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and33 dup hosts) 2012-01-18 22:45:56,296 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and31 dup hosts) 2012-01-18 22:45:56,335 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and29 dup hosts) 2012-01-18 22:45:56,363 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and26 dup hosts) 2012-01-18 22:45:56,461 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25 dup hosts) 2012-01-18 22:45:56,465 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and27 dup hosts) 2012-01-18 22:45:56,643 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 dup hosts) 2012-01-18 22:45:56,662 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 dup hosts) 2012-01-18 22:45:56,671 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and25 dup hosts) 2012-01-18 22:45:56,696 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and23 dup hosts) 2012-01-18 22:45:56,874 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19 dup hosts) 2012-01-18 22:45:57,016 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 dup hosts) 2012-01-18 22:45:57,043 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 dup hosts) 2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 dup hosts) 2012-01-18 22:45:57,122 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and21 dup hosts) 2012-01-18 22:45:57,129 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and19 dup hosts) 2012-01-18 22:45:57,207 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17 dup hosts) 2012-01-18 22:45:57,321 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and16 dup hosts) 2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and18 dup hosts) 2012-01-18 22:45:57,460 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and17 dup hosts) 2012-01-18 22:45:57,561 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 dup hosts) 2012-01-18 22:45:57,580 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and15 dup hosts) 2012-01-18 22:45:57,584 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 dup hosts) 2012-01-18 22:45:57,588 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 58 segments... 2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Merging 58 sorted segments 2012-01-18 22:45:57,591 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 58 segments left of total size: 443794431 bytes 2012-01-18 22:45:57,594 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and13 dup hosts) 2012-01-18 22:45:57,636 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10 dup hosts) 2012-01-18 22:45:57,906 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call- Usage threshold init = 263258112(257088K) used = 513227712(501198K) committed = 626393088(611712K) max = 715849728(699072K) 2012-01-18 22:45:57,956 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup hosts) 2012-01-18 22:45:58,034 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and10 dup hosts) 2012-01-18 22:45:58,036 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and7 dup hosts) 2012-01-18 22:45:58,125 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and9 dup hosts) 2012-01-18 22:45:58,179 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup hosts) 2012-01-18 22:45:58,505 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 263258112(257088K) used = 599026296(584986K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:45:58,628 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup hosts) 2012-01-18 22:45:58,640 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts) 2012-01-18 22:45:58,715 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup hosts) 2012-01-18 22:45:58,780 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:58,893 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and3 dup hosts) 2012-01-18 22:45:58,945 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and2 dup hosts) 2012-01-18 22:45:59,022 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts) 2012-01-18 22:45:59,295 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:46:03,457 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 89947448 bytes from 2 objects. init = 263258112(257088K) used = 497067424(485417K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:12,065 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 93882208 bytes from 2 objects. init = 263258112(257088K) used = 524713768(512415K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:20,631 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 307912016 bytes from 6 objects. init = 263258112(257088K) used = 537442552(524846K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:24,488 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:46:26,405 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 308408644 bytes from 7 objects. init = 263258112(257088K) used = 548074616(535229K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:29,331 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:46:30,569 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 226846412 bytes from 5 objects. init = 263258112(257088K) used = 507240048(495351K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:35,576 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201201170946_0084_r_000000_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts) 2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting 2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined. 2012-01-18 22:46:36,801 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager 2012-01-18 22:46:36,802 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left. 2012-01-18 22:46:38,463 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 278231340 bytes from 7 objects. init = 263258112(257088K) used = 542068736(529364K) committed = 715849728(699072K) max = 715849728(699072K) 2012-01-18 22:46:53,367 FATAL org.apache.hadoop.mapred.Task: attempt_201201170946_0084_r_000000_0 : Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2786) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94) at java.io.DataOutputStream.write(DataOutputStream.java:90) at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384) at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306) at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:454) at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542) at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:523) at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361) at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542) at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357) at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542) at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357) at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:57) at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123) at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90) at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77) at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:187) at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1050) at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1371) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:200) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176) at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1392) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2691) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2629) 2012-01-18 22:46:53,372 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 31 files left. If I omit the COUNT(DISTINCT), it works brilliantly and fast. With the COUNT(DISTINCT) it dies like this. Now, I don't know where to go from here. I'm running Hadoop and Pig with default settings, except I've increased child.opts to -Xmx1024M (24GB machines) so it would be great if you could tell me what to do, because I'm stuck. Thanks, Mario
-- Mario Lassnig CERN PH-ADP-DDM 1-R-022 CH-1211 Geneve 23 [email protected] +41 22 76 71268
smime.p7s
Description: S/MIME Cryptographic Signature
