Hey guys,

 

I am encountering java.lang.OutOfMemoryError when using the TOP UDF. It seems
that the UDF tries to process all of the data in memory.

Is there a workaround for TOP, or some other way of getting the top results?
I cannot use LIMIT, since I need the top 5% of the data, not a constant
number of rows.
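In case it helps clarify what I am after, this is the kind of two-pass
workaround I was imagining (the relation and field names here are made up;
LIMIT in 0.8 only accepts a constant, so the 5% cutoff would have to be
computed outside Pig and passed in via parameter substitution):

```pig
-- Pass 1: count the rows so the 5% cutoff can be computed externally
data = LOAD 'input' AS (key:chararray, score:double);
grp  = GROUP data ALL;
cnt  = FOREACH grp GENERATE COUNT(data);
STORE cnt INTO 'row_count';

-- Pass 2 (separate script, run as: pig -param TOPN=<5% of the count> top.pig)
data   = LOAD 'input' AS (key:chararray, score:double);
sorted = ORDER data BY score DESC;
top5p  = LIMIT sorted $TOPN;  -- $TOPN substituted at parse time via -param
STORE top5p INTO 'top_5_percent';
```

This avoids TOP entirely, but it costs an extra MapReduce job just for the
count, so I would prefer a single-pass solution if one exists.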

 

I am using:

Apache Pig version 0.8.1-cdh3u2 (rexported)

 

The stack trace is:

[2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor

[2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes

[2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)

[2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)

[2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

[2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:215)
    at java.io.DataInputStream.readUTF(DataInputStream.java:644)
    at java.io.DataInputStream.readUTF(DataInputStream.java:547)
    at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
    at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
    at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
    at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
    at org.apache.pig.builtin.TOP.exec(TOP.java:116)
    at org.apache.pig.builtin.TOP.exec(TOP.java:65)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)

stderr logs:

Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java heap space
    at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
    at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
    at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
    at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
    at sun.management.Sensor.trigger(Sensor.java:120)

 

 

Thanks in advance!