The TOP UDF does not try to process all the data in memory if the algebraic 
optimization can be applied. It does, of course, need to keep the top-n rows 
in memory. Can you confirm that algebraic mode is being used?
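For reference, a minimal sketch of a TOP call that Pig can run algebraically (relation and field names here are illustrative, not from your script). As far as I know, the combiner is only applied when the UDF is invoked directly on the grouped bag in the GENERATE clause; wrapping it in a nested FOREACH block with other operators generally disables it:

```pig
-- illustrative names; 'data' is the raw relation, column index 1 is the sort key
grouped = GROUP data BY key;
-- TOP(n, column_index, bag): on the algebraic/combiner path only the
-- n best rows per group are kept in memory at each stage
top10   = FOREACH grouped GENERATE group, FLATTEN(TOP(10, 1, data));
```

If your script instead materializes the whole bag per group before calling TOP, the reducer has to deserialize everything, which would match the OOM you are seeing.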

On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[email protected]> 
wrote:

> Hey guys,
> 
> 
> 
> I am encountering a java.lang.OutOfMemoryError when using the TOP UDF. It
> seems that the UDF tries to process all the data in memory.
> 
> Is there a workaround for TOP? Or maybe there is some other way of getting
> the top results? I cannot use LIMIT since I need the top 5% of the data,
> not a constant number of rows.
> 
> 
> 
> I am using:
> 
> Apache Pig version 0.8.1-cdh3u2 (rexported)
> 
> 
> 
> The stack trace is:
> 
> [2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor
> [2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
> [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
> [2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
> [2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
>                at java.util.Arrays.copyOfRange(Arrays.java:3209)
>                at java.lang.String.<init>(String.java:215)
>                at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>                at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>                at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>                at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>                at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>                at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>                at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>                at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>                at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
>                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
>                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
>                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>                at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>                at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>                at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>                at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>                at java.security.AccessController.doPrivileged(Native Method)
>                at javax.security.auth.Subject.doAs(Subject.java:396)
>                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>                at org.apache.hadoop.mapred.Child.main(Child.java:264)
> 
> stderr logs
> 
> Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java heap space
>                at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
>                at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
>                at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
>                at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>                at sun.management.Sensor.trigger(Sensor.java:120)
> 
> 
> 
> 
> Thanks in advance!
> 
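On the 5% requirement specifically: one possible workaround, assuming a Pig version where LIMIT accepts a scalar expression (0.10 or later; in 0.8.x LIMIT only takes a constant), is to compute the row count first and derive the limit from it. Relation and field names below are illustrative:

```pig
-- illustrative schema; replace with the real load statement
data    = LOAD 'input' AS (id:chararray, score:double);

-- total row count, projected as a scalar
cnt     = FOREACH (GROUP data ALL) GENERATE COUNT(data) AS n;

ordered = ORDER data BY score DESC;

-- LIMIT with an expression requires Pig 0.10+
top5pct = LIMIT ordered (long)(cnt.n * 0.05);
```

On 0.8.x one would instead run a first job to obtain the count, then substitute it as a constant into a second script (e.g. via a parameter).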
