According to the stack trace, the algebraic optimization is not being used; the trace shows updateTop(TOP.java:139) and exec(TOP.java:116).
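
For reference, here is a rough sketch of the kind of script where Pig can apply the algebraic optimization to TOP (the relation and field names are made up; this is illustrative, not a tested script):

    -- Hypothetical relation/field names, for illustration only.
    data    = LOAD 'input' AS (key:chararray, score:double);
    grouped = GROUP data BY key;
    -- TOP(n, column, bag): when TOP is applied directly to the grouped bag
    -- in a FOREACH that immediately follows the GROUP, Pig can run it as an
    -- Algebraic UDF in the combiner and keep only the per-group top-n tuples
    -- in memory. A nested FOREACH or other processing of the bag tends to
    -- disable the combiner, and TOP then iterates the whole (possibly
    -- spilled) bag in the reducer, which matches the trace below.
    top_per_key = FOREACH grouped GENERATE group, TOP(5, 1, data);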
On 11/17/11, Dmitriy Ryaboy <[email protected]> wrote:
> The TOP udf does not try to process all data in memory if the algebraic
> optimization can be applied. It does need to keep the top n entries in
> memory, of course. Can you confirm algebraic mode is used?
>
> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[email protected]> wrote:
>
>> Hey guys,
>>
>> I encounter java.lang.OutOfMemoryError when using the TOP udf. It seems
>> that the udf tries to process all data in memory.
>>
>> Is there a workaround for TOP? Or maybe there is some other way of
>> getting top results? I cannot use LIMIT since I need 5% of the data, not
>> a constant number of rows.
>>
>> I am using:
>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>
>> The stack trace is:
>>
>> [2011-11-16 12:34:55] INFO (CodecPool.java:128) - Got brand-new decompressor
>> [2011-11-16 12:34:55] INFO (Merger.java:473) - Down to the last merge-pass,
>> with 21 segments left of total size: 2057257173 bytes
>> [2011-11-16 12:34:55] INFO (SpillableMemoryManager.java:154) - first memory
>> handler call - Usage threshold init = 175308800(171200K) used =
>> 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>> [2011-11-16 12:36:22] INFO (SpillableMemoryManager.java:167) - first memory
>> handler call - Collection threshold init = 175308800(171200K) used =
>> 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>> [2011-11-16 12:37:28] INFO (TaskLogsTruncater.java:69) - Initializing logs'
>> truncater with mapRetainSize=-1 and reduceRetainSize=-1
>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child :
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>     at java.lang.String.<init>(String.java:215)
>>     at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>     at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>     at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>     at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>     at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>     at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>
>> stderr logs
>>
>> Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java
>> heap space
>>     at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
>>     at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
>>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
>>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>>     at sun.management.Sensor.trigger(Sensor.java:120)
>>
>> Thanks in advance!
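
As for getting the top 5% rather than a fixed number of rows, one rough two-pass sketch (hypothetical names; it assumes the count from the first pass is fed to the second script as a parameter, since as far as I know LIMIT in Pig 0.8 only accepts a constant):

    -- First pass: compute how many rows make up 5% of the input.
    data     = LOAD 'input' AS (key:chararray, score:double);
    all_rows = GROUP data ALL;
    n_rows   = FOREACH all_rows GENERATE (long)(COUNT(data) * 0.05) AS n;
    STORE n_rows INTO 'top_n_threshold';
    -- Second pass (separate script), with $TOPN substituted from the
    -- stored threshold, e.g. via -param TOPN=...:
    --   sorted  = ORDER data BY score DESC;
    --   top5pct = LIMIT sorted $TOPN;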
