Hey Dmitriy,

I attached the script. It is not a plain Pig script, because I do some
preprocessing before submitting it to the cluster, but the general idea
of what I submit should be clear.
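
In case it helps to make the "top 5%" requirement concrete: since LIMIT needs a constant, what I want amounts to a two-pass computation -- count the rows first, then keep the largest k = ceil(0.05 * count) rows. A rough Python sketch of that idea (illustrative only, with made-up names; this is not the Pig script itself):

```python
import heapq
import math

def top_fraction(rows, fraction):
    """Two-pass 'top p%' selection: pass 1 counts the rows, pass 2 keeps
    the largest k = ceil(fraction * count) rows via a bounded-size heap."""
    rows = list(rows)                      # pass 1: materialize/count
    k = max(1, math.ceil(fraction * len(rows)))
    return heapq.nlargest(k, rows)         # pass 2: bounded top-k

data = list(range(100))                    # 100 rows, values 0..99
print(top_fraction(data, 0.05))            # top 5% -> [99, 98, 97, 96, 95]
```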

Thanks in advance!

On Fri, Nov 18, 2011 at 12:07 AM, Dmitriy Ryaboy <[email protected]> wrote:
> Ok, so it's something in the rest of the script that's causing this to
> happen. Ruslan, if you send your script, I can probably figure out why
> (usually, it's using another, non-algebraic UDF in your foreach, or,
> for Pig 0.8, generating a constant in the foreach).
>
> D
>
> On Thu, Nov 17, 2011 at 9:59 AM, pablomar
> <[email protected]> wrote:
>> According to the stack trace, the algebraic implementation is not being used;
>> it says:
>> updateTop(TOP.java:139)
>> exec(TOP.java:116)
>>
>> On 11/17/11, Dmitriy Ryaboy <[email protected]> wrote:
>>> The TOP udf does not try to process all data in memory if the algebraic
>>> optimization can be applied. It does need to keep the top N rows in memory,
>>> of course. Can you confirm algebraic mode is being used?
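
The bounded-memory behavior described above can be sketched like this (a Python illustration of the general algebraic top-N idea, with made-up helper names -- not Pig's actual TOP code): each partial pass keeps at most n rows on a min-heap, and partial results from the combiner are merged the same way, so memory stays O(n) regardless of input size.

```python
import heapq

def top_n_partial(rows, n, key=lambda r: r):
    """Keep only the n largest rows seen so far -- O(n) memory."""
    heap = []
    for row in rows:
        if len(heap) < n:
            heapq.heappush(heap, (key(row), row))
        elif key(row) > heap[0][0]:
            heapq.heapreplace(heap, (key(row), row))
    return [row for _, row in heap]

def top_n_merge(partials, n, key=lambda r: r):
    """Combine per-mapper partial results (the combiner/final steps)."""
    merged = [row for partial in partials for row in partial]
    return top_n_partial(merged, n, key)

# Example: three "mapper" chunks, global top 3.
chunks = [[5, 1, 9], [7, 3, 8], [2, 6, 4]]
partials = [top_n_partial(c, 3) for c in chunks]
print(sorted(top_n_merge(partials, 3)))  # [7, 8, 9]
```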
>>>
>>> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[email protected]>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I encounter java.lang.OutOfMemoryError when using the TOP udf. It seems
>>>> that the udf tries to process all data in memory.
>>>>
>>>> Is there a workaround for TOP? Or is there some other way of getting the
>>>> top results? I cannot use LIMIT, since I need the top 5% of the data, not
>>>> a constant number of rows.
>>>>
>>>> I am using:
>>>>
>>>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>>>
>>>>
>>>>
>>>> The stack trace is:
>>>>
>>>> [2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor
>>>> [2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
>>>> [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>>>> [2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>>>> [2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
>>>>     at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>     at java.lang.String.<init>(String.java:215)
>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>>>     at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>>>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>>>     at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>>>     at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>>>     at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>>>
>>>> stderr logs
>>>>
>>>> Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java heap space
>>>>     at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
>>>>     at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
>>>>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
>>>>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>>>>     at sun.management.Sensor.trigger(Sensor.java:120)
>>>>
>>>> Thanks in advance!
>>>>
>>>
>>
>



-- 
Best Regards,
Ruslan Al-Fakikh
