Ok. Here it is: https://gist.github.com/1383266
-----Original Message-----
From: Dmitriy Ryaboy [mailto:[email protected]]
Sent: November 21, 2011 20:32
To: [email protected]
Subject: Re: java.lang.OutOfMemoryError when using TOP udf

Ruslan,
I think the mailing list is set to reject attachments -- can you post it as
a github gist or something similar, and send a link?

D

On Mon, Nov 21, 2011 at 6:11 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> Hey Dmitriy,
>
> I attached the script. It is not a plain Pig script, because I do some
> preprocessing before submitting it to the cluster, but the general idea of
> what I submit is clear.
>
> Thanks in advance!
>
> On Fri, Nov 18, 2011 at 12:07 AM, Dmitriy Ryaboy <[email protected]> wrote:
>> Ok, so it's something in the rest of the script that's causing this to
>> happen. Ruslan, if you send your script, I can probably figure out why.
>> Usually it's using another, non-algebraic UDF in your FOREACH, or, for
>> Pig 0.8, generating a constant in the FOREACH [a sketch of both cases is
>> appended after the thread].
>>
>> D
>>
>> On Thu, Nov 17, 2011 at 9:59 AM, pablomar <[email protected]> wrote:
>>> According to the stack trace, the algebraic path is not being used; it
>>> says:
>>> updateTop(TOP.java:139)
>>> exec(TOP.java:116)
>>>
>>> On 11/17/11, Dmitriy Ryaboy <[email protected]> wrote:
>>>> The TOP udf does not try to process all data in memory if the
>>>> algebraic optimization can be applied. It does need to keep the top N
>>>> tuples in memory, of course. Can you confirm algebraic mode is used?
>>>>
>>>> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-Fakikh"
>>>> <[email protected]> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> I encounter java.lang.OutOfMemoryError when using the TOP udf. It
>>>>> seems that the udf tries to process all data in memory.
>>>>>
>>>>> Is there a workaround for TOP? Or maybe there is some other way of
>>>>> getting top results? I cannot use LIMIT, since I need the top 5% of
>>>>> the data, not a constant number of rows [a percentage-based sketch is
>>>>> also appended after the thread].
>>>>>
>>>>> I am using:
>>>>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>>>>
>>>>> The stack trace is:
>>>>>
>>>>> [2011-11-16 12:34:55] INFO (CodecPool.java:128) - Got brand-new decompressor
>>>>> [2011-11-16 12:34:55] INFO (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
>>>>> [2011-11-16 12:34:55] INFO (SpillableMemoryManager.java:154) - first memory handler call - Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>>>>> [2011-11-16 12:36:22] INFO (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>>>>> [2011-11-16 12:37:28] INFO (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child :
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>     at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>>     at java.lang.String.<init>(String.java:215)
>>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>>>>     at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>>>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>>>>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>>>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>>>>     at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>>>>     at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>>>>     at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>>>>
>>>>> stderr logs:
>>>>>
>>>>> Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java heap space
>>>>>     at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
>>>>>     at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
>>>>>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
>>>>>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>>>>>     at sun.management.Sensor.trigger(Sensor.java:120)
>>>>>
>>>>> Thanks in advance!
>>>>
>>>
>>
>
> --
> Best Regards,
> Ruslan Al-Fakikh
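
-----

[Editor's sketch 1: the algebraic point] A minimal Pig Latin illustration of
what Dmitriy describes above, using hypothetical relation and field names
(raw, key, score). TOP implements Pig's Algebraic interface, so when the
FOREACH after a GROUP generates only the group key and algebraic
expressions, partial top-N lists are computed in the combiner and the
reducer never materializes the whole bag. Putting a non-algebraic UDF, or,
on Pig 0.8, a constant, into the same GENERATE disables the combiner, and
the entire bag is streamed through TOP.exec on the reducer -- the situation
the stack trace above shows.

  raw     = LOAD 'input' AS (key:chararray, score:double);
  grouped = GROUP raw BY key;

  -- Combiner-friendly: only the group key and an algebraic UDF are
  -- generated, so TOP runs in its Initial/Intermediate/Final stages.
  good = FOREACH grouped GENERATE group, TOP(10, 1, raw);

  -- Combiner-hostile on Pig 0.8: the constant 'label' in the same GENERATE
  -- disables the combiner, so the full bag reaches TOP.exec on the reducer
  -- and can exhaust the heap on large groups.
  bad  = FOREACH grouped GENERATE group, 'label', TOP(10, 1, raw);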

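[Editor's sketch 2: the "top 5%" point] One hedged way to keep a percentage
of rows rather than a fixed count. This assumes Pig 0.10 or later, where
LIMIT accepts an expression (including a scalar projection); on 0.8.x the
row count would have to be computed in a separate first pass and substituted
in as a parameter, which may be what Ruslan's preprocessing step does.
Relation and field names are again hypothetical.

  raw = LOAD 'input' AS (key:chararray, score:double);

  -- Total row count via a single GROUP ALL.
  cnt = FOREACH (GROUP raw ALL) GENERATE COUNT(raw) AS n;

  -- Sort by score and keep the top 5% of rows; cnt.n is used as a scalar.
  ordered = ORDER raw BY score DESC;
  top5pct = LIMIT ordered (cnt.n / 20);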