You're right, pablomar... hmm. Ruslan: are you running this in MapReduce mode on a cluster, or locally?
I'm noticing this:

    [2011-11-16 12:34:55] INFO (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)

It looks like your max memory is 512MB. I've had issues with bag spilling with less than 1GB allocated (-Xmx1024m).

2011/11/21 pablomar <[email protected]>:

I might be wrong, but it seems the error comes from

    while (itr.hasNext())

and not from the add to the queue, so I don't think it is related to the number of elements in the queue... maybe the field length?

On 11/21/11, Jonathan Coveney <[email protected]> wrote:

Internally, TOP is using a priority queue. It tries to be smart about pulling off excess elements, but if you ask it for enough elements it can blow up, because the priority queue is going to hold n elements, where n is the ranking you want. This is consistent with the stack trace, which died in updateTop, which is where elements are added to the priority queue.

Ruslan, how large are the limits you're setting? i.e.

    (int)(count * (double)($THIRD_LEVELS_PERCENTAGE + $BOTS_PERCENTAGE))

As far as TOP's implementation goes, I imagine you could get around the issue by using a sorted data bag, but that might be much slower. Hmm.
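For context on Jonathan's point, the bounded-heap pattern looks roughly like the sketch below. This is an illustrative minimal version, not Pig's actual TOP source (which queues whole tuples rather than longs), and the class name is made up:

    import java.util.PriorityQueue;

    // Minimal sketch of a bounded top-N accumulator, in the spirit of
    // org.apache.pig.builtin.TOP. Names and the value type are illustrative.
    public class BoundedTopN {
        private final int n;                    // requested rank, e.g. 5% of the group
        private final PriorityQueue<Long> heap; // min-heap: smallest retained value on top

        public BoundedTopN(int n) {
            this.n = n;
            this.heap = new PriorityQueue<Long>(n);
        }

        // Called once per input row. The heap never grows past n entries,
        // but all n of them are resident in memory at once.
        public void update(long value) {
            heap.add(value);
            if (heap.size() > n) {
                heap.poll(); // evict the current minimum, keeping the top n
            }
        }
    }

With whole tuples in the queue instead of longs, and n in the millions (5% of a large group), the reducer has to materialize millions of tuples on the heap, which matches the OutOfMemoryError reported at the bottom of this thread.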
2011/11/21 Ruslan Al-fakikh <[email protected]>:

Ok. Here it is: https://gist.github.com/1383266

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[email protected]]
Sent: 21 November 2011 20:32
To: [email protected]
Subject: Re: java.lang.OutOfMemoryError when using TOP udf

Ruslan, I think the mailing list is set to reject attachments -- can you post it as a github gist or something similar, and send a link?

D

On Mon, Nov 21, 2011 at 6:11 AM, Ruslan Al-Fakikh <[email protected]> wrote:

Hey Dmitriy,

I attached the script. It is not a plain Pig script, because I do some preprocessing before submitting it to the cluster, but the general idea of what I submit is clear.

Thanks in advance!

On Fri, Nov 18, 2011 at 12:07 AM, Dmitriy Ryaboy <[email protected]> wrote:

Ok, so it's something in the rest of the script that's causing this to happen. Ruslan, if you send your script, I can probably figure out why (usually it's using another, non-algebraic UDF in your FOREACH, or, for Pig 0.8, generating a constant in the FOREACH).

D

On Thu, Nov 17, 2011 at 9:59 AM, pablomar <[email protected]> wrote:

According to the stack trace, the algebraic implementation is not being used; it says

    updateTop(Top.java:139)
    exec(Top.java:116)

On 11/17/11, Dmitriy Ryaboy <[email protected]> wrote:

The TOP udf does not try to process all data in memory if the algebraic optimization can be applied. It does need to keep the top n rows in memory, of course. Can you confirm algebraic mode is used?
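"Algebraic mode" here refers to Pig's org.apache.pig.Algebraic interface: a UDF that implements it can be split into initial/intermediate/final stages, with the first two running map-side and in the combiner, so the reducer never sees the whole bag at once. The skeleton below shows the shape such a UDF takes; the class name and empty method bodies are illustrative only, not TOP's actual source:

    import java.io.IOException;
    import org.apache.pig.Algebraic;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Illustrative skeleton of an algebraic UDF. When Pig can apply the
    // algebraic optimization, Initial runs map-side, Intermed runs in the
    // combiner, and only Final runs in the reducer.
    public class MyTopN extends EvalFunc<Tuple> implements Algebraic {

        @Override
        public Tuple exec(Tuple input) throws IOException {
            // Fallback path used when the optimization cannot be applied:
            // it receives the entire bag at once -- the memory-hungry case
            // visible in the stack trace below.
            return null; // sketch only
        }

        public String getInitial()  { return Initial.class.getName(); }
        public String getIntermed() { return Intermed.class.getName(); }
        public String getFinal()    { return Final.class.getName(); }

        public static class Initial extends EvalFunc<Tuple> {
            public Tuple exec(Tuple input) throws IOException { return null; } // per map record
        }
        public static class Intermed extends EvalFunc<Tuple> {
            public Tuple exec(Tuple input) throws IOException { return null; } // combiner merge
        }
        public static class Final extends EvalFunc<Tuple> {
            public Tuple exec(Tuple input) throws IOException { return null; } // reduce-side finish
        }
    }

The stack trace below shows TOP.exec being called directly from POForEach in the reducer, which is the non-algebraic path pablomar points out.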
On Nov 17, 2011, at 6:13 AM, Ruslan Al-fakikh <[email protected]> wrote:

Hey guys,

I encounter java.lang.OutOfMemoryError when using the TOP udf. It seems that the udf tries to process all data in memory.

Is there a workaround for TOP? Or maybe there is some other way of getting top results? I cannot use LIMIT, since I need 5% of the data, not a constant number of rows.

I am using: Apache Pig version 0.8.1-cdh3u2 (rexported)

The stack trace is:

    [2011-11-16 12:34:55] INFO (CodecPool.java:128) - Got brand-new decompressor
    [2011-11-16 12:34:55] INFO (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
    [2011-11-16 12:34:55] INFO (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
    [2011-11-16 12:36:22] INFO (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
    [2011-11-16 12:37:28] INFO (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:215)
        at java.io.DataInputStream.readUTF(DataInputStream.java:644)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
        at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
        at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
        at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
        at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
        at org.apache.pig.builtin.TOP.exec(TOP.java:116)
        at org.apache.pig.builtin.TOP.exec(TOP.java:65)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)

stderr logs:

    Exception in thread "Low Memory Detector" java.lang.OutOfMemoryError: Java heap space
        at sun.management.MemoryNotifInfoCompositeData.getCompositeData(MemoryNotifInfoCompositeData.java:42)
        at sun.management.MemoryNotifInfoCompositeData.toCompositeData(MemoryNotifInfoCompositeData.java:36)
        at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
        at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
        at sun.management.Sensor.trigger(Sensor.java:120)

Thanks in advance!

--
Best Regards,
Ruslan Al-Fakikh
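A closing note on Jonathan's sorted-data-bag idea above: Pig's BagFactory can create a sorted bag that is spillable, meaning the SpillableMemoryManager can push it to disk under memory pressure, so a top-N over it does not have to fit in the heap. A rough sketch under that assumption (the class and method names here are made up, and this is not how the shipped TOP works):

    import java.util.Comparator;
    import java.util.Iterator;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;

    // Rough sketch: accumulate candidates in a sorted, spillable bag
    // instead of a java.util.PriorityQueue, then read off the first n.
    public class SortedBagTopN {

        public static DataBag topN(Iterable<Tuple> input, int n,
                                   Comparator<Tuple> descendingByScore) {
            DataBag sorted = BagFactory.getInstance().newSortedBag(descendingByScore);
            for (Tuple t : input) {
                sorted.add(t); // may spill to disk instead of filling the heap
            }
            DataBag result = BagFactory.getInstance().newDefaultBag();
            Iterator<Tuple> it = sorted.iterator();
            for (int i = 0; i < n && it.hasNext(); i++) {
                result.add(it.next());
            }
            return result;
        }
    }

As Jonathan warns, this trades the OutOfMemoryError for speed: every insert pays for comparison, serialization, and possibly disk I/O.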
