+1
> -----Original Message-----
> From: Pradeep Kamath [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 11, 2008 12:18 PM
> To: [email protected]; [EMAIL PROTECTED]
> Subject: RE: Propsoal for handling "GC overhead limit" errors
>
> The GC overhead limit error could occur even when we are not
> low on memory, if memory is fragmented and the GC spends too
> much time freeing little memory. Also, we don't want to slow
> down performance by invoking System.gc() too often.
> Keeping these two in mind, I propose that gcActivationSize
> be applied to the memory freed thus far rather than to the
> current Spillable's memory size, and that we set a flag when
> this size is reached and invoke GC only once per handler
> invocation.
>
> Also I would like to use the following defaults if it is reasonable:
> // if we freed at least this much, invoke GC
> // (default 40 MB - this can be overridden by user supplied property)
> private static long gcActivationSize = 40000000L ;
>
> // spill file size should be at least this much
> // (default 5MB - this can be overridden by user supplied property)
> private static long spillFileSizeThreshold = 5000000L ;
>
> // fraction of biggest heap for which we want to get
> // "memory usage threshold exceeded" notifications
> private static double memoryThresholdFraction = 0.7;
>
> // fraction of biggest heap for which we want to get
> // "collection threshold exceeded" notifications
> private static double collectionMemoryThresholdFraction = 0.5;
>
>
> I am currently running more tests to check if previously seen
> issues with queries are now solved with these changes.
>
> -Pradeep
>
> -----Original Message-----
> From: pi song [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 11, 2008 7:15 AM
> To: [email protected]
> Subject: Re: Propsoal for handling "GC overhead limit" errors
>
> Sorry. It's actually Long.MAX_VALUE, not Integer.
>
> On Thu, Jun 12, 2008 at 12:12 AM, pi song <[EMAIL PROTECTED]> wrote:
>
> > Pradeep,
> >
> > I totally buy your biggest_heap*0.7 idea.
> >
> > BUT!!, I've tried this:-
> >
> > for(int i=0;i<100000;i++) {
> >     StringBuilder sb = new StringBuilder() ;
> >     for(int j=0;j<100;j++) {
> >         sb.append("hodgdfdsfsddf") ;
> >     }
> >     System.gc();
> > }
> > And it doesn't give me any error. So I think calling too often is not a
> > problem except it might be slow.
> >
> > gcActivationSize by default is set to Integer.MAX_VALUE. I believe most
> > people have never used it. So, it should have nothing to do with the
> > current problem.
> >
> > My concern about using soft/weak reference for data in bag is that if the
> > granularity is too fine, we will need more space for those additional
> > pointers.
> >
> > Pi
> >
> >
> > On Wed, Jun 11, 2008 at 5:51 AM, Mridul Muralidharan <
> > [EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >> Ideally, instead of using SpillableMemoryManager, it might be better to -
> >>
> >> a) use soft/weak reference to refer to the data in a bag/tuple.
> >> a.1) soft reference since it is less gc sensitive as compared to weak
> >> reference (a gc kicks all weak refs out typically). So soft refs are sort
> >> of like a cache which are not so frequently kicked.
> >> b) register them with reference queue and manage the life cycle of
> >> referent (to spill/not spill).
> >> c) override get/put in bag/tuple such that we load off the disk if the
> >> referent is null (this should already be done in some way in the code
> >> currently).
> >>
> >>
> >> Of course, this is much more work and is slightly more tricky ... so if
> >> SpillableMemoryManager can handle the requirements, it should work fine.
> >>
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> Pradeep Kamath wrote:
> >>
> >>> Hi,
> >>>
> >>>
> >>> Currently in org.apache.pig.impl.util.SpillableMemoryManager:
> >>>
> >>>
> >>> 1) We use MemoryManagement interface to get notified when the
> >>> "collection threshold" exceeds a limit (we set this to
> >>> biggest_heap*0.5). With this in place we are still seeing "GC overhead
> >>> limit" issues when trying large dataset operations. Observing some runs,
> >>> it looks like the notification is not frequent enough and early enough
> >>> to prevent memory issues, possibly because this notification only occurs
> >>> after GC.
> >>>
> >>>
> >>> 2) We only attempt to free up to:
> >>>
> >>> long toFree = info.getUsage().getUsed() -
> >>> (long)(info.getUsage().getMax()*.5);
> >>>
> >>> This is only the excess amount over the threshold which caused the
> >>> notification and is not sufficient to not be called again soon.
> >>>
> >>>
> >>> 3) While iterating over spillables, if current spillable's memory size
> >>> is > gcActivationSize, we try to invoke System.gc
> >>>
> >>>
> >>> 4) We *always* invoke System.gc() after iterating over spillables
> >>>
> >>>
> >>> Proposed changes are:
> >>>
> >>> =================
> >>>
> >>> 1) In addition to "collection threshold" of biggest_heap*0.5, a "usage
> >>> threshold" of biggest_heap*0.7 will be used so we get notified early and
> >>> often irrespective of whether garbage collection has occurred.
> >>>
> >>>
> >>> 2) We will attempt to free
> >>> toFree = info.getUsage().getUsed() - threshold + (long)(threshold *
> >>> 0.5); where threshold is (info.getUsage().getMax() * 0.7) if the
> >>> handleNotification() method is handling a "usage threshold exceeded"
> >>> notification and (info.getUsage().getMax() * 0.5) otherwise ("collection
> >>> threshold exceeded" case)
> >>>
> >>>
> >>> 3) While iterating over spillables, if the *memory freed thus far* is >
> >>> gcActivationSize OR if we have freed sufficient memory (based on 2)
> >>> above), then we set a flag to invoke System.gc when we exit the loop.
> >>>
> >>> 4) We will invoke System.gc() only if the flag is set in 3) above
> >>>
> >>>
> >>> Please provide thoughts/comments.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Pradeep
> >>>
> >>>
> >>>
> >>
> >
>
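
For reference, proposed changes 2) to 4) quoted above could be sketched
roughly as below. This is only an illustration under stated assumptions, not
the actual SpillableMemoryManager code: the class SpillSketch, the one-method
Spillable stand-in, the helper fixed(), and the method names computeToFree()
and spillAndDecideGc() are all made up for this sketch, and stopping the loop
once enough memory has been freed is my reading of point 3), which the
proposal does not spell out.

```java
import java.util.Arrays;
import java.util.List;

class SpillSketch {
    // Hypothetical stand-in for a spillable object: spill() returns bytes freed.
    interface Spillable {
        long spill();
    }

    // Hypothetical helper: a spillable that always reports the same freed size.
    static Spillable fixed(long bytes) {
        return () -> bytes;
    }

    // Proposed change 2): toFree = used - threshold + threshold * 0.5, where
    // threshold is max * 0.7 for a "usage threshold exceeded" notification
    // and max * 0.5 for a "collection threshold exceeded" notification.
    static long computeToFree(long used, long max, boolean usageThresholdExceeded) {
        double fraction = usageThresholdExceeded ? 0.7 : 0.5;
        long threshold = (long) (max * fraction);
        return used - threshold + (long) (threshold * 0.5);
    }

    // Proposed changes 3) and 4): track the memory freed so far and only set
    // a flag inside the loop; the caller invokes System.gc() at most once.
    static boolean spillAndDecideGc(List<Spillable> spillables, long toFree,
                                    long gcActivationSize) {
        long freedSoFar = 0L;
        boolean invokeGc = false;
        for (Spillable s : spillables) {
            freedSoFar += s.spill();
            if (freedSoFar > gcActivationSize) {
                invokeGc = true;
            }
            if (freedSoFar >= toFree) {
                invokeGc = true;
                break; // assumption: stop spilling once enough has been freed
            }
        }
        return invokeGc;
    }

    public static void main(String[] args) {
        long toFree = computeToFree(800L, 1000L, true);
        List<Spillable> bags = Arrays.asList(fixed(100L), fixed(100L), fixed(100L));
        boolean invokeGc = spillAndDecideGc(bags, toFree, 40000000L);
        if (invokeGc) {
            System.gc(); // single GC call per handler invocation
        }
        System.out.println("toFree=" + toFree + " invokeGc=" + invokeGc);
    }
}
```

Keeping the flag outside the loop is what limits System.gc() to one call per
handler invocation, regardless of how many spillables are visited.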
