Pradeep,

Have you tested this? If so:

(1) Did the problem go away for the queries you tested?
(2) What effect did it have on the performance of the queries that run successfully and spill?

Thanks,
Olga

> -----Original Message-----
> From: Pradeep Kamath [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 09, 2008 2:32 PM
> To: [email protected]
> Subject: Propsoal for handling "GC overhead limit" errors
>
> Hi,
>
> Currently in org.apache.pig.impl.util.SpillableMemoryManager:
>
> 1) We use the MemoryManagement interface to get notified when usage
> exceeds the "collection threshold" (we set this to biggest_heap*0.5).
> With this in place we are still seeing "GC overhead limit" issues
> when trying large dataset operations. Observing some runs, it looks
> like the notification is not frequent enough or early enough to
> prevent memory issues, possibly because this notification only
> occurs after GC.
>
> 2) We only attempt to free up to:
>
>     long toFree = info.getUsage().getUsed() -
>         (long)(info.getUsage().getMax()*.5);
>
> This is only the excess over the threshold that caused the
> notification, and it is not enough to keep us from being called
> again soon.
>
> 3) While iterating over spillables, if the current spillable's memory
> size is > gcActivationSize, we try to invoke System.gc()
>
> 4) We *always* invoke System.gc() after iterating over spillables
>
> Proposed changes are:
> =================
>
> 1) In addition to the "collection threshold" of biggest_heap*0.5, a
> "usage threshold" of biggest_heap*0.7 will be used so we get notified
> early and often irrespective of whether garbage collection has
> occurred.
>
> 2) We will attempt to free
>
>     toFree = info.getUsage().getUsed() - threshold +
>         (long)(threshold * 0.5);
>
> where threshold is (info.getUsage().getMax() * 0.7) if the
> handleNotification() method is handling a "usage threshold exceeded"
> notification and (info.getUsage().getMax() * 0.5) otherwise
> ("collection threshold exceeded" case)
>
> 3) While iterating over spillables, if the *memory freed thus far*
> is > gcActivationSize OR if we have freed sufficient memory (based on
> 2) above), then we set a flag to invoke System.gc when we exit the
> loop.
>
> 4) We will invoke System.gc() only if the flag is set in 3) above
>
> Please provide thoughts/comments.
>
> Thanks,
> Pradeep
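
For reference, the current and proposed toFree calculations from the quoted proposal can be compared side by side. This is only a rough sketch, not the actual SpillableMemoryManager code: the class name SpillTargetSketch and its method names are hypothetical, and casting threshold to long is an assumption (the proposal does not say where the cast happens). The notification type constants come from java.lang.management.MemoryNotificationInfo.

```java
import java.lang.management.MemoryNotificationInfo;

/** Hypothetical helper for illustration; these names are not taken from Pig. */
final class SpillTargetSketch {
    static final double COLLECTION_FRACTION = 0.5; // existing "collection threshold"
    static final double USAGE_FRACTION = 0.7;      // proposed "usage threshold"

    /** Picks the heap fraction matching the threshold that triggered the notification. */
    static double thresholdFraction(String notificationType) {
        return MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notificationType)
                ? USAGE_FRACTION
                : COLLECTION_FRACTION; // "collection threshold exceeded" case
    }

    /** Current behavior: free only the excess over half of the pool maximum. */
    static long currentToFree(long used, long max) {
        return used - (long) (max * COLLECTION_FRACTION);
    }

    /** Proposed behavior: free the excess plus half of the threshold itself. */
    static long proposedToFree(long used, long max, String notificationType) {
        // Assumption: threshold is truncated to long before it is used.
        long threshold = (long) (max * thresholdFraction(notificationType));
        return used - threshold + (long) (threshold * 0.5);
    }
}
```

In a handler, used and max would presumably come from info.getUsage().getUsed() and info.getUsage().getMax(), and the notification type from Notification.getType(). The proposed target is always larger than the bare excess over the threshold, which is the point of change 2).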
