I have some test numbers further down in this mail, but first the discussion items:
Going by http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.Out-of-Memory%20Exceptions%7Coutline, I think the "GC overhead limit" exception is thrown when the GC spends 98% of its time freeing less than 2% of the heap. The "java heap space" error is a more direct error saying we are out of space. So the GC should be invoked judiciously so as not to hit the "overhead limit".

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#gc() says System.gc() "..suggests that the JVM expend effort.." - does this mean that the GC may not actually run?

If we keep GCActivationSize applying to the current spillable's memory size, we could potentially (if the GC is actually called!) prevent the smaller nested bags in a big bag from being spilled. However, we could then invoke the GC again later in the same iteration once we have freed enough memory - this double call in quick succession within the same handler invocation could potentially trigger an "overhead limit" exception. Hence I would like GCActivationSize to apply to the memory freed thus far rather than to the current spillable's memory size, to set a flag when that size is reached, and to invoke the GC only once per handler invocation. This is a tradeoff - either we prevent redundant spills of smaller nested bags OR we prevent double calls of System.gc() within the same handler invocation.

Re: Alan's concern: given the description above of the GC overhead limit, I am concerned that if we invoke the GC without an activation limit we might get it into a mode where it spends 98% of its time freeing < 2% of the heap and hence cause an exception. Two options:

1) We could keep track of spill sizes between GC invocations and reduce the dribble by invoking the GC when the cumulative spill size crosses the activation limit (a rough sketch of this follows below).

2) We could keep GCActivationSize close to spillFileSizeThreshold and hence cause the GC to be invoked more often (again risking the "overhead limit" if it is too close) - for example, with spillFileSizeThreshold = 5 MB and GCActivationSize = 40 MB (4% of 1 GB), in the worst case we would invoke the GC after 8 small spills of 5 MB each if we do 1) above.

Thoughts?
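To make 1) concrete, here is a minimal sketch of what I have in mind. The Spillable interface, class, and field names below are placeholders for illustration only, not the actual org.apache.pig.impl.util code:

import java.util.List;

// Placeholder interface for illustration; not the real org.apache.pig.impl.util.Spillable.
interface Spillable {
    long getMemorySize();
    void spill();
}

class SpillLoopSketch {
    // GCActivationSize now applies to memory freed, not to a single spillable's size.
    private final long gcActivationSize = 40L * 1024 * 1024; // e.g. ~4% of a 1 GB heap
    // Option 1): spill sizes accumulated since the last System.gc() call,
    // carried across handler invocations to limit the dribble.
    private long spilledSinceLastGc = 0;

    void spillUntilFreed(List<Spillable> spillables, long toFree) {
        long freedThisInvocation = 0;
        boolean invokeGc = false;
        for (Spillable s : spillables) {
            if (freedThisInvocation >= toFree) {
                break;
            }
            long size = s.getMemorySize();
            s.spill();
            freedThisInvocation += size;
            spilledSinceLastGc += size;
            // Set a flag once the cumulative spill size crosses the activation
            // limit or we have freed enough; do not call gc() inside the loop.
            if (spilledSinceLastGc > gcActivationSize || freedThisInvocation >= toFree) {
                invokeGc = true;
            }
        }
        // At most one System.gc() per handler invocation - and even then it is
        // only a suggestion to the JVM, so it may not actually run.
        if (invokeGc) {
            System.gc();
            spilledSinceLastGc = 0;
        }
    }
}

The point of the cumulative counter is that many small spills eventually add up to one GC call instead of none, which is what causes the dribble Alan described.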
Here are test results with run times for the new changes (only the changes initially proposed, not the ones being discussed here):

Script run on 9 nodes, -Xmx512m (max heap size), data containing 200 million rows:
a = load '/user/pig/tests/data/singlefile/studenttab200m';
b = group a all;
c = foreach b generate COUNT(a.$0);
store c into '/tmp/pig/bigdata_out';
new code: 1 hr, 21 mins, 1 sec
old code: 8 hrs, 26 mins, 28 secs [3 reduce attempts - 1st attempt had a GC overhead limit exceeded error, 2nd attempt had hadoop issues ("Lost task tracker"), 3rd attempt succeeded]

Script run on 9 nodes, -Xmx512m (max heap size), data containing 200 million rows:
a = load '/user/pig/tests/data/singlefile/studenttab200m';
b = group a by $0;
c = foreach b generate COUNT(a.$0), group;
store c into '/tmp/pig/bigdata_complex_out';
new code: 1 hr, 9 mins, 53 secs
old code: 1 hr, 12 mins, 25 secs

Script run on 1 node, -Xmx512m (max heap size), data containing 20 million rows:
a = load '/user/pradeepk/studenttab20m';
b = group a all;
c = foreach b generate COUNT(a.$0);
store c into '/tmp/pig/meddata_out';
new code: 28 mins, 19 secs
old code: failed with 3 attempts in reduce, all with java heap space errors

Script run on 9 nodes, -Xmx512m (max heap size), data containing 20 million rows:
a = load '/user/pig/tests/data/singlefile/studenttab20m';
b = group a all;
c = foreach b generate COUNT(a.$0);
store c into '/tmp/pig/meddata_out';
new code: 6 mins, 37 secs
old code: 23 mins, 22 secs - the old code sometimes gives GC overhead limit errors

Pradeep

-----Original Message-----
From: pi song [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2008 7:54 AM
To: [email protected]
Subject: Re: Proposal for handling "GC overhead limit" errors

GC overhead limit means OutOfMemory, right? Then we should think about ideas to save memory. I've heard about memory compression techniques before, but they are only good when we access the data sequentially, and of course they incur some performance impact.

My 2 cents.

On Wed, Jun 11, 2008 at 12:41 AM, pi song <[EMAIL PROTECTED]> wrote:
> Pradeep's (3) is what I thought before, but I ended up implementing it the
> way it is because I believed disk I/O should be slow anyway. Adding just a
> gc call shouldn't cause much trouble (we are not calling it too often
> anyway). (4) will be called only once per notification, so again it should
> not be considered too expensive.
>
> That (3) bit also serves another purpose, to help reduce small spills
> (this is what I posted before):
> "Based on the fact that now we spill big bags first, my observation is that
> there are still cases where a big container bag is spilled and therefore its
> mContent becomes empty but most of its inner bags' WeakReferences aren't
> cleaned up by GC yet. In such cases, if we haven't freed up enough memory,
> those inner bags will be unnecessarily spilled (even though all their
> contents were already spilled in the big bag spill)."
>
> Pi
>
>
> On Tue, Jun 10, 2008 at 11:06 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> My concern with the methodology is that we can get into a dribble mode.
>> Consider the following scenario:
>>
>> 1) We get a usage threshold exceeded notification.
>> 2) We spill, but not enough to activate the garbage collector.
>> 3) Next time the jvm checks, will we still get a usage exceeded threshold?
>> I assume so, since the gc won't have run. But at this point it's highly
>> unlikely that we'll spill enough to activate the gc.
>> From here on out we're stuck, spilling little bits but not calling the gc
>> until the system invokes it.
>>
>> We could mitigate this some by tracking spill sizes across spills and
>> invoking the gc when we reach the threshold. This does not avoid the
>> dribble, but it does shorten it.
>>
>> I think any time we spill we should invoke the gc to avoid the dribble.
>> Pradeep is concerned that this will cause us to invoke the gc too often,
>> which is a possible cause of the error we see. Is it possible to estimate
>> our spill size before we start spilling and decide up front whether to try
>> it or not?
>>
>> Alan.
>>
>> Pradeep Kamath wrote:
>>
>>> Hi,
>>>
>>> Currently in org.apache.pig.impl.util.SpillableMemoryManager:
>>>
>>> 1) We use the MemoryManagement interface to get notified when the
>>> "collection threshold" exceeds a limit (we set this to biggest_heap*0.5).
>>> With this in place we are still seeing "GC overhead limit" issues when
>>> trying large dataset operations. Observing some runs, it looks like the
>>> notification is not frequent enough or early enough to prevent memory
>>> issues, possibly because this notification only occurs after GC.
>>>
>>> 2) We only attempt to free up to:
>>>
>>> long toFree = info.getUsage().getUsed() -
>>>     (long)(info.getUsage().getMax()*.5);
>>>
>>> This is only the excess amount over the threshold which caused the
>>> notification and is not sufficient to avoid being called again soon.
>>>
>>> 3) While iterating over spillables, if the current spillable's memory size
>>> is > gcActivationSize, we try to invoke System.gc().
>>>
>>> 4) We *always* invoke System.gc() after iterating over spillables.
>>>
>>> Proposed changes are:
>>> =================
>>>
>>> 1) In addition to the "collection threshold" of biggest_heap*0.5, a "usage
>>> threshold" of biggest_heap*0.7 will be used so we get notified early and
>>> often, irrespective of whether garbage collection has occurred.
>>>
>>> 2) We will attempt to free
>>> toFree = info.getUsage().getUsed() - threshold + (long)(threshold * 0.5);
>>> where threshold is (info.getUsage().getMax() * 0.7) if the
>>> handleNotification() method is handling a "usage threshold exceeded"
>>> notification and (info.getUsage().getMax() * 0.5) otherwise (the
>>> "collection threshold exceeded" case).
>>>
>>> 3) While iterating over spillables, if the *memory freed thus far* is >
>>> gcActivationSize OR if we have freed sufficient memory (based on 2)
>>> above), then we set a flag to invoke System.gc() when we exit the loop.
>>>
>>> 4) We will invoke System.gc() only if the flag is set in 3) above.
>>>
>>> Please provide thoughts/comments.
>>>
>>> Thanks,
>>>
>>> Pradeep
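For reference, here is a rough sketch of how proposed changes 1) and 2) could look using the java.lang.management notification API. The class name, constants, and the way the biggest heap pool is picked are assumptions for illustration only, not the actual patch:

import java.lang.management.*;
import javax.management.*;
import javax.management.openmbean.CompositeData;

// Illustrative sketch of proposed changes 1) and 2); not the actual patch.
public class SpillableMemoryManagerSketch implements NotificationListener {

    private static final double COLLECTION_FRACTION = 0.5; // existing collection threshold
    private static final double USAGE_FRACTION = 0.7;      // proposed additional usage threshold

    public SpillableMemoryManagerSketch() {
        // Proposed change 1): register a usage threshold (0.7 * max) in addition
        // to the collection threshold (0.5 * max) on the biggest heap pool, so we
        // get notified early and often, regardless of whether a GC has run.
        MemoryPoolMXBean biggest = null;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.isCollectionUsageThresholdSupported()
                    && (biggest == null
                        || pool.getUsage().getMax() > biggest.getUsage().getMax())) {
                biggest = pool;
            }
        }
        if (biggest != null) {
            long max = biggest.getUsage().getMax();
            biggest.setCollectionUsageThreshold((long) (max * COLLECTION_FRACTION));
            biggest.setUsageThreshold((long) (max * USAGE_FRACTION));
        }
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(this, null, null);
    }

    public void handleNotification(Notification n, Object handback) {
        String type = n.getType();
        boolean usageExceeded = MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(type);
        boolean collectionExceeded =
                MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(type);
        if (!usageExceeded && !collectionExceeded) {
            return;
        }
        MemoryNotificationInfo info =
                MemoryNotificationInfo.from((CompositeData) n.getUserData());

        // Proposed change 2): free enough to bring usage down to half of
        // whichever threshold fired (used - toFree == 0.5 * threshold).
        double fraction = usageExceeded ? USAGE_FRACTION : COLLECTION_FRACTION;
        long threshold = (long) (info.getUsage().getMax() * fraction);
        long toFree = info.getUsage().getUsed() - threshold + (long) (threshold * 0.5);
        if (toFree <= 0) {
            return;
        }

        // Proposed changes 3) and 4): spill biggest bags first, track memory freed
        // thus far, set a flag once it crosses gcActivationSize (or toFree), and
        // call System.gc() at most once after the loop - see the spill-loop sketch
        // earlier in this mail.
    }
}

The usage threshold notification is what provides the early, pre-GC signal that 1) asks for, while the collection threshold notification continues to fire after each GC as it does today.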
