On Mon, Nov 29, 2010 at 6:33 AM, Sean Sechrist <ssechr...@gmail.com> wrote:
> Just an update, in case anyone's interested in our performance numbers:
>
> With the 512MB newSize, our minor GC pauses are generally less than .05s,
> although we see a fair amount get up around .15s. We still see some
> promotion failures causing full pauses over a minute occasionally. But we
> have a script running to automatically restart our regionservers if that
> happens. Things seem to be going OK right now.
>
> On a related note: If a region server encounters the GC pause of death,
> will all of the writes in its memstore at the time be lost (without using
> the WAL)? I think they would be.

Yep, they would be - that's why the WAL is important.

One thing I've been thinking about is a way to have an HBase-orchestrated
constant rolling System.gc(). If we can detect heap fragmentation before it
causes a long pause, we can shed regions gracefully, do System.gc(), and
then pick them up again. A little tricky, but it should solve these issues
once and for all, especially on big clusters where a constant rolling
restart isn't a big deal compared to total capacity.

-Todd

> On Mon, Nov 29, 2010 at 4:49 AM, Friso van Vollenhoven <
> fvanvollenho...@xebia.com> wrote:
>
> > On a slightly related note, we've been running with G1 with default
> > settings on a 16GB heap for some weeks now. It's never given us trouble,
> > so I didn't do any real analysis on the GC times, just some eyeballing.
> >
> > I looked at the longer GCs (everything longer than 1 second: grep -C 5 -i
> > real=[1-9] gc-hbase.log), which gives a list of full GCs, all around 10s.
> > The minor pauses all appear to be around 0.2s. I can pastebin a GC log if
> > anyone is interested in the G1 behavior.
> >
> > Friso
> >
> > On 29 nov 2010, at 09:47, Ryan Rawson wrote:
> >
> > > I'd love to hear the kinds of minor pauses you get...
> > > Left alone to its devices, 1.6.0_14 or so wants to grow the new gen
> > > to 1GB if your xmx is large enough, and at that size you are looking
> > > at 800ms minor pauses!
> > >
> > > It's a tough subject.
> > >
> > > -ryan
> > >
> > > On Wed, Nov 24, 2010 at 12:52 PM, Sean Sechrist <ssechr...@gmail.com>
> > > wrote:
> > >> Interesting. The settings we tried earlier today slowed jobs
> > >> significantly, but no failures (yet). We're going to try the 512MB
> > >> newSize and 60% CMSInitiatingOccupancyFraction. 1-second pauses here
> > >> and there would be OK for us... we just want to avoid the long
> > >> pauses right now. We'll also do what we can to avoid swapping. The
> > >> Ganglia metrics are on there.
> > >>
> > >> Thanks,
> > >> Sean
> > >>
> > >> On Wed, Nov 24, 2010 at 3:34 PM, Todd Lipcon <t...@cloudera.com>
> > >> wrote:
> > >>
> > >>> On Wed, Nov 24, 2010 at 7:01 AM, Sean Sechrist <ssechr...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hey guys,
> > >>>>
> > >>>> I just want to get an idea about how everyone avoids these long GC
> > >>>> pauses that cause regionservers to die.
> > >>>>
> > >>>> What kind of Java heap and garbage collection settings do you use?
> > >>>>
> > >>>> What do you do to make sure that the HBase VM never uses swap? I
> > >>>> have heard turning off swap altogether can be dangerous, so right
> > >>>> now we have the setting vm.swappiness=0. How do you tell if it's
> > >>>> using swap? On Ganglia, we see the "CPU wio" metric at around 4.5%
> > >>>> before one of our crashes. Is that high?
> > >>>>
> > >>>> To try to avoid using too much memory, is reducing the memstore
> > >>>> upper/lower limit, or the block cache size, a good idea? Should we
> > >>>> just tune down HBase's total heap to try to avoid swap?
> > >>>>
> > >>>> In terms of our specific problem:
> > >>>>
> > >>>> We seem to keep running into garbage collection pauses that cause
> > >>>> the regionservers to die.
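[For Sean's "how do you tell if it's using swap?" question above: the answer lives in standard Linux interfaces rather than anything HBase-specific. A minimal sketch, using /proc rather than Ganglia; the file paths are stock Linux defaults, not something from this thread:]

```shell
# SwapTotal minus SwapFree in /proc/meminfo is the swap currently in use;
# any sustained non-zero value on a regionserver box is a warning sign.
awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print "swap in use (kB):", t-f}' /proc/meminfo

# The knob Sean mentions; 0 tells the kernel to avoid swapping application
# pages whenever possible. To persist it across reboots, put
# "vm.swappiness = 0" in /etc/sysctl.conf.
cat /proc/sys/vm/swappiness
```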
> > >>>> We have a mix of some random read jobs, as well as a few full-scan
> > >>>> jobs (~1.5 billion rows, 800-900GB of data, 1500 regions), and we
> > >>>> are always inserting data. We would rather sacrifice a little speed
> > >>>> for stability, if that means anything. We have 7 nodes (RS + DN +
> > >>>> TT) with 12GB max heap given to HBase, and 24GB memory total.
> > >>>>
> > >>>> We were using the following garbage collection options:
> > >>>> -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
> > >>>> -XX:CMSInitiatingOccupancyFraction=75
> > >>>>
> > >>>> After looking at http://wiki.apache.org/hadoop/PerformanceTuning,
> > >>>> we are trying to lower NewSize/MaxNewSize to 6m as well as reducing
> > >>>> CMSInitiatingOccupancyFraction to 50.
> > >>>
> > >>> Rather than reducing the new size, you should consider increasing it
> > >>> if you're OK with higher latency but fewer long GC pauses.
> > >>>
> > >>> GC is a complicated subject, but here are a few rules of thumb:
> > >>>
> > >>> - A larger young generation means that the young GC pauses, which
> > >>> are stop-the-world, will take longer. In my experience it's
> > >>> somewhere around 1 second per GB of new size. So, if you're OK with
> > >>> periodic 1-second pauses, a large (1GB) new size should be fine.
> > >>> - A larger young generation also means that less data will get
> > >>> tenured to the old generation. This means that the old generation
> > >>> will have to collect less often, and also that it will become less
> > >>> fragmented.
> > >>> - In HBase, the long (45-second+) pauses generally happen when
> > >>> promotion fails due to heap fragmentation in the old generation. The
> > >>> JVM then falls back to a stop-the-world compacting collection, which
> > >>> takes a long time.
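[The promotion/concurrent-mode failures Todd describes can be confirmed straight from the GC log. A small self-contained sketch; the two sample lines are shortened versions of the real JVM output, and the file name gc-hbase.log follows Friso's setup earlier in the thread:]

```shell
# Count the two CMS failure modes in a GC log. The sample lines are
# written to a scratch file so the command can be run as-is.
cat > gc-hbase.log <<'EOF'
61297.449: [GC 61297.449: [ParNew (promotion failed): 57425K->57880K(59008K), 0.1880950 secs]
(concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340 secs]
EOF
grep -cE 'promotion failed|concurrent mode failure' gc-hbase.log
# prints 2 (one matching line per failure event)
```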
> > >>> > > >>> So, in general, a large young gen will reduce the frequency of > > super-long > > >>> pauses, but will increase the frequency of shorter pauses. > > >>> > > >>> It sounds like you may be OK with longer young gen pauses, so maybe > > >>> consider new size at 512M with your 12G total heap? > > >>> > > >>> I also wouldn't tune CMSInitiatingOccupancy below 60% - that will > cause > > CMS > > >>> to always be running which isn't that efficient. > > >>> > > >>> -Todd > > >>> > > >>> > > >>>> > > >>>> We see messages like this in our GC logs: > > >>>> > > >>>>> 8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs] > > >>>> > > >>>> (concurrent mode failure): 10126729K->5760080K(13246464K), > > 91.2530340 > > >>>>> secs] > > >>>> > > >>>> > > >>>> > > >>>> 2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew > > (promotion > > >>>> failed): 57425K->57880K(59008K), 0.1880950 secs]61297.637: > > >>>> [CMS2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark: > > >>>> 8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs] > > >>>> (concurrent mode failure): 10126729K->5760080K(13246464K), > 91.2530340 > > >>>> secs] > > >>>> 10181961K->5760080K(13305472K), [CMS Perm : 20252K->20241K(33868K)], > > >>>> 91.4413320 secs] [Times: user=24.47 sys=1.07, real=91.44 secs] > > >>>> > > >>>> There's a lot of questions there, but I definitely appreciate any > > advice > > >>>> or > > >>>> input anybody else has. Thanks so much! > > >>>> > > >>>> -Sean > > >>>> > > >>> > > >>> > > >>> > > >>> -- > > >>> Todd Lipcon > > >>> Software Engineer, Cloudera > > >>> > > >> > > > > > -- Todd Lipcon Software Engineer, Cloudera