I'd love to hear the kinds of minor pauses you get... left to its own devices, 1.6.0_14 or so wants to grow the new gen to 1GB if your -Xmx is large enough, and at that size you are looking at 800ms minor pauses!
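Pinning the new gen yourself stops that growth; something like this does it (size illustrative, and assuming you set JVM flags via HBASE_OPTS in hbase-env.sh):

  # -Xmn128m is shorthand for -XX:NewSize=128m -XX:MaxNewSize=128m,
  # which takes the JVM's adaptive young-gen resizing out of the picture
  export HBASE_OPTS="$HBASE_OPTS -Xmn128m"
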
It's a tough subject.

-ryan

On Wed, Nov 24, 2010 at 12:52 PM, Sean Sechrist <ssechr...@gmail.com> wrote:
> Interesting. The settings we tried earlier today slowed jobs significantly,
> but no failures (yet). We're going to try the 512MB NewSize and 60%
> CMSInitiatingOccupancyFraction. One-second pauses here and there would be
> OK for us... we just want to avoid the long pauses right now. We'll also do
> what we can to avoid swapping. The Ganglia metrics are on there.
>
> Thanks,
> Sean
>
> On Wed, Nov 24, 2010 at 3:34 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Wed, Nov 24, 2010 at 7:01 AM, Sean Sechrist <ssechr...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> I just want to get an idea of how everyone avoids these long GC pauses
>>> that cause regionservers to die.
>>>
>>> What kind of Java heap and garbage collection settings do you use?
>>>
>>> What do you do to make sure that the HBase VM never uses swap? I have
>>> heard that turning off swap altogether can be dangerous, so right now we
>>> have the setting vm.swappiness=0. How do you tell if it's using swap? On
>>> Ganglia, we see the "CPU wio" metric at around 4.5% before one of our
>>> crashes. Is that high?
>>>
>>> To try to avoid using too much memory, is reducing the memstore
>>> upper/lower limits, or the block cache size, a good idea? Should we just
>>> tune down HBase's total heap to try to avoid swap?
>>>
>>> In terms of our specific problem:
>>>
>>> We seem to keep running into garbage collection pauses that cause the
>>> regionservers to die. We have a mix of some random-read jobs as well as
>>> a few full-scan jobs (~1.5 billion rows, 800-900GB of data, 1500
>>> regions), and we are always inserting data. We would rather sacrifice a
>>> little speed for stability, if that means anything. We have 7 nodes
>>> (RS + DN + TT) with a 12GB max heap given to HBase, and 24GB of memory
>>> total.
>>>
>>> We were using the following garbage collection options:
>>>
>>> -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
>>> -XX:CMSInitiatingOccupancyFraction=75
>>>
>>> After looking at http://wiki.apache.org/hadoop/PerformanceTuning, we are
>>> trying to lower NewSize/MaxNewSize to 6m as well as reducing
>>> CMSInitiatingOccupancyFraction to 50.
>>
>> Rather than reducing the new size, you should consider increasing it if
>> you're OK with higher latency but fewer long GC pauses.
>>
>> GC is a complicated subject, but here are a few rules of thumb:
>>
>> - A larger young generation means that the young GC pauses, which are
>>   stop-the-world, will take longer. In my experience it's somewhere
>>   around 1 second per GB of new size. So, if you're OK with periodic
>>   1-second pauses, a large (1GB) new size should be fine.
>> - A larger young generation also means that less data will get tenured
>>   into the old generation, so the old generation has to collect less
>>   often and becomes less fragmented.
>> - In HBase, the long (45-second-plus) pauses generally happen when a
>>   promotion fails due to heap fragmentation in the old generation. The
>>   collector then falls back to a stop-the-world compacting collection,
>>   which takes a long time.
>>
>> So, in general, a large young gen will reduce the frequency of the
>> super-long pauses but increase the frequency of shorter pauses.
>>
>> It sounds like you may be OK with longer young-gen pauses, so maybe
>> consider a new size of 512M with your 12G total heap?
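>>
>> Concretely, assuming you set JVM flags via HBASE_OPTS in hbase-env.sh,
>> that might look something like this (sizes illustrative, worth validating
>> against your own GC logs):
>>
>>   export HBASE_OPTS="-Xmx12g -XX:+UseConcMarkSweepGC -XX:NewSize=512m -XX:MaxNewSize=512m"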
>>
>> I also wouldn't tune CMSInitiatingOccupancyFraction below 60% - that will
>> cause CMS to be running more or less constantly, which isn't that
>> efficient.
>>
>> -Todd
>>
>>> We see messages like this in our GC logs:
>>>
>>> 2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew
>>> (promotion failed): 57425K->57880K(59008K), 0.1880950 secs]61297.637:
>>> [CMS2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark:
>>> 8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs]
>>> (concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340
>>> secs] 10181961K->5760080K(13305472K), [CMS Perm : 20252K->20241K(33868K)],
>>> 91.4413320 secs] [Times: user=24.47 sys=1.07, real=91.44 secs]
>>>
>>> There are a lot of questions there, but I definitely appreciate any
>>> advice or input anybody else has. Thanks so much!
>>>
>>> -Sean
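>>
>> One aside on reading that log: the "(promotion failed)" followed by
>> "(concurrent mode failure)" and a 91-second real time looks like exactly
>> the fragmentation-driven full GC described above. If you aren't already
>> capturing these logs permanently, flags along these lines will do it
>> (log path illustrative; exact options vary a bit by JVM version):
>>
>>   export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera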