That was a great thread Todd... it almost got somewhere. Looks like you were owed a response by the hotspot-gc crew.
Friso, I wonder if u23 is better? There are a bunch of G1 fixes in it:
http://www.oracle.com/technetwork/java/javase/2col/6u23bugfixes-191074.html

St.Ack

On Sat, Dec 18, 2010 at 10:39 PM, Todd Lipcon <[email protected]> wrote:
> I also had long GC pauses on G1 when I tried it last summer. Check out
> this thread for the gory details:
>
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-July/000653.html
>
> -Todd
>
> On Sat, Dec 18, 2010 at 10:37 PM, Stack <[email protected]> wrote:
>> Setting swap to zero is going to an extreme but changing it so its not
>> the default -- 60% IIRC -- could help.
>>
>> I made HBASE-3376 for the NPE. I'll take a look into it.
>>
>> Congrats on the 113second G1 pause. I thought G1 was to be the end of
>> such pauses? (Smile). Its good to hear that its staying up for you.
>> What do your JVM options look like for G1? Are you running with
>> -XX:MaxGCPauseMillis?
>>
>> Good on you Friso,
>> St.Ack
>>
>> On Sat, Dec 18, 2010 at 12:12 AM, Friso van Vollenhoven
>> <[email protected]> wrote:
>>> Hi J-D,
>>>
>>> I redid the job as before to once more verify. The real problem appears to
>>> be something I had not noticed, because I never expected it. The machines
>>> started swapping during the job. I did not expect that, because there is a
>>> total of about 45GB heap allocated on a 64GB machine and nothing much else
>>> running, so I had not thought of that immediately (although I should have
>>> checked nonetheless). Network and IO graphs look normal. We run on 10 disks
>>> / datanode, so there are some spindles available. I will try to get the
>>> swapping out of the way and then try again and see if it fixes it.
>>>
>>> Now the GC log shows some obvious evil doers:
>>> 2010-12-18T00:17:42.274+0000: 14275.374: [Full GC (System.gc())
>>> 6913M->4722M(15742M), 113.1350880 secs]
>>> [Times: user=6.99 sys=1.80, real=113.12 secs]
>>>
>>> A (partial) log of one of the RS is here:
>>> GC log: http://pastebin.com/GPRVj8u5
>>> RS log: http://pastebin.com/Vj5K26ss
>>>
>>> The GC is G1, so it may look different from what you expect from a GC log.
>>> I know it is considered experimental, but I like the concept of it and
>>> think it's nice to gain some experience with it.
>>>
>>> Also, in the RS log I see some NPEs:
>>> 2010-12-18 04:31:24,043 ERROR
>>> org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while
>>> processing event M_RS_OPEN_REGION
>>> java.lang.NullPointerException
>>>   at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
>>>   at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
>>>   at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
>>>   at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)
>>>   at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)
>>>   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:208)
>>>   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:89)
>>>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>   at java.lang.Thread.run(Thread.java:619)
>>>
>>> These I had not seen before. Any idea?
>>>
>>> Friso
>>>
>>> On 17 dec 2010, at 20:11, Jean-Daniel Cryans wrote:
>>>
>>> If it wasn't GC it would be core dumps and whatnot, nothings free :)
>>>
>>> I will reiterate what I said in my first answer, I'd like to see your
>>> GC log since at this point I haven't seen direct evidence of GC
>>> activity.
>>>
>>> J-D
>>>
>>> On Fri, Dec 17, 2010 at 1:27 AM, Friso van Vollenhoven
>>> <[email protected]> wrote:
>>> Hi J-D,
>>>
>>> Thanks for your comments and clarification. I guess GC does blow
>>> (especially when writing things like databases and filesystems).
>>>
>>> Right now I will dive into GC tuning once more and probably lower the
>>> number of reducers on the insert jobs.
>>>
>>> Thanks,
>>> Friso
>>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
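
The Full GC entry quoted in the thread reports user=6.99 sys=1.80 against real=113.12 secs, i.e. the collector used under 9 seconds of CPU during a 113 second pause. A gap like that usually means the JVM was waiting on something other than GC work, which fits the swapping Friso describes. Below is a minimal sketch for finding such entries in a GC log. The function name, the thresholds, and the assumption that every entry uses the exact `[Times: ...]` layout quoted above are illustrative, not something taken from the thread.

```python
import re

# Matches the "[Times: user=6.99 sys=1.80, real=113.12 secs]" layout quoted
# in the thread. Other JVM versions or flags may print this differently.
TIMES_RE = re.compile(
    r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]"
)


def stalled_pauses(log_text, min_real=1.0, ratio=2.0):
    """Return (user, sys, real) tuples where wall time far exceeds CPU time.

    min_real and ratio are arbitrary illustrative thresholds: a pause is
    reported when it lasted at least min_real seconds and its wall-clock
    time is more than ratio times the CPU time (user + sys).
    """
    hits = []
    for match in TIMES_RE.finditer(log_text):
        user, sys_, real = (float(g) for g in match.groups())
        if real >= min_real and real > ratio * (user + sys_):
            hits.append((user, sys_, real))
    return hits


if __name__ == "__main__":
    sample = "[Times: user=6.99 sys=1.80, real=113.12 secs]"
    for user, sys_, real in stalled_pauses(sample):
        print(f"real={real}s but only {user + sys_:.2f}s of CPU")
```

With parallel GC threads, user + sys normally meets or exceeds real, so any entry this check reports is worth comparing against swap activity on the host at that timestamp.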
