Setting swap to zero is going to an extreme, but changing it so it's
not at the default -- 60 IIRC -- could help.
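
For reference, the knob is vm.swappiness (a 0-100 value).  A sketch of
turning it down on Linux -- the 10 below is only an illustration, not a
recommendation from this thread:

  # check the current value (60 on most kernels)
  cat /proc/sys/vm/swappiness

  # lower it on the running system
  sysctl -w vm.swappiness=10

  # persist it across reboots by adding this line to /etc/sysctl.conf
  vm.swappiness = 10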

I made HBASE-3376 for the NPE.  I'll look into it.

Congrats on the 113-second G1 pause.  I thought G1 was to be the end of
such pauses? (Smile.)  It's good to hear that it's staying up for you.
What do your JVM options look like for G1?  Are you running with
-XX:MaxGCPauseMillis?
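
For anyone following along, G1 flags of that era would be along these
lines in hbase-env.sh -- a sketch only; the heap size, pause target and
log path below are made up, not Friso's actual settings:

  export HBASE_OPTS="-Xmx16g \
    -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=100 \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/gc.log"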

Good on you, Friso,
St.Ack

On Sat, Dec 18, 2010 at 12:12 AM, Friso van Vollenhoven
<[email protected]> wrote:
> Hi J-D,
>
> I redid the job as before to verify once more. The real problem appears to be
> something I had not noticed, because I never expected it: the machines
> started swapping during the job. With a total of about 45GB of heap allocated
> on a 64GB machine and not much else running, it had not occurred to me to
> look there (although I should have checked nonetheless). Network and IO
> graphs look normal. We run 10 disks per datanode, so there are spindles
> available. I will try to get the swapping out of the way, then run the job
> again and see if that fixes it.
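>
> Watching vmstat while the job runs makes the swapping visible, by the
> way; the si and so columns (swap-in and swap-out per second) should
> stay at zero on a healthy node:
>
>   vmstat 5    # report memory and swap activity every 5 seconds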
>
> Now the GC log shows some obvious evildoers:
> 2010-12-18T00:17:42.274+0000: 14275.374: [Full GC (System.gc()) 
> 6913M->4722M(15742M), 113.1350880 secs]
>  [Times: user=6.99 sys=1.80, real=113.12 secs]
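>
> That pause is an explicit System.gc() call (the log says so), not heap
> pressure -- RMI's distributed GC is a common trigger of those -- and
> the gap between user and real time fits the swapping theory. If the
> explicit GCs keep showing up, one knob to consider (a sketch, I have
> not tried it here) is:
>
>   -XX:+DisableExplicitGC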
>
> A (partial) log of one of the RS is here:
> GC log: http://pastebin.com/GPRVj8u5
> RS log: http://pastebin.com/Vj5K26ss
>
> The GC is G1, so the log may look different from what you are used to. I
> know G1 is considered experimental, but I like the concept and think it's
> nice to gain some experience with it.
>
> Also, in the RS log I see some NPEs:
> 2010-12-18 04:31:24,043 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
> Caught throwable while processing event M_RS_OPEN_REGION
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
> at 
> org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:208)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:89)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
>
> These I had not seen before. Any idea?
>
>
> Friso
>
>
>
> On 17 Dec 2010, at 20:11, Jean-Daniel Cryans wrote:
>
> If it wasn't GC it would be core dumps and whatnot, nothing's free :)
>
> I will reiterate what I said in my first answer: I'd like to see your
> GC log, since at this point I haven't seen direct evidence of GC
> activity.
>
> J-D
>
> On Fri, Dec 17, 2010 at 1:27 AM, Friso van Vollenhoven
> <[email protected]> wrote:
> Hi J-D,
>
> Thanks for your comments and clarification. I guess GC does blow (especially 
> when writing things like databases and filesystems).
>
> Right now I will dive into GC tuning once more and probably lower the number 
> of reducers on the insert jobs.
>
>
> Thanks,
> Friso
>
>
