Hi J-D,

I redid the job as before to once more verify. The real problem appears to be 
something I had not noticed, because I never expected it. The machines started 
swapping during the job. I did not expect that, because there is a total of 
about 45GB heap allocated on a 64GB machine and nothing much else running, so I 
had not thought of that immediately (although I should have checked 
nonetheless). Network and IO graphs look normal. We run on 10 disks / datanode, 
so there are some spindles available. I will try to get the swapping out of the 
way and then try again and see if it fixes it.

Now the GC log shows some obvious evil doers:
2010-12-18T00:17:42.274+0000: 14275.374: [Full GC (System.gc()) 
6913M->4722M(15742M), 113.1350880 secs]
 [Times: user=6.99 sys=1.80, real=113.12 secs]

A (partial) log of one of the RS is here:
GC log: http://pastebin.com/GPRVj8u5
RS log: http://pastebin.com/Vj5K26ss

The GC is G1, so it may look different from what you expect from a GC log. I 
know it is considered experimental, but I like the concept of it and think it's 
nice to gain some experience with it.

Also, in the RS log I see some NPEs:
2010-12-18 04:31:24,043 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
Caught throwable while processing event M_RS_OPEN_REGION
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at 
org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:208)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:89)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

These I had not seen before. Any idea?


Friso



On 17 dec 2010, at 20:11, Jean-Daniel Cryans wrote:

If it wasn't GC it would be core dumps and whatnot, nothings free :)

I will reiterate what I said in my first answer, I'd like to see your
GC log since at this point I haven't seen direct evidence of GC
activity.

J-D

On Fri, Dec 17, 2010 at 1:27 AM, Friso van Vollenhoven
<[email protected]<mailto:[email protected]>> wrote:
Hi J-D,

Thanks for your comments and clarification. I guess GC does blow (especially 
when writing things like databases and filesystems).

Right now I will dive into GC tuning once more and probably lower the number of 
reducers on the insert jobs.


Thanks,
Friso

Reply via email to