Hi,

I am running HBASE 0.90.3 (just upgraded for testing). It is configured with a 1.5G heap, which seemed to be a good setting for HBASE 0.20.6. When running a stress test that writes into three HBASE data nodes from 24 processes, with the goal of inserting one billion simple rows, I get OOMs on two of the three region servers after about 75% of the work is done.

Here is the first OOM:

2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Applied 924, skipped 1105, firstSequenceidInLog=162957072, maxSequenceidInLog=163841413
2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for tir_items,customer/7/8CC6E17710156EE5518325B96E5F5EB9FF3278D2F2E8848E859E90CC7445AE8E,1309973529621.39f9da510435c2bc053fab116af0d4d6., current region memstore size 270.7k; wal is null, using passed sequenceid=163841413
2011-07-09 23:34:40,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-07-09 23:34:43,266 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2395129059875563550, isReference=false, isBulkLoadResult=false, seqid=150362469, majorCompaction=false
2011-07-09 23:34:51,788 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2547547152617947847, isReference=false, isBulkLoadResult=false, seqid=163671317, majorCompaction=false
2011-07-09 23:34:58,652 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2867700810527601701, isReference=false, isBulkLoadResult=false, seqid=150617582, majorCompaction=false
2011-07-09 23:35:35,067 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readAllIndex(HFile.java:805)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:832)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1002)
        at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382)
        at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438)
        at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:266)
        at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:208)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2008)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2551)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2537)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:272)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
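The trace shows the OOM occurring in HFile$Reader.readAllIndex while a region was being opened, i.e. while a store file's block index was being read onto the heap. As far as I understand HFile v1 in 0.90, each open store file keeps its whole index in memory, competing with the memstores and the block cache. A rough budget is sketched here, assuming the 0.90 defaults hbase.regionserver.global.memstore.upperLimit = 0.4 and hfile.block.cache.size = 0.2 (check hbase-default.xml for your build). The store file count and per-file index size are hypothetical inputs, not measurements from this cluster.

```java
// Back-of-envelope heap budget for one region server.
// ASSUMED defaults (HBase 0.90; verify in hbase-default.xml):
//   hbase.regionserver.global.memstore.upperLimit = 0.4
//   hfile.block.cache.size                        = 0.2
// The store file count and average index size used in main() are HYPOTHETICAL.
public class HeapBudget {

    static final long MB = 1024L * 1024L;

    /** Heap left after the global memstore limit and the block cache take their share. */
    static long remainingBytes(long heapBytes, double memstoreFraction, double blockCacheFraction) {
        long memstore = (long) (heapBytes * memstoreFraction);
        long blockCache = (long) (heapBytes * blockCacheFraction);
        return heapBytes - memstore - blockCache;
    }

    /** Heap needed if every open store file keeps its whole block index in memory. */
    static long indexBytes(int storeFiles, long avgIndexBytesPerFile) {
        return storeFiles * avgIndexBytesPerFile;
    }

    public static void main(String[] args) {
        long heap = 1536 * MB;                          // the 1.5G heap from the post
        long remaining = remainingBytes(heap, 0.4, 0.2);
        long indexes = indexBytes(3000, 150L * 1024L);  // hypothetical: 3000 files, ~150KB index each
        System.out.printf("left after memstores + block cache: %d MB%n", remaining / MB);
        System.out.printf("store file indexes:                 %d MB%n", indexes / MB);
        System.out.printf("headroom for RPC, compactions, GC:  %d MB%n", (remaining - indexes) / MB);
    }
}
```

With these made-up inputs only a couple of hundred MB remain. Because the index term grows with the amount of data stored rather than with the write rate, an estimate like this would be consistent with OOMs that appear only after most of the rows have been written.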


After that, more OOMs follow until something fatal happens.

Now:

1. Is there any way to configure a heap size that stays stable? Where is the leak? This is really frustrating (it took a while to figure out that 1.5G was "somehow good" for 0.20.6).

2. Wouldn't it make sense to let the region server die at the first OOM and have it restarted quickly, rather than letting it go on in some likely broken state after the OOM until it eventually dies anyway?
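A minimal sketch of the fail-fast idea in question 2, assuming a task-wrapping approach; FailFast, failFast and isOom are hypothetical names, not HBase APIs. In the trace above, EventHandler catches and logs the OOM itself ("Caught throwable while processing event"), so a default uncaught-exception handler would never see it; the check has to sit where the throwable is caught. At the JVM level, HotSpot's -XX:OnOutOfMemoryError="kill -9 %p" option gets a similar effect without code changes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// HYPOTHETICAL fail-fast wrapper, not an HBase API: if a task fails with an
// OutOfMemoryError (directly or as a cause), run an action that stops the process
// instead of logging the error and carrying on.
public final class FailFast {

    private FailFast() {}

    /** True if t, or anything in its cause chain (bounded to avoid cycles), is an OOM. */
    static boolean isOom(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Wraps a task so an OOM triggers onOom before the error is rethrown. */
    static Runnable failFast(Runnable task, Runnable onOom) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException | Error e) {
                if (isOom(e)) {
                    onOom.run();
                }
                throw e;
            }
        };
    }

    public static void main(String[] args) {
        // halt() skips shutdown hooks, which might themselves need heap after an OOM;
        // an external supervisor is assumed to restart the region server.
        Runnable halt = () -> Runtime.getRuntime().halt(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(failFast(() -> System.out.println("task ran normally"), halt));
        pool.shutdown();
    }
}
```

Runtime.halt is used rather than System.exit because exit runs shutdown hooks, and after an OOM those hooks may fail for lack of heap; restarting is left to whatever supervises the process.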

But on the good side, 0.90.3 is notably faster at writing than 0.20.6.

Thanks,

*Henning Blohm*

*ZFabrik Software KG*

T:      +49/62278399955
F:      +49/62278399956
M:      +49/1781891820

Bunsenstrasse 1
69190 Walldorf

Linkedin <http://de.linkedin.com/pub/henning-blohm/0/7b5/628>
www.zfabrik.de <http://www.zfabrik.de>
www.z2-environment.eu <http://www.z2-environment.eu>
