In the last few days I tuned a test cluster of only 3 servers (16 cores / 24 GB memory / SATA disks). The random write rate improved from a few MB/s to 90 MB/s.
The key factors for me were:

1. Snappy compression.
2. Random rowkeys with a hash prefix (the left half of the MD5, 8 bytes).
3. Pre-splitting with org.apache.hadoop.hbase.util.RegionSplitter.HexStringSplit.
4. No auto-splitting: hbase.hregion.max.filesize => 100GB.
5. Tuning hbase.regionserver.handler.count and hbase.thrift.minWorkerThreads according to your clients' request load.
6. Thrift is also a CPU-consuming guy; consider running several instances of it if your clients are non-Java.
7. Disabling writeToWAL, if you can tolerate the risk of losing unflushed writes.
8. Other tips: http://gbif.blogspot.dk/2012/07/optimizing-writes-in-hbase.html

FYI.

On Fri, Jul 13, 2012 at 6:28 PM, xkwang bruce <[email protected]> wrote:
> hi all,
>
> I plan to use HBase to support an online application that needs
> high-concurrency random reads. There are hundreds of GB of new data every
> day that will be added to HBase.
> Any links, articles, suggestions, or ideas that will improve HBase random
> read/write performance will be appreciated.
>
> bruce
> best regards!!!

--
Davey Yan
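P.S. The hash-prefixed rowkey in tip 2 can be sketched roughly like this, using only the JDK (the class and the sample key are hypothetical, not from my actual schema):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedRowKey {
    // Prefix the natural key with the left half of its MD5 digest (8 bytes),
    // so that otherwise-sequential keys scatter evenly across the pre-split
    // regions instead of hot-spotting one region server.
    static byte[] saltedKey(byte[] naturalKey) throws NoSuchAlgorithmException {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(naturalKey); // 16 bytes
        byte[] row = new byte[8 + naturalKey.length];
        System.arraycopy(md5, 0, row, 0, 8);            // left 8 bytes of the MD5
        System.arraycopy(naturalKey, 0, row, 8, naturalKey.length);
        return row;
    }

    public static void main(String[] args) throws Exception {
        byte[] row = saltedKey("user123|2012-07-13".getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 8; i++) hex.append(String.format("%02x", row[i]));
        System.out.println("prefix=" + hex + " totalLen=" + row.length);
    }
}
```

The prefix is deterministic, so a client that knows the natural key can recompute the full rowkey for point gets; you only lose cheap range scans over the natural key order.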
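P.P.S. Tips 1, 3, and 4 are applied at table-creation and cluster-config time; a rough sketch (the table name `mytable`, family `f`, and region count 60 are made-up examples, not my actual settings):

```shell
# Tip 3: pre-create regions with hex-string split points via RegionSplitter
# (-c = number of regions, -f = column family to create).
hbase org.apache.hadoop.hbase.util.RegionSplitter mytable HexStringSplit -c 60 -f f

# Tip 1: Snappy compression can instead be set when creating from the HBase shell:
#   create 'mytable', {NAME => 'f', COMPRESSION => 'SNAPPY'}

# Tip 4: raise the split threshold in hbase-site.xml so regions never auto-split:
#   <property>
#     <name>hbase.hregion.max.filesize</name>
#     <value>107374182400</value>   <!-- 100 GB -->
#   </property>
```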
