I'm trying to increase the write throughput of our HBase cluster. We're currently doing around 7500 messages per second per node. I think we have room for improvement, especially since the heap is underutilized and the memstore size doesn't seem to fluctuate much between regular and peak ingestion loads.
We mainly have one large table that we write most of the data to. The other tables are mainly OpenTSDB and some relatively small summary tables. The large table is read in batch once a day but otherwise serves writes 99% of the time. It has 1 CF and gets flushed at around 128M fairly regularly, as below.

{log}
2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640 for region msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0002e5a329d2171149bcc1e83ed129312b\x00\x00\x00\x00,1413909604591.828e03c0475b699278256d4b5b9638a2. in 640ms, sequenceid=16861176169, compaction requested=true
{log}

Here's a pastebin of my hbase-site: http://pastebin.com/fEctQ3im

What I've tried:
- Turned off major compactions, and am handling these manually.
- Bumped up heap Xmx from 24G to 48G.
- hbase.hregion.memstore.flush.size = 512M
- lowerLimit / upperLimit on memstore are at the defaults (0.38, 0.4), since the global heap has enough space to accommodate the default percentages.
- Currently running HBase 0.98.1 on an 8-node cluster that's scaled up to 128GB RAM.

There hasn't been any appreciable increase in write performance. It is still hovering around 7500 writes per second per node. The flushes still seem to be happening at 128M (instead of the expected 512M).

I've attached a snapshot of the memstore size vs. flushQueueLen. The block caches are utilizing the extra heap space, but the memstore is not. The flush queue lengths have increased, which leads me to believe that it's flushing way too often without any increase in throughput.
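As a quick sanity check on the numbers above, here is a rough sketch of the arithmetic. It assumes the global memstore limits are plain fractions of the region server heap (48G with the 0.38 / 0.4 values listed above); the helper names are made up for illustration and are not part of HBase.

```python
# Back-of-the-envelope arithmetic for the memstore settings in this email.
# Assumption: global limits = fraction * region server heap.
# The function names below are hypothetical helpers, not HBase APIs.

GIB = 1024 ** 3
MIB = 1024 ** 2


def global_memstore_limits(heap_bytes, lower=0.38, upper=0.4):
    """Return (lower_bytes, upper_bytes) for the global memstore limits."""
    return heap_bytes * lower, heap_bytes * upper


def memstores_under_limit(limit_bytes, flush_size_bytes):
    """Count how many full memstores of flush_size_bytes fit under limit_bytes."""
    return int(limit_bytes // flush_size_bytes)


heap = 48 * GIB
lower, upper = global_memstore_limits(heap)

print(f"lower limit: {lower / GIB:.2f} GiB, upper limit: {upper / GIB:.2f} GiB")
print("full 128M memstores under lower limit:", memstores_under_limit(lower, 128 * MIB))
print("full 512M memstores under lower limit:", memstores_under_limit(lower, 512 * MIB))

# Size reported in the flush log line, converted from bytes.
print(f"observed flush size: {134459888 / MIB:.1f} MiB")
```

If that assumption holds, the global limits alone leave room for many memstores at either flush size, which is why the 128M flushes look unexpected to me.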
Please let me know where I should dig further. That's a long email, thanks for reading through :-)

Cheers,
-Gautam.