On Mon, Apr 2, 2012 at 1:41 PM, Jean-Daniel Cryans <[email protected]> wrote:

> > Decrease hbase.hregion.memstore.flush.size?
>
> Even if you decrease it enough so that you don't hit the too many hlogs
> you'll still end up flushing tiny files which will trigger compactions a
> lot too.
>
> > Are there other configuration knobs to tweak to control how long
> > MemStores stick around before being flushed?
>
> The issue here is that you should only have the same amount of MemStore
> space as you have HLogs. 270 regions (let's say they only have 1 family)
> means 270 * 64MB = ~17GB of potential MemStore space, but the default
> maximum amount of data you can have in the HLogs is 32 * ~64MB = ~2GB
> (here 64MB is the default block size; if you changed it for your HBase
> deployment then swap in the value you're using).

So it sounds like with our write pattern (highly distributed, all regions
being written to simultaneously), we should be trying to keep the number
of regions down to 32 (or whatever hbase.regionserver.maxlogs is set to).
I suppose we could increase the block size, but that would lead to the
same slow-replay problem as increasing maxlogs.

> So there's two problems here:
>
> - You are over-committing the MemStores, I'm pretty sure you don't even
>   dedicate 17GB to the region servers and the MemStores by default won't
>   occupy more than 40% of it.
> - Your HLogs aren't tuned and can't hold all the MemStore data without
>   force flushing regions.
>
> Pretty much the only solution here is to have less MemStores so less
> regions. Also see:
> http://hbase.apache.org/book/important_configurations.html#bigger.regions
>
> J-D

Our RegionServers are running with a 16 GB heap on 24 GB machines. It
sounds like we can't meaningfully use that heap with our workload, since
we want to keep MemStores down to ~2 GB to match HLog capacity. (Our read
traffic is also primarily against recent data, so I guess we won't get
much mileage out of the block cache for StoreFiles, either.)

If more regions will just get us into this situation again, should we also
disable automatic region splitting? We have a 16-node cluster, so a
256-region pre-split would give us 16 regions/server. As those regions
grow, what is our growth path if more regions per server means hitting
maxlogs again? Do we have options other than adding nodes?

In a nutshell, it sounds like our choices are:

1) Increase HLog capacity (either via the HLog block size or by raising
hbase.regionserver.maxlogs), and pay the price of longer downtime when a
region server needs to be restarted.

2) Restrict ourselves to <30 regions per node, and add nodes/split when
compactions start taking too long.
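To make sure I'm following the math above, here's the back-of-envelope
calculation as I understand it. The constants are just the defaults J-D
quoted (not anything special about our cluster), so swap in whatever a
deployment actually uses:

// Rough sketch of the MemStore vs. HLog capacity math from J-D's reply.
public class MemstoreVsHlogMath {
    public static void main(String[] args) {
        long regions = 270;        // regions hosted on one region server
        long flushSizeMB = 64;     // hbase.hregion.memstore.flush.size (64MB default)
        long maxLogs = 32;         // hbase.regionserver.maxlogs (32 default)
        long blockSizeMB = 64;     // HLog/HDFS block size (64MB default)

        long potentialMemstoreMB = regions * flushSizeMB; // data that could sit unflushed
        long hlogCapacityMB = maxLogs * blockSizeMB;      // data the HLogs can hold

        System.out.printf("Potential MemStore data: %d MB (~%d GB)%n",
                potentialMemstoreMB, potentialMemstoreMB / 1024);
        System.out.printf("HLog capacity:           %d MB (~%d GB)%n",
                hlogCapacityMB, hlogCapacityMB / 1024);
        // Regions whose MemStores fit in the HLogs without force flushes:
        System.out.printf("Regions that fit:        ~%d%n",
                hlogCapacityMB / flushSizeMB);
    }
}

That prints ~17 GB of potential MemStore data against ~2 GB of HLog
capacity, i.e. only ~32 regions' worth of MemStores fit before regions get
force-flushed, which matches what J-D said.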

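In case it helps to be concrete about option 2: if we do go the pre-split
route, the rough plan would be something like the sketch below. It's just
against the 0.92-era client API; the table and family names are made up,
and it assumes our row keys are evenly spread (e.g. hashed/hex prefixes)
so an even 256-way split makes sense. Setting MAX_FILESIZE very high is
how I understand automatic splitting is effectively disabled:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Made-up table/family names, just for illustration.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("d"));

        // Effectively disable automatic splitting by making the split
        // threshold far larger than any region should ever reach.
        desc.setMaxFileSize(100L * 1024 * 1024 * 1024); // 100 GB

        // Pre-split into 256 regions across the key space; assumes row
        // keys are evenly distributed between the start and end keys.
        admin.createTable(desc, Bytes.toBytes("00"), Bytes.toBytes("ff"), 256);

        admin.close();
    }
}

If that's roughly right, the open question is still the growth path once
those 256 regions fill up.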