On Mon, Apr 2, 2012 at 2:18 PM, Miles Spielberg <[email protected]> wrote:
> So it sounds like with our write pattern (highly distributed, all regions
> being written to simultaneously), we should be trying to keep number of
> regions down to 32 (or whatever hbase.regionserver.maxlogs is set to). I
> suppose we could increase the block size, but this would lead to the same
> issue of slow replays as would increasing maxlogs.
Or 16 with MEMSTORE_FLUSHSIZE at 128MB.

> Our RegionServers are running with 16 GB heap on 24 GB machines. It sounds
> like we can't meaningfully use this heap with our workload since we want to
> keep MemStores down to ~2 GB to match HLog capacity. (Our read traffic is
> also primarily against recent data, so I guess we won't get much mileage
> out of the block cache for StoreFiles, either.)

One question you should try to answer is whether 2GB is good for you or
you're able to tolerate more. With 0.92.0 and distributed log splitting
it's not as big of an issue.

> If more regions is just going to get us in this situation again, should we
> also be disabling automatic region splitting? We have a 16 node cluster, so
> a 256 region pre-split would give us 16 regions/server. As these regions
> grow, what is our growth path if more regions/server will lead to us
> hitting maxlogs? Do we have options other than adding additional nodes?

It's usually recommended to stop splitting once you have a good
distribution. If you do split and add regions you'll break the balance,
yes; maybe you'll want to let the HLogs grow larger than the size of your
MemStores to buffer that up. In any case, HLogs that contain only edits
from already-flushed regions are cleaned up without any other impact, so
even if you let them grow to 16GB you might never hit the limit. Keep in
mind that a forced flush that happens every now and then is much better,
and probably indiscernible, compared to your current situation where
things go crazy all the time :)

When you add region servers you can do rolling splits; that's what
Facebook does for Messages, IIRC.

> In a nutshell, it sounds like our choices are:
>
> 1) increase HLog capacity (either via HBase BLOCKSIZE or increasing
> hbase.regionserver.maxlogs), and pay the price of increased downtime when a
> regionserver needs to be restarted.

It's not the HBase block size, it's the Hadoop block size that you set in
hbase-site.xml (there's a sketch of the relevant properties at the end of
this mail). Also verify exactly how long it takes to replay 2GB on your
system.

> 2) restrict our nodes to <30 regions per node, and add nodes/split when
> compactions start taking too long.

Restricting the number of regions is always good.
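For reference, here is a minimal sketch of the hbase-site.xml properties
discussed above, using the 0.92-era names. The values are illustrative
only, not recommendations; check the defaults on your version before
copying anything:

  <!-- Cap on HLogs per region server; past this, the regions holding the
       oldest un-flushed edits are force-flushed so logs can be archived. -->
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>
  </property>

  <!-- Block size for HLog files on HDFS (the Hadoop block size mentioned
       above, not the HFile BLOCKSIZE); logs roll when they approach this
       size, so 32 logs x 64MB is the ~2GB of HLog capacity discussed. -->
  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>67108864</value>
  </property>

  <!-- Per-region MemStore flush threshold; can also be set per-table as
       MEMSTORE_FLUSHSIZE. 16 regions x 128MB targets the same ~2GB. -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
  </property>

  <!-- Setting this very high (here 100GB) effectively disables automatic
       splitting once the table is pre-split; split manually instead. -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>107374182400</value>
  </property>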
J-D