Everyone will tell you that handling fewer regions is always better. Depending on your setup, data size and number of records, I would say that 1 to 5 regions per table per server is acceptable. In some setups (one big table, for example) you can see up to 100-200 regions per server, which is about the maximum you should keep in mind (the Reference Guide talks about "a few hundred", as far as I remember).
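To see where your cluster would land after raising the region size, some back-of-the-envelope arithmetic helps. A minimal sketch using the figures from this thread (926,939,501,499 bytes in /hbase, 5 region servers, a proposed 2 GB max filesize); the variable names are mine:

```python
# Rough estimate of region counts after raising hbase.hregion.max.filesize.
TOTAL_BYTES = 926_939_501_499        # hadoop fs -dus /hbase, from the thread
REGION_SIZE = 2 * 1024 ** 3          # proposed 2 GB max filesize
SERVERS = 5                          # region servers in the cluster

total_regions = -(-TOTAL_BYTES // REGION_SIZE)   # ceiling division
per_server = -(-total_regions // SERVERS)

print(f"~{total_regions} regions total, ~{per_server} per server")
# → ~432 regions total, ~87 per server
```

So at 2 GB per region the cluster drops from ~1010 regions per server to well under a hundred, comfortably inside the "a few hundred" ceiling.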
On Fri, Jul 13, 2012 at 11:14 PM, Rob Roland <[email protected]> wrote:

> In almost every table, the rowkey is either a SHA hash, or a SHA hash and a
> timestamp, so we have a fairly even distribution of rowkeys now.
>
> Is there a best practice for number of regions of a table per server?
> Meaning, with 5 region servers, 10 regions per table, so 170 regions per
> region server, would that be good?
>
> Thanks for the feedback,
>
> Rob
>
> On Fri, Jul 13, 2012 at 1:58 PM, Adrien Mogenet <[email protected]> wrote:
>
> > It can be reasonable to turn off automatic region splitting if you know
> > your rowkey distribution well and you're able to ensure good parallelism
> > among your region servers "easily" (i.e. manually or through the HBase
> > API). Sometimes it's even the best solution to ensure the minimum number
> > of regions (many companies are doing this). There is an example of
> > pre-splitting regions in the Reference Guide.
> >
> > About your region size: raising it to 2 GB or even more will help to
> > reduce the number of regions and store files.
> >
> > On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > The HBase instance I'm managing has grown to the point that it has way
> > > too many regions per server - 5 region servers with 1010 regions each
> > > on HBase 0.90.4-cdh3u2. I want to bring this region count under
> > > control. The cluster is currently running with the default region size
> > > of 256 MB, and the data is spread across 17 tables. I've turned on
> > > compression for all the column families, which is great, as my region
> > > count is growing much more slowly now. I've looked through HDFS at the
> > > individual regions, and they seem rather small - 40-50 MB - which is
> > > not surprising due to major compactions after enabling compression. My
> > > total hbase folder size in HDFS (hadoop fs -dus /hbase) is
> > > 926,939,501,499 bytes.
> > >
> > > My question is - what's the best strategy for handling this?
> > >
> > > What I assume from reading the docs:
> > >
> > > 1. Increase hbase.hregion.max.filesize to something more reasonable,
> > > like 2 GB.
> > > 2. Bring the cluster offline and merge regions.
> > >
> > > Is there a good way to determine the actual region sizes, other than
> > > manually, so that I can do the merges to end up with the most efficient
> > > regions, size-wise?
> > >
> > > At what point is it a good idea to turn off automatic region splits and
> > > manually manage them?
> > >
> > > Thanks,
> > >
> > > Rob Roland
> > > Senior Software Engineer
> > > Simply Measured, Inc.
> >
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
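Since the rowkeys in this thread are SHA hashes (so roughly uniform over the keyspace), evenly spaced split points are a reasonable starting point for pre-splitting a table. A minimal sketch of generating such boundaries - the function name is mine, and it assumes hex-encoded rowkeys; the resulting strings would be passed as the split keys when pre-creating the table (e.g. the SPLITS option in the shell, or HBaseAdmin.createTable):

```python
def uniform_split_keys(num_regions, key_bytes=4):
    """Evenly spaced split points over a uniform (e.g. SHA-hashed) keyspace.

    Returns num_regions - 1 boundaries as hex strings, based only on the
    first key_bytes bytes of the key prefix.
    """
    space = 256 ** key_bytes              # size of the key-prefix space
    step = space // num_regions           # distance between boundaries
    return [format(i * step, "0{}x".format(key_bytes * 2))
            for i in range(1, num_regions)]

# e.g. pre-splitting into 4 regions needs 3 boundaries
print(uniform_split_keys(4))   # → ['40000000', '80000000', 'c0000000']
```

With uniform keys like SHA hashes, splits chosen this way keep the regions balanced by construction, which is exactly the situation where turning off automatic splitting is safest.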
