On Mon, Jun 6, 2011 at 7:57 AM, Mat Hofschen <[email protected]> wrote:
> Hello,
>
> the hbase book (http://hbase.apache.org/book.html) suggests to increase
> hbase.hregion.max.filesize to a large value. (> 1G)
Yes. If you want to cut down on the number of regions in a table, this is a
good recommendation to follow.

> Then there are many suggestions on the mailing list to keep dfs.block.size
> set at 64M. What is the relationship between the two values? How does hbase
> prevent lots of network traffic if there are up to 18 dfs blocks per Region?

Going by the above, it doesn't look like you need me to tell you what the
relationship is; you know it already. HBase does nothing to cut down on the
network traffic (wouldn't the traffic effectively be the same whatever the
block size?). With bigger block sizes there'd be fewer socket setups, which
would probably be good, and there'd be less accounting for the NN to do. Most
clusters run the default 64MB block size (though where I work, we are
128MBers). (Andrew, did you report a performance penalty running with bigger
blocks?)

> Right now we operate a cluster with 50 nodes, dfs.block.size=64M and
> hbase.hregion.max.filesize=134M. We have one large table that has over
> 50000 regions.

So you are running about 1k regions per server? It'd probably be better to up
your region size, yes (fewer regions for the cluster to keep account of is
usually a 'good' thing).

> That seems way too many. So according to the hbase book we ought to be able
> to increase hbase.hregion.max.filesize=1G and benefit from much fewer
> splits. Our cluster is very heavy on writing, which means we are currently
> splitting all the time.

Yeah. Fewer splits is usually also a 'good' thing. If write heavy with a good
distribution of writes, then yes, you should go to bigger regions.

St.Ack
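P.S. A back-of-the-envelope sketch of the region-count math, using only the
figures quoted in this thread (50 nodes, 50000 regions, 134M max filesize)
and assuming total data volume stays roughly fixed when the region size is
raised to 1G:

```python
# Rough projection: how many regions the same table would need after
# raising hbase.hregion.max.filesize from 134MB to 1GB. Assumes every
# region is near its size cap, which is an upper-bound simplification.

MB = 1024 * 1024

servers = 50
current_regions = 50_000
current_max_filesize = 134 * MB
proposed_max_filesize = 1024 * MB  # 1G, as the hbase book suggests

# Upper bound on total table size if every region were full.
total_data = current_regions * current_max_filesize

projected_regions = total_data // proposed_max_filesize
print(projected_regions)            # ~6542 regions instead of 50000
print(projected_regions // servers) # ~130 regions per server instead of ~1000
```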

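P.P.S. For anyone following along, the two settings under discussion live in
hbase-site.xml and hdfs-site.xml respectively. The values below are just the
ones mentioned in this thread (1G regions, 64M HDFS blocks), not universal
recommendations:

```xml
<!-- hbase-site.xml: raise the region split threshold to 1G -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>

<!-- hdfs-site.xml: the default/commonly recommended 64M block size -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
</property>
```

Changing hbase.hregion.max.filesize only affects future splits; existing
regions keep their current boundaries until compacted/merged.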