Hi all, We're currently trying to design a storage-like system with HBase and need some opinions on capacity planning. It will be read-heavy and probably network-bound. We are thinking about 15-20 nodes for the initial setup, but the problem we couldn't tackle is kind of generic.
Reading through the book and various threads in this mailing list, it seems 20GB is the limit on region sizes and no RS should serve more than a hundred regions. Numbers aside, those limits seem to affect the number of disks a node may contain. A rough calculation (100 regions x 20GB = 2TB of HBase data per node, times a replication factor of 3) shows HBase supports up to about 6TB of raw disk per node. Considering machines with 8-12 disk slots and 2-3TB disks are commodity, that number is well below what's physically achievable. What's the average node configuration in the community nowadays?

My actual question is somewhat related to http://search-hadoop.com/m/8dg9P13z24H1 : Can those limits change with the nature of the application? Like 'regions should be smaller for write-heavy apps' or 'an RS can manage more than two hundred regions if the application is read-heavy'?

Reading Nicolas's answer to that thread, it seems like StoreFile sizes are the definitive factor here. Is it possible to set a different limit on StoreFile sizes than the max region size (using compaction parameters and such), and make, say, 200GB regions with 10 StoreFiles of 20GB each? If it is possible, what kind of effect would that have?

Thanks in advance! -- erdem agaoglu
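
P.S. To make that last question concrete, here is roughly what I have in mind for hbase-site.xml. This is only a sketch of my assumption: hbase.hregion.max.filesize would raise the split threshold to 200GB, while hbase.hstore.compaction.max.size is meant to stop individual StoreFiles from being compacted past 20GB. I'm not sure the second property actually caps StoreFile growth this way, so please correct me if I've misread the docs.

  <!-- raise the region split threshold to 200GB (200 * 1024^3 bytes) -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>214748364800</value>
  </property>

  <!-- exclude StoreFiles larger than 20GB (20 * 1024^3 bytes) from further
       compactions, hopefully settling at ~10 StoreFiles of ~20GB per store -->
  <property>
    <name>hbase.hstore.compaction.max.size</name>
    <value>21474836480</value>
  </property>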
