Hi all, We're currently trying to design a storage-like system with HBase and need some opinions on capacity planning. It will be read-heavy and probably network-bound. We are thinking about 15-20 nodes for the initial setup, but the problem we couldn't tackle is kind of generic.
Reading through the book and various threads in this mailing list, it seems 20GB is the limit on region sizes and no RS should serve more than a hundred regions. Numbers aside, those limits seem to affect the number of disks a node may contain. A rough calculation (100 regions x 20GB = 2TB of HBase data per node, times a replication factor of 3) shows HBase supports up to about 6TB of raw disk per node. Considering machines with 8-12 disk slots and 2-3TB disks are commodity, that number is well below what's physically achievable. What's the average node configuration in the community nowadays?

My actual question is somewhat related to http://search-hadoop.com/m/8dg9P13z24H1 : Can those limits change with the nature of the application? Like 'regions should be smaller for write-heavy apps' or 'an RS can manage more than two hundred regions if the application is read-heavy'?

Reading Nicolas's answer to that thread, it seems like StoreFile sizes are the definitive factor here. Is it possible to set a different limit on StoreFile sizes than the max region size (using compaction parameters and such), and make, say, 200GB regions with 10 StoreFiles of 20GB each? If it is possible, what kind of effect would that have?

Thanks in advance! -- erdem agaoglu
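
P.S. To make that last question concrete, here is roughly what I have in mind for hbase-site.xml. This is only a sketch of my assumption: hbase.hregion.max.filesize would raise the split threshold to 200GB, while hbase.hstore.compaction.max.size is meant to stop individual StoreFiles from being compacted past 20GB. I'm not sure the second property actually caps StoreFile growth this way, so please correct me if I've misread the docs.

  <!-- raise the region split threshold to 200GB (200 * 1024^3 bytes) -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>214748364800</value>
  </property>

  <!-- exclude StoreFiles larger than 20GB (20 * 1024^3 bytes) from further
       compactions, hopefully settling at ~10 StoreFiles of ~20GB per store -->
  <property>
    <name>hbase.hstore.compaction.max.size</name>
    <value>21474836480</value>
  </property>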
