This is a somewhat fuzzy art.

Some points to consider:
1. All data is replicated three ways. In other words, if you run three 
RegionServers/DataNodes, each machine gets 100% of the writes. If you run six, 
each gets 50% of the writes. From that perspective, HBase clusters with fewer 
than 9 RegionServers are not really useful.
2. As for the machines themselves, just go with any reasonable machine and 
pick the cheapest you can find: at least 8 cores, at least 32GB of RAM, at 
least 6 disks, no RAID needed. (We have machines with 12 cores in 2 sockets, 
96GB of RAM, and six 4TB drives, no HW RAID.) HBase is not yet well tuned for 
SSDs.
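The replication arithmetic in point 1 can be sketched in a few lines of Python. 
This is a back-of-envelope model, not a measurement: it assumes HDFS 3-way 
replication and that replicas end up spread evenly across the 
RegionServer/DataNode machines. The function name write_share is hypothetical.

```python
# Back-of-envelope sketch: fraction of the cluster's total write volume that
# each RegionServer/DataNode machine absorbs. Assumes HDFS replication
# (dfs.replication, default 3) and replicas spread evenly across all nodes.

REPLICATION = 3  # HDFS default replication factor


def write_share(nodes: int, replication: int = REPLICATION) -> float:
    """Fraction of all writes landing on each node (hypothetical helper)."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Every byte is written `replication` times in total. With no more nodes
    # than replicas, each node holds (at most) one copy of everything, so the
    # share is capped at 100%.
    return min(1.0, replication / nodes)


if __name__ == "__main__":
    for n in (3, 6, 9, 12, 20):
        print(f"{n:>3} nodes -> each node takes {write_share(n):.0%} of the writes")
```

With 9 nodes each machine still takes a third of all writes, which is why 
smaller clusters gain little from adding capacity.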


You also need to carefully consider your network topology. With HBase you'll 
see quite a bit of east-west traffic (i.e. between racks). 10GbE is good if you 
have it. We have 1GbE everywhere so far, and we found it to be the single 
biggest bottleneck for write performance.
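A rough way to see why the NIC becomes the write bottleneck is to bound 
per-node ingest by link bandwidth. This is a hedged sketch under assumptions 
that are not from the measurements above: each RegionServer is co-located with 
a DataNode, the first replica is written locally so only replication - 1 copies 
cross the network, traffic is spread evenly, and each ingested byte is written 
to HDFS about twice (WAL plus memstore flush; compactions push this higher). 
The efficiency factor and the function name are hypothetical.

```python
# Rough upper bound on per-node ingest when the network link is the limit.
# Assumptions (illustrative, not measured):
#   - RegionServer co-located with a DataNode; first replica is local, so
#     (replication - 1) copies of every HDFS write cross the network
#   - `amplification`: how many times each ingested byte is written to HDFS
#     (WAL + flush ~= 2; compactions would raise it further)
#   - `efficiency`: hypothetical fraction of raw link speed actually usable
#   - traffic spread evenly, so each node's inbound link carries
#     (replication - 1) * amplification bytes per byte it ingests


def max_ingest_mb_per_s(link_gbit: float, replication: int = 3,
                        amplification: float = 2.0,
                        efficiency: float = 0.8) -> float:
    """Per-node ingest ceiling in MB/s (hypothetical helper)."""
    if replication < 2:
        raise ValueError("with replication < 2 no replica crosses the network")
    usable_mb_per_s = link_gbit * 1000 / 8 * efficiency  # per direction
    network_bytes_per_ingested_byte = (replication - 1) * amplification
    return usable_mb_per_s / network_bytes_per_ingested_byte


if __name__ == "__main__":
    for gbit in (1.0, 10.0):
        print(f"{gbit:>4} Gbit/s -> ~{max_ingest_mb_per_s(gbit):.0f} MB/s per node")
```

Under these assumptions, a 1GbE link caps each node at a few tens of MB/s of 
ingest before compactions are even counted, while 10GbE raises the ceiling 
tenfold.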


Also see this blog post about HBase memory sizing (shameless plug): 
http://hadoop-hbase.blogspot.de/2013/01/hbase-region-server-memory-sizing.html


I'm planning a blog post about this topic with more details.


-- Lars



________________________________
 From: Amandeep Khurana <ama...@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Tuesday, July 15, 2014 10:48 PM
Subject: Cluster sizing guidelines
 

Hi

How do users usually go about sizing HBase clusters? What are the factors
you take into account? What are typical hardware profiles you run with? Any
data points you can share would help.

Thanks
Amandeep
