We found that 2 cores is not enough to run HBase. 1 core can easily get tied up with a compaction while the other is doing garbage collection. That doesn't leave any headroom for gets/scans, especially on compressed data and/or when multiple compactions are happening at the same time. Try to do all of that at once and some of the other background tasks start choking, like memstore flushes.
We run the c1.xlarge instances (8 cores, 8 GB mem) and everything works well, though not much room for block cache.

Matt

On Fri, Oct 7, 2011 at 12:43 PM, Anthony Urso <[email protected]> wrote:
> We have a use case that will require a ten to twenty EC2 node HBase
> cluster to take several hundred million rows of input from a larger
> number of EMR instances in daily bursts, and then serve those rows via
> low latency random reads, say on the order of 300 or so rows per
> second. Before we start coding, I thought it best to ask the experts
> for their advice.
>
> 1) Is this something that HBase will be able to handle gracefully?
> 2) Does anyone have any pointers on how to tune HBase for performance
> and stability under this load?
> 3) Would HBase perform better under this sort of load on twelve large
> EC2 instances, six xlarge or three xxlarge?
>
> Thanks,
> Anthony
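For what it's worth, the trade-off between compaction pressure, memstore, and block cache described above is governed by a handful of standard hbase-site.xml properties. A sketch of the relevant knobs follows; the values shown are illustrative defaults-era settings, not recommendations for your workload:

```xml
<!-- hbase-site.xml fragment: knobs that balance read/write memory use.
     Values here are illustrative only; tune against your own heap size. -->
<configuration>
  <!-- Fraction of heap given to the read-side block cache. On small
       instances this competes directly with the memstores. -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.2</value>
  </property>
  <!-- Upper bound on total heap used by all memstores before writes
       are blocked and flushes are forced. -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
  </property>
  <!-- Per-region memstore size that triggers a flush; larger values
       mean fewer, bigger flushes (and fewer compactions). -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value>
  </property>
  <!-- RPC handler threads; raise for many concurrent small gets. -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>10</value>
  </property>
</configuration>
```

Raising the block cache fraction helps the random-read side of a workload like Anthony's, but on an 8 GB box the memstore limits have to come down to compensate.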
