We have a use case that will require a ten- to twenty-node HBase cluster on EC2 to take several hundred million rows of input from a large number of EMR instances in daily bursts, and then serve those rows via low-latency random reads, on the order of 300 rows per second. Before we start coding, I thought it best to ask the experts for their advice.
1) Is this something that HBase will be able to handle gracefully?
2) Does anyone have any pointers on how to tune HBase for performance and stability under this load?
3) Would HBase perform better under this sort of load on twelve large EC2 instances, six xlarge instances, or three xxlarge instances?

Thanks,
Anthony
