Are there any configuration settings I should tune to improve read latency? I'm running HBase on 10 EC2 m1.large instances (7.5 GB RAM each).
Also, as the data size grows, is it normal to see higher read latency? I'm testing with the YCSB benchmark. With a data size of ~40-50 GB (~200+ regions), I get around 10-20 ms latency and can push a read-only workload to around 3,000+ operations per second. However, with a data size of 200 GB (~1,000+ regions), the lowest latency I can get is 30+ ms (at 100 operations per second), and I can't push the throughput past 400+ operations per second (at 110+ ms latency).

I tried increasing hbase.hregion.max.filesize to 2 GB to reduce the number of regions, but that seemed to make things worse. I also tried increasing the heap size to 4 GB, setting hbase.regionserver.handler.count = 100, and setting vm.swappiness = 0, but none of these improved performance. I'm also confident the YCSB client benchmark driver is not the bottleneck, because its CPU utilization is low.
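For reference, this is roughly what I changed (a sketch of the relevant hbase-site.xml entries plus the heap and swappiness settings; I'm assuming the standard property names here and that HBASE_HEAPSIZE in hbase-env.sh is given in MB):

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>2147483648</value>  <!-- 2 GB, to reduce the region count -->
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>         <!-- more RPC handler threads per region server -->
    </property>

    # hbase-env.sh
    export HBASE_HEAPSIZE=4096   # 4 GB region server heap

    # on each node
    sysctl -w vm.swappiness=0    # discourage the OS from swapping the JVM heap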
Thanks,
Harold