See http://hbase.apache.org/book.html#performance and the notes over in the
other thread, "How to improve HBase throughput with YCSB?"

St.Ack
On Sun, May 29, 2011 at 2:28 PM, Sean Bigdatafun <[email protected]> wrote:
> For pure random read, I do not think there is a good way to improve
> latency. Essentially, every single read has to go through a disk seek.
> The latency definitely has something to do with the server side (HBase
> server/HDFS client) rather than the client (YCSB).
>
> On Sun, May 29, 2011 at 1:23 PM, Harold Lim <[email protected]> wrote:
>
>> Are there any configurations that I need to set to improve read latency?
>> I'm running HBase on 10 EC2 m1.large instances (7.5GB RAM).
>>
>> Also, as the size of the data gets bigger, is it normal to get higher
>> latency for reads?
>>
>> I'm testing out the YCSB benchmark workload.
>>
>> With a data size of ~40-50GB (~200+ regions), I get around 10-20ms
>> latency and can push the throughput of a read-only workload to around
>> 3000+ operations per second.
>>
>> However, with a data size of 200GB (~1k+ regions), the lowest latency I
>> can get is 30+ms (at 100 operations per second), and I can't get the
>> throughput to go beyond 400+ operations per second (110+ms latency).
>>
>> I tried increasing hbase.hregion.max.filesize to 2GB to reduce the
>> number of regions, and it seems to have made things worse.
>>
>> I also tried increasing the heap size to 4GB, setting
>> hbase.regionserver.handler.count = 100, and vm.swappiness = 0. However,
>> it still didn't improve the performance.
>>
>> I'm also sure that the YCSB client benchmark driver is not the
>> bottleneck, because its CPU utilization is low.
>>
>> Thanks,
>> Harold
>
> --
> --Sean
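
[Editor's note: for readers following along, the region-related settings
Harold mentions are region-server-side properties in hbase-site.xml. A
minimal sketch using the values he reports trying (shown only to make the
settings concrete, not as a recommendation, and exact defaults vary by
HBase version) would look like:

  <!-- hbase-site.xml on the region servers -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>2147483648</value> <!-- bytes; the 2 GB value Harold tried -->
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value> <!-- RPC handler threads per region server -->
  </property>

The heap size is normally set via HBASE_HEAPSIZE (in MB) in hbase-env.sh,
and vm.swappiness is an OS-level sysctl (e.g. "sysctl -w vm.swappiness=0"),
not an HBase property. Changes to these settings require restarting the
region servers to take effect.]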
