Hi all, I have a distributed HBase setup on which I'm running the YCSB benchmark <https://github.com/brianfrankcooper/YCSB/wiki/running-a-workload>. There are 5 region servers, each a dual-core machine with around 4GB of memory, connected by a single 1Gbps Ethernet switch.
The number of RPC handlers per regionserver is set to 500 (!), and HDFS's maximum receivers per datanode is 4096. The benchmark dataset is large enough not to fit in memory. Update/insert/write throughput easily reaches 8000 ops/sec. However, I see read latencies on the order of seconds, and read throughput of only a few hundred ops/sec. "top" tells me that the CPUs on the regionservers spend 70-80% of their time waiting for IO, while the disks and network have plenty of unused bandwidth.

How could I diagnose where the read bottleneck is? Any help would be greatly appreciated :) Thanks in advance!

--
Bharath Ravi
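P.S. One diagnostic I was considering running on each regionserver while the workload is live, a minimal sketch assuming Linux's /proc/diskstats is available (the 2-second interval and device-name handling are just my guesses at something reasonable):

```shell
# Sample per-disk "milliseconds spent doing I/O" twice and print how busy
# each disk was over the interval. A disk busy for nearly the whole
# interval is saturated on seeks, which would explain high iowait with
# low MB/s on random reads over a larger-than-RAM dataset.
interval=2
# field 3 = device name, field 13 = cumulative ms spent doing I/O
snap() { awk '{ print $3, $13 }' /proc/diskstats | sort; }
a=$(mktemp); b=$(mktemp)
snap > "$a"
sleep "$interval"
snap > "$b"
join "$a" "$b" |
  awk -v ms=$((interval * 1000)) \
      '$3 > $2 { printf "%-12s %5d ms busy of %d ms\n", $1, $3 - $2, ms }'
rm -f "$a" "$b"
```

If the data disks come out near 100% busy here, the next thing I'd look at is read amplification on the HBase side (block cache hit ratio, number of store files per region) rather than raw disk bandwidth.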
