Hi,

We are looking into migrating from HBase 1.2.x to HBase 2.1.x (on Cloudera
CDH).

It seems like HBase 2 is slower than HBase 1 for both reading and writing.

I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the standard OSS
versions), running in local mode (no HDFS) on my computer:

 * ingested 15M single-KV rows
 * full table scan over them
 * to remove RPC latency as much as possible, the scan used the filter 'new
RandomRowFilter(0.0001f)', with caching set to 10K (more than the number of
rows returned) and hbase.cells.scanned.per.heartbeat.check set to 100M. This
scan returns about 1500 rows/KVs (15M x 0.0001).
 * HBase configured with hbase.regionserver.regionSplitLimit=1 to remove
influence from region splitting
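
As a rough sanity check on the "about 1500 rows" figure, here is a minimal sketch in plain Java (no HBase dependency). It assumes the filter keeps each row independently with a probability equal to its argument. The class name, method name and seed are made up for illustration and are not part of the actual test code.

```java
import java.util.Random;

public class ExpectedScanRows {

    // Counts how many of totalRows pass an independent per-row coin flip
    // that succeeds with probability `chance` (assumed filter behaviour).
    public static long countSelected(long totalRows, float chance, long seed) {
        Random random = new Random(seed);
        long selected = 0;
        for (long i = 0; i < totalRows; i++) {
            if (random.nextFloat() < chance) {
                selected++;
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long totalRows = 15_000_000L; // rows ingested in the test
        float chance = 0.0001f;       // argument passed to RandomRowFilter
        int caching = 10_000;         // scanner caching used in the test

        double expected = totalRows * (double) chance;
        long simulated = countSelected(totalRows, chance, 42L); // arbitrary seed

        System.out.printf("expected ~%.0f rows, simulated %d rows%n", expected, simulated);
        // If this holds, the whole result fits within a single caching batch.
        System.out.println("fits in one caching batch: " + (simulated <= caching));
    }
}
```

With a result set this far below the caching value, client/server round trips should contribute very little to the measured scan time, which is the point of this setup.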

In this test, scanning seems over 50% slower on HBase 2 compared to HBase 1.

I tried flushing & major-compacting before doing the scan, in which case
the scan finishes faster, but the difference between the two HBase versions
stays about the same.

The test code is written in Java, using the client libraries from the
corresponding HBase versions.

Besides the above scan test, I also tested write performance through
BufferedMutator, scans without the filter (thus passing much more data over
RPC), and random Get requests issued sequentially. They all seem quite a bit
slower on HBase 2. Interestingly, using the HBase 1.6 client to talk to the
HBase 2.2.4 server is faster than using the HBase 2.2.4 client. So it seems
the RPC latency of the new client is worse.

So my question is: is such a large performance drop to be expected when
migrating to HBase 2? Are there any special settings we need to be aware of?

Thanks!
