Hi,

We are looking into migrating from HBase 1.2.x to HBase 2.1.x (on Cloudera CDH).
It seems like HBase 2 is slower than HBase 1 for both reading and writing.

I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the standard OSS versions), running in local mode (no HDFS) on my computer:

* ingested 15M single-KV rows
* ran a full table scan over them
* to remove RPC latency as much as possible, the scan had a filter 'new RandomRowFilter(0.0001f)', caching set to 10K (more than the number of rows returned) and hbase.cells.scanned.per.heartbeat.check set to 100M. This scan returns about 1500 rows/KVs.
* HBase was configured with hbase.regionserver.regionSplitLimit=1 to remove any influence from region splitting

In this test, scanning seems over 50% slower on HBase 2 compared to HBase 1. I tried flushing and major-compacting before doing the scan, in which case the scan finishes faster, but the difference between the two HBase versions stays about the same.

The test code is written in Java, using the client libraries from the corresponding HBase versions.

Besides the above scan test, I also tested write performance through BufferedMutator, scans without the filter (thus passing much more data over the RPC), and sequential random Get requests. They all seem quite a bit slower on HBase 2.

Interestingly, using the HBase 1.6 client to talk to the HBase 2.2.4 server is faster than using the HBase 2.2.4 client. So it seems the RPC latency of the new client is worse.

So my question is: is such a large performance drop to be expected when migrating to HBase 2? Are there any special settings we need to be aware of?

Thanks!
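
P.S. For anyone checking the numbers in the scan setup above, here is a minimal sketch of the arithmetic in plain Java. It does not use the HBase client at all. The class and method names are made up for illustration, and the heartbeat comparison assumes that hbase.cells.scanned.per.heartbeat.check is the number of cells scanned between heartbeat time checks.

```java
// Back-of-the-envelope numbers for the scan test; all names here are illustrative only.
class ScanEstimate {

    // Expected number of rows passing a filter that keeps each row with probability `chance`.
    static long expectedRows(long totalRows, float chance) {
        return Math.round((double) totalRows * chance);
    }

    // Number of result batches needed to ship `rows` rows when each batch holds up to `caching` rows.
    static long resultBatches(long rows, int caching) {
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long totalRows = 15_000_000L;               // single-KV rows ingested
        float chance = 0.0001f;                     // argument passed to RandomRowFilter
        int caching = 10_000;                       // scan caching
        long cellsPerHeartbeatCheck = 100_000_000L; // hbase.cells.scanned.per.heartbeat.check

        long rows = expectedRows(totalRows, chance);
        System.out.println("expected rows returned: " + rows);
        System.out.println("result batches: " + resultBatches(rows, caching));
        // Assumption: one cell per row, so cells scanned equals rows scanned.
        System.out.println("heartbeat threshold reached: " + (totalRows >= cellsPerHeartbeatCheck));
    }
}
```

With the numbers from the test this gives about 1500 expected rows, which fit in a single batch of 10K, and the 15M cells scanned stay below the 100M heartbeat threshold. It ignores scanner open/close calls and any size-based limits on a batch, so it is only a rough check that the measured time should be dominated by server-side scanning rather than RPC round trips.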
