I just saw that your tests were run in local mode. Local mode is not meant for production, so I am not aware of any issues about improving HBase performance in local mode. Maybe HBase 2 simply starts more threads by default, which makes it slower on a single machine, but I am not sure.
Could you please test it on a distributed cluster? If the problem persists, you can open an issue, and I believe some committers will offer to help verify it. Thanks.

Bruno Dumon <bru...@ngdata.com> wrote on Wed, May 20, 2020 at 4:45 PM:

> For the scan test, there is only minimal rpc involved: I verified through
> ScanMetrics that there are only 2 rpc calls for the scan. It is essentially
> testing how fast the region server is able to iterate over the cells. There
> are no delete cells, the table is fully compacted (1 storage file), and
> all data fits into the block cache.
>
> For the sequential gets (i.e. one get after the other, non-multi-threaded),
> I tried the BlockingRpcClient. It is about 13% faster than the netty rpc
> client, but the same code on 1.6 is still 90% faster. Concretely, my test
> code does 100K gets of the same row in a loop. On HBase 2.2.4 with the
> BlockingRpcClient this takes on average 9 seconds; with HBase 1.6 it takes
> 4.75 seconds.
>
> On Wed, May 20, 2020 at 9:27 AM Debraj Manna <subharaj.ma...@gmail.com>
> wrote:
>
> > I cross-posted this in the Slack channel, as I was also observing something
> > quite similar. This is the suggestion I received. Reposting here for
> > completeness.
> >
> > zhangduo 12:15 PM
> > Does get also have the same performance drop, or only scan?
> > zhangduo 12:18 PM
> > For the rpc layer, hbase2 defaults to netty while hbase1 uses plain java
> > sockets. You can set the rpc client to BlockingRpcClient to see if the
> > performance comes back.
> >
> > On Mon, May 18, 2020 at 7:58 PM Bruno Dumon <bru...@ngdata.com> wrote:
> >
> > > Hi,
> > >
> > > We are looking into migrating from HBase 1.2.x to HBase 2.1.x (on
> > > Cloudera CDH).
> > >
> > > It seems like HBase 2 is slower than HBase 1 for both reading and writing.
> > >
> > > I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the standard OSS
> > > versions), running in local mode (no HDFS) on my computer:
> > >
> > > * ingested 15M single-KV rows
> > > * full table scan over them
> > > * to remove rpc latency as much as possible, the scan had a filter
> > >   'new RandomRowFilter(0.0001f)', caching set to 10K (more than the
> > >   number of rows returned) and hbase.cells.scanned.per.heartbeat.check
> > >   set to 100M. This scan returns about 1500 rows/KVs.
> > > * HBase configured with hbase.regionserver.regionSplitLimit=1 to remove
> > >   influence from region splitting
> > >
> > > In this test, scanning seems over 50% slower on HBase 2 compared to
> > > HBase 1.
> > >
> > > I tried flushing & major-compacting before doing the scan, in which case
> > > the scan finishes faster, but the difference between the two HBase
> > > versions stays about the same.
> > >
> > > The test code is written in Java, using the client libraries from the
> > > corresponding HBase versions.
> > >
> > > Besides the above scan test, I also tested write performance through
> > > BufferedMutator, scans without the filter (thus passing much more data
> > > over the rpc), and sequential random Get requests. They all seem quite a
> > > bit slower on HBase 2. Interestingly, using the HBase 1.6 client to talk
> > > to the HBase 2.2.4 server is faster than using the HBase 2.2.4 client. So
> > > it seems the rpc latency of the new client is worse.
> > >
> > > So my question is, is such a large performance drop to be expected when
> > > migrating to HBase 2? Are there any special settings we need to be aware
> > > of?
> > >
> > > Thanks!
> >
>
> --
> Bruno Dumon
> NGDATA
> http://www.ngdata.com/
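For scale, the numbers quoted in this thread reduce to simple arithmetic. The block below only restates the reported figures (15M rows sampled by RandomRowFilter at 0.0001, and 100K sequential Gets taking 9 s on HBase 2.2.4 with BlockingRpcClient versus 4.75 s on HBase 1.6); it is a back-of-the-envelope check, not a new measurement.

```latex
\begin{aligned}
\text{expected rows returned} &\approx 15\,000\,000 \times 0.0001 = 1\,500 \\
t_{\text{get}}^{\,2.2.4} &= \frac{9\ \text{s}}{100\,000} = 90\ \mu\text{s} \\
t_{\text{get}}^{\,1.6} &= \frac{4.75\ \text{s}}{100\,000} = 47.5\ \mu\text{s} \\
\frac{t_{\text{get}}^{\,2.2.4}}{t_{\text{get}}^{\,1.6}} &= \frac{9}{4.75} \approx 1.89
\end{aligned}
```

Because RandomRowFilter keeps each row at random, 1,500 is an expected count, which fits "about 1500 rows/KVs" and stays well below the 10K caching setting. On the Get side, each request costs roughly 42.5 µs more on 2.2.4, so 1.6 runs the loop about 1.9 times as fast, consistent with the "90% faster" figure.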