For the scan test, there is only minimal rpc involved: I verified through
ScanMetrics that the scan makes only 2 rpc calls. It is essentially
testing how fast the region server is able to iterate over the cells. There
are no delete cells, the table is fully compacted (1 store file), and
all data fits into the block cache.

For the sequential gets (i.e. one get after the other, single-threaded),
I tried the BlockingRpcClient. It is about 13% faster than the netty rpc
client, but the same code on 1.6 is still about 90% faster. Concretely, my
test code does 100K gets of the same row in a loop. On HBase 2.2.4 with the
BlockingRpcClient this takes on average 9 seconds; with HBase 1.6 it takes
4.75 seconds.
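
To put those totals in per-request terms, here is a back-of-the-envelope
sketch. The class and method names are purely illustrative and are not part
of my test code; the only inputs are the figures quoted above (100K gets,
9 seconds, 4.75 seconds). It works out to roughly 90 microseconds per get on
2.2.4 versus roughly 47.5 microseconds on 1.6.

```java
public class GetLatency {
    // Average time per get in microseconds, given total run time and get count.
    static double microsPerGet(double totalSeconds, long gets) {
        return totalSeconds * 1_000_000.0 / gets;
    }

    // How much higher the throughput of the faster run is, in percent.
    static double percentFaster(double slowSeconds, double fastSeconds) {
        return (slowSeconds / fastSeconds - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long gets = 100_000;      // gets of the same row, issued one after the other
        double hbase224 = 9.0;    // seconds, HBase 2.2.4 with BlockingRpcClient
        double hbase16 = 4.75;    // seconds, HBase 1.6

        System.out.printf("HBase 2.2.4: %.1f us per get%n", microsPerGet(hbase224, gets));
        System.out.printf("HBase 1.6:   %.1f us per get%n", microsPerGet(hbase16, gets));
        System.out.printf("HBase 1.6 is %.0f%% faster%n", percentFaster(hbase224, hbase16));
    }
}
```

Since the loop is strictly sequential, the total time is dominated by the
per-request round trip, so the gap is about 40 microseconds of extra latency
per get.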

On Wed, May 20, 2020 at 9:27 AM Debraj Manna <subharaj.ma...@gmail.com>
wrote:

> I cross-posted this in the Slack channel as I was also observing something
> quite similar. This is the suggestion I received. Reposting here for
> completeness.
>
> zhangduo  12:15 PM
> Does get also have the same performance drop, or only scan?
> zhangduo  12:18 PM
> For the rpc layer, hbase2 defaults to netty while hbase1 is pure java
> socket. You can set the rpc client to BlockingRpcClient to see if the
> performance is back.
>
> On Mon, May 18, 2020 at 7:58 PM Bruno Dumon <bru...@ngdata.com> wrote:
> >
> > Hi,
> >
> > We are looking into migrating from HBase 1.2.x to HBase 2.1.x (on
> > Cloudera CDH).
> >
> > It seems like HBase 2 is slower than HBase 1 for both reading and
> > writing.
> >
> > I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the standard OSS
> > versions), running in local mode (no HDFS) on my computer:
> >
> >  * ingested 15M single-KV rows
> >  * full table scan over them
> >  * to remove rpc latency as much as possible, the scan had a filter 'new
> > RandomRowFilter(0.0001f)', caching set to 10K (more than the number of
> > rows returned) and hbase.cells.scanned.per.heartbeat.check set to 100M.
> > This scan returns about 1500 rows/KVs.
> >  * HBase configured with hbase.regionserver.regionSplitLimit=1 to remove
> > influence from region splitting
> >
> > In this test, scanning seems over 50% slower on HBase 2 compared to
> > HBase 1.
> >
> > I tried flushing & major-compacting before doing the scan, in which case
> > the scan finishes faster, but the difference between the two HBase
> > versions stays about the same.
> >
> > The test code is written in Java, using the client libraries from the
> > corresponding HBase versions.
> >
> > Besides the above scan test, I also tested write performance through
> > BufferedMutator, scans without the filter (thus passing much more data
> > over the rpc), and sequential random Get requests. They all seem quite a
> > bit slower on HBase 2. Interestingly, using the HBase 1.6 client to talk
> > to the HBase 2.2.4 server is faster than using the HBase 2.2.4 client. So
> > it seems the rpc latency of the new client is worse.
> >
> > So my question is, is such a large performance drop to be expected when
> > migrating to HBase 2? Are there any special settings we need to be aware
> > of?
> >
> > Thanks!
>


-- 
Bruno Dumon
NGDATA
http://www.ngdata.com/
