It depends on what you are measuring and how. I test every so often with
YCSB, which admittedly is not representative of real-world workloads but
is widely used for apples-to-apples comparisons among datastores, and it
lets us apply the same test tool and methodology to different versions to
get meaningful results. I also test on real clusters. The single
all-in-one-process zk+master+regionserver "minicluster" cannot give you
meaningful performance data; only distributed clusters can. Some defaults
are also important to change, like the number of RPC handlers, to match
what you plan to use in production.

After reading this thread I tested 1.6.0 and 2.2.4 using my standard
methodology, described below. 2.2.4 is better, often significantly better,
in most measures in most cases.

Cluster: AWS Amazon Linux AMI, 1 x master, 5 x regionserver, 1 x client,
m5d.4xlarge
Hadoop: 2.10.0, ZK: 3.4.14
JVM: 8u252 Shenandoah (provided by AMI)
GC: -XX:+UseShenandoahGC -Xms31g -Xmx31g -XX:+AlwaysPreTouch -XX:+UseNUMA
-XX:-UseBiasedLocking
Non-default settings: hbase.regionserver.handler.count=256,
hbase.ipc.server.callqueue.type=codel, dfs.client.read.shortcircuit=true
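For reference, those non-default settings go in hbase-site.xml on the
regionservers, roughly as below. Note short-circuit reads also require a
configured dfs.domain.socket.path on the datanode side, which is not shown
here:

```xml
<!-- Sketch of the non-default settings listed above, as they would
     appear in hbase-site.xml on each regionserver. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>256</value>
</property>
<property>
  <name>hbase.ipc.server.callqueue.type</name>
  <value>codel</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
```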
Methodology:

  1. Create 100M row base table (ROW_INDEX_V1 encoding, ZSTANDARD
     compression)
  2. Snapshot base table
  3. Enable balancer
  4. Clone test table from base table snapshot
  5. Balance, then disable balancer
  6. Run YCSB 0.18 workload with operationcount 1000000 (1M operations),
     -threads 200, -target 100000 (100k ops/sec)
  7. Drop test table
  8. Back to step 3 until all workloads complete
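The per-workload run in step 6 looks roughly like the following. The YCSB
binding name, table name, and column family here are assumptions for
illustration, not taken from the actual runs:

```shell
# Hypothetical sketch of the step-6 YCSB invocation (workload A shown);
# binding name, table, and columnfamily values are assumptions.
bin/ycsb run hbase20 -P workloads/workloada \
  -p table=usertable -p columnfamily=family \
  -p operationcount=1000000 \
  -threads 200 -target 100000
```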






Workload A 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 20552 20655 100.50%
[OVERALL], Throughput(ops/sec) 97314 96829 99.50%
[READ], AverageLatency(us) 591 418 70.75%
[READ], MinLatency(us) 191 201 105.24%
[READ], MaxLatency(us) 146047 80895 55.39%
[READ], 95thPercentileLatency(us) 3013 542 17.99%
[READ], 99thPercentileLatency(us) 5427 2559 47.15%
[UPDATE], AverageLatency(us) 833 460 55.23%
[UPDATE], MinLatency(us) 348 230 66.09%
[UPDATE], MaxLatency(us) 149887 80959 54.01%
[UPDATE], 95thPercentileLatency(us) 3403 607 17.84%
[UPDATE], 99thPercentileLatency(us) 5751 3045 52.95%
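For clarity, the Difference column in these tables is the 2.2.4
measurement expressed as a percentage of the 1.6.0 measurement, so values
under 100% mean lower latency (better) on 2.2.4. A quick Python sketch of
the computation, using the Workload A average read latency:

```python
def difference(v_160, v_224):
    """Express the 2.2.4 measurement as a percentage of the 1.6.0 one."""
    return v_224 / v_160 * 100.0

# Workload A [READ] AverageLatency(us): 591 us on 1.6.0, 418 us on 2.2.4
print(f"{difference(591, 418):.2f}%")  # prints "70.73%"
```

(The table shows 70.75%, presumably computed from the latencies before
rounding.)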




Workload B 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 20555 20679 100.60%
[OVERALL], Throughput(ops/sec) 97300 96716 99.40%
[READ], AverageLatency(us) 417 427 102.54%
[READ], MinLatency(us) 179 194 108.38%
[READ], MaxLatency(us) 124095 76799 61.89%
[READ], 95thPercentileLatency(us) 498 564 113.25%
[READ], 99thPercentileLatency(us) 3679 3785 102.88%
[UPDATE], AverageLatency(us) 665 488 73.28%
[UPDATE], MinLatency(us) 380 237 62.37%
[UPDATE], MaxLatency(us) 95167 76287 80.16%
[UPDATE], 95thPercentileLatency(us) 718 629 87.60%
[UPDATE], 99thPercentileLatency(us) 4015 4023 100.20%




Workload C 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 20525 20648 100.60%
[OVERALL], Throughput(ops/sec) 97442 96862 99.40%
[READ], AverageLatency(us) 385 382 99.07%
[READ], MinLatency(us) 178 198 111.24%
[READ], MaxLatency(us) 74943 76415 101.96%
[READ], 95thPercentileLatency(us) 437 477 109.15%
[READ], 99thPercentileLatency(us) 3349 2219 66.26%




Workload D 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 20538 20644 100.52%
[OVERALL], Throughput(ops/sec) 97380 96880 99.49%
[READ], AverageLatency(us) 372 393 105.49%
[READ], MinLatency(us) 116 137 118.10%
[READ], MaxLatency(us) 107391 73215 68.18%
[READ], 95thPercentileLatency(us) 916 983 107.31%
[READ], 99thPercentileLatency(us) 3183 2473 77.69%
[INSERT], AverageLatency(us) 732 526 71.86%
[INSERT], MinLatency(us) 418 289 69.14%
[INSERT], MaxLatency(us) 109183 80255 73.51%
[INSERT], 95thPercentileLatency(us) 823 724 87.97%
[INSERT], 99thPercentileLatency(us) 3961 3003 75.81%




Workload E 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 120157 119728 99.64%
[OVERALL], Throughput(ops/sec) 16645 16705 100.36%
[INSERT], AverageLatency(us) 11787 11102 94.19%
[INSERT], MinLatency(us) 459 296 64.49%
[INSERT], MaxLatency(us) 172927 131583 76.09%
[INSERT], 95thPercentileLatency(us) 32143 28911 89.94%
[INSERT], 99thPercentileLatency(us) 36063 31423 87.13%
[SCAN], AverageLatency(us) 11891 11875 99.87%
[SCAN], MinLatency(us) 219 255 116.44%
[SCAN], MaxLatency(us) 179071 188671 105.36%
[SCAN], 95thPercentileLatency(us) 32639 29615 90.74%
[SCAN], 99thPercentileLatency(us) 36671 32175 87.74%




Workload F 1.6.0 2.2.4 Difference
[OVERALL], RunTime(ms) 20766 20655 99.47%
[OVERALL], Throughput(ops/sec) 96311 96829 100.54%
[READ], AverageLatency(us) 1242 591 47.61%
[READ], MinLatency(us) 183 212 115.85%
[READ], MaxLatency(us) 80959 90111 111.30%
[READ], 95thPercentileLatency(us) 3397 1511 44.48%
[READ], 99thPercentileLatency(us) 4515 3063 67.84%
[READ-MODIFY-WRITE], AverageLatency(us) 2768 1193 43.10%
[READ-MODIFY-WRITE], MinLatency(us) 596 496 83.22%
[READ-MODIFY-WRITE], MaxLatency(us) 128639 112191 87.21%
[READ-MODIFY-WRITE], 95thPercentileLatency(us) 7071 3263 46.15%
[READ-MODIFY-WRITE], 99thPercentileLatency(us) 9919 6547 66.00%
[UPDATE], AverageLatency(us) 1522 601 39.46%
[UPDATE], MinLatency(us) 369 241 65.31%
[UPDATE], MaxLatency(us) 89855 35775 39.81%
[UPDATE], 95thPercentileLatency(us) 3691 1659 44.95%
[UPDATE], 99thPercentileLatency(us) 5003 3513 70.22%
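The rows in the tables above are YCSB's standard CSV-style measurement
lines. A small Python sketch of how two runs' output can be parsed and
compared into the Difference column; the sample lines are illustrative,
not actual run output:

```python
def parse_ycsb(output):
    """Parse YCSB measurement lines, e.g. '[READ], AverageLatency(us), 591.0',
    into a {(section, metric): value} dict, skipping non-numeric lines."""
    metrics = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and parts[0].startswith("["):
            try:
                metrics[(parts[0].strip("[]"), parts[1])] = float(parts[2])
            except ValueError:
                pass  # e.g. header or non-numeric value
    return metrics

m_160 = parse_ycsb("[READ], AverageLatency(us), 591.0")
m_224 = parse_ycsb("[READ], AverageLatency(us), 418.0")
key = ("READ", "AverageLatency(us)")
print(f"Difference: {m_224[key] / m_160[key] * 100:.2f}%")  # prints "Difference: 70.73%"
```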


On Wed, May 20, 2020 at 9:10 AM Bruno Dumon <bru...@ngdata.com> wrote:

> Hi,
>
> I think that (idle) background threads would not make much of a
> difference to the raw speed of iterating over the cells of a single
> region served from the block cache. I started testing this way after
> noticing a slowdown on a real installation. I can imagine that there
> have been various improvements in other areas of HBase 2 which will
> partly compensate for the impact of what I notice in this narrow test,
> but I still found these results remarkable enough.
>
> On Wed, May 20, 2020 at 4:33 PM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > Just saw that your tests were on local mode...
> >
> > Local mode is not for production, so I do not see any related issues
> > for improving the performance of hbase in local mode. Maybe we just
> > have more threads in HBase 2 by default, which makes it slow on a
> > single machine, not sure...
> >
> > Could you please test it on a distributed cluster? If it is still a
> > problem, you can open an issue and I believe there will be committers
> > offering to help verify the problem.
> >
> > Thanks.
> >
> > Bruno Dumon <bru...@ngdata.com> wrote on Wed, May 20, 2020 at 4:45 PM:
> >
> > > For the scan test, there is only minimal rpc involved; I verified
> > > through ScanMetrics that there are only 2 rpc calls for the scan. It
> > > is essentially testing how fast the region server is able to iterate
> > > over the cells. There are no delete cells, the table is fully
> > > compacted (1 storage file), and all data fits into the block cache.
> > >
> > > For the sequential gets (i.e. one get after the other,
> > > non-multi-threaded), I tried the BlockingRpcClient. It is about 13%
> > > faster than the netty rpc client. But the same code on 1.6 is still
> > > 90% faster. Concretely, my test code does 100K gets of the same row
> > > in a loop. On HBase 2.2.4 with the BlockingRpcClient this takes on
> > > average 9 seconds; with HBase 1.6 it takes 4.75 seconds.
> > >
> > > > On Wed, May 20, 2020 at 9:27 AM Debraj Manna
> > > > <subharaj.ma...@gmail.com> wrote:
> > >
> > > > I cross-posted this in the slack channel as I was also observing
> > > > something quite similar. This is the suggestion I received;
> > > > reposting here for completeness.
> > > >
> > > > zhangduo  12:15 PM
> > > > Does get also have the same performance drop, or only scan?
> > > > zhangduo  12:18 PM
> > > > For the rpc layer, hbase2 defaults to netty while hbase1 is pure java
> > > > socket. You can set the rpc client to BlockingRpcClient to see if the
> > > > performance is back.
> > > >
> > > > > On Mon, May 18, 2020 at 7:58 PM Bruno Dumon <bru...@ngdata.com>
> > > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are looking into migrating from HBase 1.2.x to HBase 2.1.x
> > > > > (on Cloudera CDH).
> > > > >
> > > > > It seems like HBase 2 is slower than HBase 1 for both reading and
> > > > > writing.
> > > > >
> > > > > I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the
> > > > > standard OSS versions), running in local mode (no HDFS) on my
> > > > > computer:
> > > > >
> > > > >  * ingested 15M single-KV rows
> > > > >  * full table scan over them
> > > > >  * to remove rpc latency as much as possible, the scan had a
> > > > > filter 'new RandomRowFilter(0.0001f)', caching set to 10K (more
> > > > > than the number of rows returned), and
> > > > > hbase.cells.scanned.per.heartbeat.check set to 100M. This scan
> > > > > returns about 1500 rows/KVs.
> > > > >  * HBase configured with hbase.regionserver.regionSplitLimit=1 to
> > > > > remove influence from region splitting
> > > > >
> > > > > In this test, scanning seems over 50% slower on HBase 2 compared
> > > > > to HBase 1.
> > > > >
> > > > > I tried flushing & major-compacting before doing the scan, in
> > > > > which case the scan finishes faster, but the difference between
> > > > > the two HBase versions stays about the same.
> > > > >
> > > > > The test code is written in Java, using the client libraries from
> > > > > the corresponding HBase versions.
> > > > >
> > > > > Besides the above scan test, I also tested write performance
> > > > > through BufferedMutator, scans without the filter (thus passing
> > > > > much more data over the rpc), and sequential random Get requests.
> > > > > They all seem quite a bit slower on HBase 2. Interestingly, using
> > > > > the HBase 1.6 client to talk to the HBase 2.2.4 server is faster
> > > > > than using the HBase 2.2.4 client. So it seems the rpc latency of
> > > > > the new client is worse.
> > > > >
> > > > > So my question is, is such a large performance drop to be
> > > > > expected when migrating to HBase 2? Are there any special
> > > > > settings we need to be aware of?
> > > > >
> > > > > Thanks!
> > > >
> > >
> > >
> > > --
> > > Bruno Dumon
> > > NGDATA
> > > http://www.ngdata.com/
> > >
> >
>
>
> --
> Bruno Dumon
> NGDATA
> http://www.ngdata.com/
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
