Hey guys, I have a question about the performance test between 1.6.0 and 2.2.4.
To Andrew: did you turn on the performance tuning on 1.6.0 as well, or did you run 1.6.0 without any extra configuration?

GC: -XX:+UseShenandoahGC -Xms31g -Xmx31g -XX:+AlwaysPreTouch -XX:+UseNUMA -XX:-UseBiasedLocking
Non-default settings: hbase.regionserver.handler.count=256 hbase.ipc.server.callqueue.type=codel dfs.client.read.shortcircuit=true

Thanks,
Stephen

On 2020/05/25 11:28:46, Bruno Dumon <[email protected]> wrote:
> Thanks a lot for doing this test. Its results are encouraging. My
> non-cluster testing was more focused on full table scans, which YCSB does
> not do. The full table scans are only done by batch jobs, so if they are a
> bit slower it is not much of a problem, but in our case they seemed a lot
> slower.
>
> I agree that testing overall performance on a non-cluster environment is
> not a good idea, but it doesn't seem unreasonable when focusing on a
> specific algorithm? I only started testing in this manner after noticing
> problems in cluster-based tests.
>
> I meanwhile tried a variant of my test with the same number of cells, but
> spread over far fewer rows: in total 1500 rows of 10K (small) cells each.
> In that test, the difference in scan speed is much smaller (HBase 2.2.4
> being only about 10% slower). This suggests that the slowdown in HBase 2
> might be due to work that happens per row being scanned.
>
> Anyway, we'll do some further testing, also with our normal workloads on
> clusters, and try to analyse it further.
>
> On Fri, May 22, 2020 at 1:52 AM Andrew Purtell <[email protected]> wrote:
>
> > It depends what you are measuring and how. I test every so often with
> > YCSB, which admittedly is not representative of real-world workloads but
> > is widely used for apples-to-apples testing among datastores, and we can
> > apply the same test tool and test methodology to different versions to
> > get meaningful results. I also test on real clusters.
> > The single all-in-one process zk+master+regionserver "minicluster"
> > cannot provide you meaningful performance data. Only distributed
> > clusters can provide meaningful results. Some defaults are also
> > important to change, like the number of RPC handlers you plan to use
> > in production.
> >
> > After reading this thread I tested 1.6.0 and 2.2.4 using my standard
> > methodology, described below. 2.2.4 is better, often significantly
> > better, in most measures in most cases.
> >
> > Cluster: AWS Amazon Linux AMI, 1 x master, 5 x regionserver, 1 x client, m5d.4xlarge
> > Hadoop: 2.10.0, ZK: 3.4.14
> > JVM: 8u252 shenandoah (provided by AMI)
> > GC: -XX:+UseShenandoahGC -Xms31g -Xmx31g -XX:+AlwaysPreTouch -XX:+UseNUMA -XX:-UseBiasedLocking
> > Non-default settings: hbase.regionserver.handler.count=256
> > hbase.ipc.server.callqueue.type=codel dfs.client.read.shortcircuit=true
> >
> > Methodology:
> > 1. Create 100M row base table (ROW_INDEX_V1 encoding, ZSTANDARD compression)
> > 2. Snapshot base table
> > 3. Enable balancer
> > 4. Clone test table from base table snapshot
> > 5. Balance, then disable balancer
> > 6. Run YCSB 0.18 workload --operationcount 1000000 (1M rows) -threads 200 -target 100000 (100k ops/sec)
> > 7. Drop test table
> > 8. Back to step 3 until all workloads complete
> >
> > Workload A                              1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                  20552    20655      100.50%
> > [OVERALL], Throughput(ops/sec)          97314    96829       99.50%
> > [READ], AverageLatency(us)                591      418       70.75%
> > [READ], MinLatency(us)                    191      201      105.24%
> > [READ], MaxLatency(us)                 146047    80895       55.39%
> > [READ], 95thPercentileLatency(us)        3013      542       17.99%
> > [READ], 99thPercentileLatency(us)        5427     2559       47.15%
> > [UPDATE], AverageLatency(us)              833      460       55.23%
> > [UPDATE], MinLatency(us)                  348      230       66.09%
> > [UPDATE], MaxLatency(us)               149887    80959       54.01%
> > [UPDATE], 95thPercentileLatency(us)      3403      607       17.84%
> > [UPDATE], 99thPercentileLatency(us)      5751     3045       52.95%
> >
> > Workload B                              1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                  20555    20679      100.60%
> > [OVERALL], Throughput(ops/sec)          97300    96716       99.40%
> > [READ], AverageLatency(us)                417      427      102.54%
> > [READ], MinLatency(us)                    179      194      108.38%
> > [READ], MaxLatency(us)                 124095    76799       61.89%
> > [READ], 95thPercentileLatency(us)         498      564      113.25%
> > [READ], 99thPercentileLatency(us)        3679     3785      102.88%
> > [UPDATE], AverageLatency(us)              665      488       73.28%
> > [UPDATE], MinLatency(us)                  380      237       62.37%
> > [UPDATE], MaxLatency(us)                95167    76287       80.16%
> > [UPDATE], 95thPercentileLatency(us)       718      629       87.60%
> > [UPDATE], 99thPercentileLatency(us)      4015     4023      100.20%
> >
> > Workload C                              1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                  20525    20648      100.60%
> > [OVERALL], Throughput(ops/sec)          97442    96862       99.40%
> > [READ], AverageLatency(us)                385      382       99.07%
> > [READ], MinLatency(us)                    178      198      111.24%
> > [READ], MaxLatency(us)                  74943    76415      101.96%
> > [READ], 95thPercentileLatency(us)         437      477      109.15%
> > [READ], 99thPercentileLatency(us)        3349     2219       66.26%
> >
> > Workload D                              1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                  20538    20644      100.52%
> > [OVERALL], Throughput(ops/sec)          97380    96880       99.49%
> > [READ], AverageLatency(us)                372      393      105.49%
> > [READ], MinLatency(us)                    116      137      118.10%
> > [READ], MaxLatency(us)                 107391    73215       68.18%
> > [READ], 95thPercentileLatency(us)         916      983      107.31%
> > [READ], 99thPercentileLatency(us)        3183     2473       77.69%
> > [INSERT], AverageLatency(us)              732      526       71.86%
> > [INSERT], MinLatency(us)                  418      289       69.14%
> > [INSERT], MaxLatency(us)               109183    80255       73.51%
> > [INSERT], 95thPercentileLatency(us)       823      724       87.97%
> > [INSERT], 99thPercentileLatency(us)      3961     3003       75.81%
> >
> > Workload E                              1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                 120157   119728       99.64%
> > [OVERALL], Throughput(ops/sec)          16645    16705      100.36%
> > [INSERT], AverageLatency(us)            11787    11102       94.19%
> > [INSERT], MinLatency(us)                  459      296       64.49%
> > [INSERT], MaxLatency(us)               172927   131583       76.09%
> > [INSERT], 95thPercentileLatency(us)     32143    28911       89.94%
> > [INSERT], 99thPercentileLatency(us)     36063    31423       87.13%
> > [SCAN], AverageLatency(us)              11891    11875       99.87%
> > [SCAN], MinLatency(us)                    219      255      116.44%
> > [SCAN], MaxLatency(us)                 179071   188671      105.36%
> > [SCAN], 95thPercentileLatency(us)       32639    29615       90.74%
> > [SCAN], 99thPercentileLatency(us)       36671    32175       87.74%
> >
> > Workload F                                       1.6.0    2.2.4   Difference
> > [OVERALL], RunTime(ms)                           20766    20655       99.47%
> > [OVERALL], Throughput(ops/sec)                   96311    96829      100.54%
> > [READ], AverageLatency(us)                        1242      591       47.61%
> > [READ], MinLatency(us)                             183      212      115.85%
> > [READ], MaxLatency(us)                           80959    90111      111.30%
> > [READ], 95thPercentileLatency(us)                 3397     1511       44.48%
> > [READ], 99thPercentileLatency(us)                 4515     3063       67.84%
> > [READ-MODIFY-WRITE], AverageLatency(us)           2768     1193       43.10%
> > [READ-MODIFY-WRITE], MinLatency(us)                596      496       83.22%
> > [READ-MODIFY-WRITE], MaxLatency(us)             128639   112191       87.21%
> > [READ-MODIFY-WRITE], 95thPercentileLatency(us)    7071     3263       46.15%
> > [READ-MODIFY-WRITE], 99thPercentileLatency(us)    9919     6547       66.00%
> > [UPDATE], AverageLatency(us)                      1522      601       39.46%
> > [UPDATE], MinLatency(us)                           369      241       65.31%
> > [UPDATE], MaxLatency(us)                         89855    35775       39.81%
> > [UPDATE], 95thPercentileLatency(us)               3691     1659       44.95%
> > [UPDATE], 99thPercentileLatency(us)               5003     3513       70.22%
> >
> > On Wed, May 20, 2020 at 9:10 AM Bruno Dumon <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I think that (idle) background threads would not make much of a
> > > difference to the raw speed of iterating over the cells of a single
> > > region served from the block cache. I started testing this way after
> > > noticing a slowdown on a real installation. I can imagine that there
> > > have been various improvements in HBase 2 in other areas which will
> > > partly compensate for the impact of what I notice in this narrow
> > > test, but I still found these results remarkable enough.
> > >
> > > On Wed, May 20, 2020 at 4:33 PM 张铎(Duo Zhang) <[email protected]> wrote:
> > >
> > > > Just saw that your tests were run in local mode...
> > > >
> > > > Local mode is not for production, so I do not see any related
> > > > issues for improving the performance of HBase in local mode. Maybe
> > > > we just have more threads in HBase 2 by default, which makes it
> > > > slow on a single machine; not sure...
> > > >
> > > > Could you please test it on a distributed cluster? If it is still
> > > > a problem, you can open an issue and I believe there will be
> > > > committers offering to help verify the problem.
> > > >
> > > > Thanks.
> > > >
> > > > Bruno Dumon <[email protected]> wrote on Wed, May 20, 2020 at 4:45 PM:
> > > >
> > > > > For the scan test, there is only minimal rpc involved; I
> > > > > verified through ScanMetrics that there are only 2 rpc calls
> > > > > for the scan. It is essentially testing how fast the region
> > > > > server is able to iterate over the cells. There are no delete
> > > > > cells, the table is fully compacted (1 storage file), and all
> > > > > data fits into the block cache.
> > > > >
> > > > > For the sequential gets (i.e. one get after the other,
> > > > > non-multi-threaded), I tried the BlockingRpcClient.
> > > > > It is about 13% faster than the netty rpc client. But the same
> > > > > code on 1.6 is still 90% faster. Concretely, my test code does
> > > > > 100K gets of the same row in a loop. On HBase 2.2.4 with the
> > > > > BlockingRpcClient this takes on average 9 seconds; with HBase
> > > > > 1.6 it takes 4.75 seconds.
> > > > >
> > > > > On Wed, May 20, 2020 at 9:27 AM Debraj Manna <[email protected]> wrote:
> > > > >
> > > > > > I cross-posted this in the slack channel as I was also
> > > > > > observing something quite similar. This is the suggestion I
> > > > > > received, reposted here for completeness.
> > > > > >
> > > > > > zhangduo 12:15 PM
> > > > > > Does get also have the same performance drop, or only scan?
> > > > > > zhangduo 12:18 PM
> > > > > > For the rpc layer, hbase2 defaults to netty while hbase1 uses
> > > > > > pure java sockets. You can set the rpc client to
> > > > > > BlockingRpcClient to see if the performance comes back.
> > > > > >
> > > > > > On Mon, May 18, 2020 at 7:58 PM Bruno Dumon <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We are looking into migrating from HBase 1.2.x to HBase
> > > > > > > 2.1.x (on Cloudera CDH).
> > > > > > >
> > > > > > > It seems like HBase 2 is slower than HBase 1 for both
> > > > > > > reading and writing.
> > > > > > >
> > > > > > > I did a simple test, using HBase 1.6.0 and HBase 2.2.4 (the
> > > > > > > standard OSS versions), running in local mode (no HDFS) on
> > > > > > > my computer:
> > > > > > >
> > > > > > > * ingested 15M single-KV rows
> > > > > > > * full table scan over them
> > > > > > > * to remove rpc latency as much as possible, the scan had a
> > > > > > >   filter 'new RandomRowFilter(0.0001f)', caching set to 10K
> > > > > > >   (more than the number of rows returned), and
> > > > > > >   hbase.cells.scanned.per.heartbeat.check set to 100M.
> > > > > > >   This scan returns about 1500 rows/KVs.
> > > > > > > * HBase configured with hbase.regionserver.regionSplitLimit=1
> > > > > > >   to remove influence from region splitting
> > > > > > >
> > > > > > > In this test, scanning seems over 50% slower on HBase 2
> > > > > > > compared to HBase 1.
> > > > > > >
> > > > > > > I tried flushing & major-compacting before doing the scan,
> > > > > > > in which case the scan finishes faster, but the difference
> > > > > > > between the two HBase versions stays about the same.
> > > > > > >
> > > > > > > The test code is written in Java, using the client libraries
> > > > > > > from the corresponding HBase versions.
> > > > > > >
> > > > > > > Besides the above scan test, I also tested write performance
> > > > > > > through BufferedMutator, scans without the filter (thus
> > > > > > > passing much more data over the rpc), and sequential random
> > > > > > > Get requests. They all seem quite a bit slower on HBase 2.
> > > > > > > Interestingly, using the HBase 1.6 client to talk to the
> > > > > > > HBase 2.2.4 server is faster than using the HBase 2.2.4
> > > > > > > client. So it seems the rpc latency of the new client is
> > > > > > > worse.
> > > > > > >
> > > > > > > So my question is: is such a large performance drop to be
> > > > > > > expected when migrating to HBase 2? Are there any special
> > > > > > > settings we need to be aware of?
> > > > > > >
> > > > > > > Thanks!
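For readers following the thread: zhangduo's suggestion to fall back to the pre-2.0 blocking RPC implementation is a client-side configuration change. A minimal sketch (the property name and class are from HBase 2.x; place this in the client's hbase-site.xml, or set the same key on the client Configuration object):

```xml
<!-- Use the blocking RPC client instead of the default NettyRpcClient -->
<property>
  <name>hbase.rpc.client.impl</name>
  <value>org.apache.hadoop.hbase.ipc.BlockingRpcClient</value>
</property>
```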
> > > > >
> > > > > --
> > > > > Bruno Dumon
> > > > > NGDATA
> > > > > http://www.ngdata.com/
> > >
> > > --
> > > Bruno Dumon
> > > NGDATA
> > > http://www.ngdata.com/
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from
> > truth's decrepit hands
> >    - A23, Crosstalk
>
> --
> Bruno Dumon
> NGDATA
> http://www.ngdata.com/
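For anyone wanting to reproduce the scan test discussed in this thread, a rough sketch with the HBase 2.x client API might look like the following. This is not the original test code: the table name is made up, a running HBase instance and the hbase-client dependency are assumed, and the server-side settings mentioned above (hbase.cells.scanned.per.heartbeat.check, hbase.regionserver.regionSplitLimit) are not shown here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.RandomRowFilter;

public class ScanSpeedTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("testtable"))) {
            Scan scan = new Scan()
                    // pass roughly 1 in 10,000 rows, to minimize rpc traffic
                    .setFilter(new RandomRowFilter(0.0001f))
                    // caching larger than the expected result count,
                    // so the whole result fits in very few rpc calls
                    .setCaching(10_000);
            long start = System.nanoTime();
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    rows++;
                }
            }
            System.out.printf("scanned %d rows in %.2f s%n",
                    rows, (System.nanoTime() - start) / 1e9);
        }
    }
}
```

Comparing the wall-clock time printed by this program against the two server versions, with the same data and a fully compacted table, is essentially what the thread's single-region scan comparison does.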
