I used the YCSB utility; it reports throughput in ops/sec, where each op is a read of a single random record. Full results below:
Host1:
[OVERALL], RunTime(ms), 267033
[OVERALL], Throughput(ops/sec), 224691.33028502096
[TOTAL_GCS_PS_Scavenge], Count, 98
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 2056
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.7699422917766717
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 98
[TOTAL_GC_TIME], Time(ms), 2056
[TOTAL_GC_TIME_%], Time(%), 0.7699422917766717
[READ], Operations, 60000000
[READ], AverageLatency(us), 876.4223452166667
[READ], MinLatency(us), 151
[READ], MaxLatency(us), 236159
[READ], 95thPercentileLatency(us), 1298
[READ], 99thPercentileLatency(us), 2571
[READ], Return=OK, 60000000

Host2:
[OVERALL], RunTime(ms), 259716
[OVERALL], Throughput(ops/sec), 231021.5774153306
[TOTAL_GCS_PS_Scavenge], Count, 142
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 2342
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.9017542238445071
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 142
[TOTAL_GC_TIME], Time(ms), 2342
[TOTAL_GC_TIME_%], Time(%), 0.9017542238445071
[READ], Operations, 60000000
[READ], AverageLatency(us), 851.8991412666667
[READ], MinLatency(us), 163
[READ], MaxLatency(us), 710655
[READ], 95thPercentileLatency(us), 1208
[READ], 99thPercentileLatency(us), 2163
[READ], Return=OK, 60000000

The clients were remote. Server side: 4 hosts (Xeon E5-2698 v4, 2.2 GHz / 40 cores each).

On Wed, 16 Sep 2020 at 13:43, onmstester onmstester <[email protected]> wrote:

> Hi,
>
> Do you mean rows/sec by ops/sec, or partitions/sec (in Cassandra terms)?
> If so, how many rows per op or partition? What is your data model and the
> host spec?
>
> Is your client remote or on the host?
>
> ---- On Wed, 16 Sep 2020 14:11:35 +0430 Sergey Semenoff <
> [email protected]> wrote ----
>
> > Hi *!
> > I think everybody who works with real BigData knows that performance is
> > very important.
> >
> > Unfortunately, our beloved HBase is roughly 2x slower than Cassandra
> > when reading huge amounts of data.
> >
> > For example, here is a Cassandra performance test run from 2 hosts
> > (client side):
> >
> > Host1 - Throughput(ops/sec), 231,021
> > Host2 - Throughput(ops/sec), 224,691
> >
> > Summary: ~450,000.
> > HBase shows only 210,000 under the same conditions.
> >
> > Maybe this is one of the reasons why Cassandra is more popular (see
> > https://db-engines.com/en/ranking/wide+column+store).
> >
> > I've made an improvement that can speed HBase up by 2-3x (it depends
> > on many factors, and sometimes the gain is even larger).
> > With the improvement, HBase reaches up to 430,000 ops/sec.
> > See the picture in the attachment.
> >
> > If you are interested in getting this improvement into a release, you
> > can help attract some developers' attention here:
> > https://issues.apache.org/jira/browse/HBASE-23887
> > Add a line there with your opinion and vote if you think it could be
> > useful for your work.
> > I believe a discussion of this approach can make HBase more useful and
> > popular.
> >
> > Thanks for your attention)
> > With best regards,
> > Pustota
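As a sanity check, the throughput figures in this thread follow directly from the raw YCSB counters (throughput = operations / runtime). A minimal sketch in Python, using only the numbers quoted above:

```python
# Cross-check YCSB's reported Throughput(ops/sec) against its raw counters:
# throughput = READ Operations / (RunTime(ms) / 1000).

RESULTS = {
    # host: (RunTime(ms), READ Operations), taken from the YCSB output above
    "Host1": (267033, 60_000_000),
    "Host2": (259716, 60_000_000),
}

total = 0.0
for host, (runtime_ms, ops) in RESULTS.items():
    throughput = ops / (runtime_ms / 1000)  # ops/sec
    total += throughput
    print(f"{host}: {throughput:,.0f} ops/sec")

# Host1 -> 224,691 and Host2 -> 231,022, matching the reported values;
# the sum is ~455,713, i.e. the "~450,000 combined" cited in the thread.
print(f"Combined: {total:,.0f} ops/sec")
```

This also confirms the two client hosts were counted together, not averaged, when the thread compares ~450,000 (Cassandra) against 210,000 (stock HBase) and 430,000 (patched HBase).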
