Hi Piotr,

  1.  Can we push for better benchmark coverage in the RocksDB project in the 
future?
  2.  Sure, I think we could contribute what we did in flink-benchmarks to 
improve their JMH benchmark [1]. And I will ask them how often will they run 
the benchmark.

  1.  Can we try to catch this kind of problems with RocksDB earlier? For 
example with more frequent RocksDB upgrades, or building test flink builds with 
the most recent RocksDB version to run our benchmarks and validate newer 
RocksDB versions?
  2.  I think this advice could make sense. Apart from releasing our own 
FRocksDB every time, which would take several days to build and distribute 
across maven repos. I prefer to use pre-built official RocksDB jar package and 
we need to maintain a Flink branch which does not contain the FRocksDB TTL 
feature so that the official RocksDB jar package could run directly.

[1] https://github.com/facebook/rocksdb/tree/master/java/jmh

Best,
Yun Tang
________________________________
From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Thursday, August 5, 2021 2:01
To: Yuval Itzchakov <yuva...@gmail.com>
Cc: Yun Tang <myas...@live.com>; Nico Kruber <n...@apache.org>; 
user@flink.apache.org <user@flink.apache.org>; dev <d...@flink.apache.org>
Subject: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

Thanks for the detailed explanation Yun Tang and clearly all of the effort you 
have put into it. Based on what was described here I would also vote for going 
forward with the upgrade.

It's a pity that this regression wasn't caught in the RocksDB community. I 
would have two questions/ideas:
1. Can we push for better benchmark coverage in the RocksDB project in the 
future?
2. Can we try to catch this kind of problems with RocksDB earlier? For example 
with more frequent RocksDB upgrades, or building test flink builds with the 
most recent RocksDB version to run our benchmarks and validate newer RocksDB 
versions?

Best,
Piotrek

śr., 4 sie 2021 o 19:59 Yuval Itzchakov 
<yuva...@gmail.com<mailto:yuva...@gmail.com>> napisał(a):
Hi Yun,
Thank you for the elaborate explanation and even more so for the super hard 
work that you're doing digging into RocksDB and chasing after hundreds of 
commits in order to fix them so we can all benefit.

I can say for myself that optimizing towards memory is more important ATM for 
us, and I'm totally +1 for this.

On Wed, Aug 4, 2021 at 8:50 PM Yun Tang 
<myas...@live.com<mailto:myas...@live.com>> wrote:
Hi Yuval,

Upgrading RocksDB version is a long story since Flink-1.10.
When we first plan to introduce write buffer manager to help control the memory 
usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from current 
RocksDB-5.17. However, we found performance regression in our micro benchmark 
on state operations [1] if bumped to RocksDB-5.18. We did not figure the root 
cause at that time and decide to cherry pick the commits of write buffer 
manager to our own FRocksDB [2]. And we finally released our own 
frocksdbjni-5.17.2-artisans-2.0 at that time.

As time goes no, more and more bugs or missed features have been reported in 
the old RocksDB version. Such as:

  1.  Cannot support ARM platform [3]
  2.  Dose not have stable deleteRange API, which is useful for Flink scale out 
[4]
  3.  Cannot support strict block cache [5]
  4.  Checkpoint might stuck if using UNIVERSVAL compaction strategy [6]
  5.  Uncontrolled log size make us disabled the RocksDB internal LOG [7]
  6.  RocksDB's optimizeForPointLookup option might cause data lost [8]
  7.  Current dummy entry used for memory control in RocksDB-5.17 is too large, 
leading performance problem [9]
  8.  Cannot support alpine-based images.
  9.  .......

Some of the bugs are walked around, and some are still open.

And we decide to make some changes from Flink-1.12. First of all, we reported 
the performance regression problem compared with RocksDB-5.18 and RocksDB-5.17 
to RocksDB community [10]. However, as RocksDB-5.x versions are a bit older for 
the community, and RocksJava usage might not be the core part for facebook 
guys, we did not get useful replies. Thus, we decide to figure out the root 
cause of performance regression by ourself.
Fortunately, we find the cause via binary search the commits among RocksDB-5.18 
and RocksDB-5.17, and updated in the original thread [10]. To be short, the 
performance regression is due to different implementation of `__thread` and 
`thread_local` in gcc and would have more impact on dynamic loading [11], which 
is also what current RocksJava jar package does. With my patch [12], the 
performance regression would disappear if comparing RocksDB-5.18 with 
RocksDB-5.17.

Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to 
RocksDB-6.x. However, another performance regression appeared even with my 
patch [12]. With previous knowledge, we know that we must verify the built .so 
files with our java-based benchmark instead of using RocksDB built-in db-bench. 
I started to search the 1340+ commits from RocksDB-5.18 to RocksDB-6.11 to find 
the performance problem. However, I did not figure out the root cause after 
spending several weeks this time. The performance behaves up and down in those 
commits and I cannot get the commit which lead the performance regression. Take 
this commit of integrating block cache tracer in block-based table reader [13] 
for example, I noticed that this commit would cause a bit performance 
regression and that might be the useless usage accounting in operations, 
however, the problematic code was changed in later commits. Thus, after several 
weeks digging, I have to give up for the endless searching in the thousand 
commits temporarily. As RocksDB community seems not make the project management 
system public, unlike Apache's open JIRA systems, we do not know what benchmark 
they actually run before releasing each version to guarantee the performance.

With my patch [10] on latest RocksDB-6.20.3, we could get the results on 
nexmark in the original thread sent by Stephan, and we can see the performance 
behaves closely in many real-world cases. And we also hope new features, such 
as direct buffer supporting [14] in RocksJava could help improve RocksDB's 
performance in the future.

Hope this could explain what we already did.


[1] https://github.com/apache/flink-benchmarks
[2] https://github.com/ververica/frocksdb/tree/FRocksDB-5.17.2
[3] https://issues.apache.org/jira/browse/FLINK-13598
[4] https://issues.apache.org/jira/browse/FLINK-21321
[5] https://github.com/facebook/rocksdb/issues/6247
[6] https://issues.apache.org/jira/browse/FLINK-21726
[7] https://issues.apache.org/jira/browse/FLINK-15068
[8] https://issues.apache.org/jira/browse/FLINK-17800
[9] https://github.com/facebook/rocksdb/pull/5175
[10] https://github.com/facebook/rocksdb/issues/5774
[11] http://david-grs.github.io/tls_performance_overhead_cost_linux/
[12] https://github.com/ververica/frocksdb/pull/19
[13] https://github.com/facebook/rocksdb/pull/5441/
[14] https://github.com/facebook/rocksdb/pull/2283


Best,
Yun Tang


On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov 
<yuva...@gmail.com<mailto:yuva...@gmail.com>> wrote:
We are heavy users of RocksDB and have had several issues with memory managed 
in Kubernetes, most of them actually went away when we upgraded from Flink 1.9 
to 1.13.

Do we know why there's such a huge performance regression? Can we improve this 
somehow with some flag tweaking? It would be great if we see a more in depth 
explanation of the gains vs losses of upgrading.

On Wed, Aug 4, 2021 at 3:08 PM Stephan Ewen 
<se...@apache.org<mailto:se...@apache.org>> wrote:


________________________________
From: Nico Kruber <n...@apache.org<mailto:n...@apache.org>>
Sent: Thursday, August 5, 2021 0:10
To: user@flink.apache.org<mailto:user@flink.apache.org> 
<user@flink.apache.org<mailto:user@flink.apache.org>>; dev 
<d...@flink.apache.org<mailto:d...@flink.apache.org>>
Subject: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

That's actually also what I'm seeing most of the time and what I'd expect to
improve with the newer RocksDB version.
Hence, I'd also favour the upgrade even if there is a slight catch with
respect to performance - we should, however, continue to investigate this
together with the RocksDB community.


Nico

On Wednesday, 4 August 2021 14:26:32 CEST David Anderson wrote:
> I am hearing quite often from users who are struggling to manage memory
> usage, and these are all users using RocksDB. While I don't know for
> certain that RocksDB is the cause in every case, from my perspective,
> getting the better memory stability of version 6.20 in place is critical.
>
> Regards,
> David
>
> On Wed, Aug 4, 2021 at 8:08 AM Stephan Ewen 
> <se...@apache.org<mailto:se...@apache.org>> wrote:
> > Hi all!
> >
> > *!!!  If you are a big user of the Embedded RocksDB State Backend and have
> > performance sensitive workloads, please read this !!!*
> >
> > I want to quickly raise some awareness for a RocksDB version upgrade we
> > plan to do, and some possible impact on application performance.
> >
> > *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB
> > unfortunately introduces some non-trivial performance regression. In our
> > Nexmark Benchmark, at least one query is up to 13% slower.
> > With some fixes, this can be improved, but even then there is an overall
> > *regression up to 6% in some queries*. (See attached table for results
> > from relevant Nexmark Benchmark queries).
> >
> > We would do this update nonetheless, because we need to get new features
> > and bugfixes from RocksDB in.
> >
> > Please respond to this mail thread if you have major concerns about this.
> >
> >
> > *### Fallback Plan*
> >
> > Optionally, we could fall back to Plan B, which is to upgrade RocksDB only
> > to version 5.18.4.
> > Which has no performance regression (after applying a custom patch).
> >
> > While this spares us the performance degradation of RocksDB 6.20.x, this
> >
> > has multiple disadvantages:
> >   - Does not include the better memory stability (strict cache control)
> >   - Misses out on some new features which some users asked about
> >   - Does not have the latest RocksDB bugfixes
> >
> > The latest point is especially bad in my opinion. While we can cherry-pick
> > some bugfixes back (and have done this in the past), users typically run
> > into an issue first and need to trace it back to RocksDB, then one of the
> > committers can find the relevant patch from RocksDB master and backport
> > it.
> > That isn't the greatest user experience.
> >
> > Because of those disadvantages, we would prefer to do the upgrade to the
> > newer RocksDB version despite the unfortunate performance regression.
> >
> > Best,
> > Stephan


--
Dr. Nico Kruber | Solutions Architect

Follow us @VervericaData Ververica
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
Wehner




--
Best Regards,
Yuval Itzchakov.

Reply via email to