[GitHub] [spark] ben-manes commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

GitBox Wed, 11 Aug 2021 02:07:54 -0700


ben-manes commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-896644338



   The choice of what is best for this project is yours, and Guava's is 
certainly a fine cache. If there is no performance, feature, or stability gain, 
then leaving it as is would be a reasonable, conservative choice. I'll provide 
some background and project information, but I can't speak to what is best for 
your project.
   
   #### History
   I coauthored Guava's cache, where Charles and I tried to port algorithmic 
ideas from my earlier CLHM project into Bob Lee's MapMaker class. This was 
challenging because Bob made design decisions that I disagreed with: he was 
very enthusiastic about soft references as an ideal caching strategy and 
optimized for that (hence replacing the decorator-based ReferenceMap in Google 
Collections). While this appears optimal in a microbenchmark, the GC impact 
made it unusable in practice (e.g. AdWords' SRE team complained vocally). We 
introduced a concurrent LRU, rewrote expiration to O(1), and designed an API 
with feedback from Josh Bloch. There were major performance flaws in the 
original structure that I couldn't fix, but this was in the Java 5 time period 
(mostly Java 4 in idiom), so it was still quite good. The API, feature set, 
and best-in-class performance on low-core servers were a step forward. It 
replaced all of the ad hoc Java caches at Google, which was the main goal for 
Guava as a standard internal library. By Java 8, everyone who worked on it 
had moved on, ConcurrentHashMap had been completely rewritten, and the language 
had evolved. Caffeine is a complete rewrite with a design biased towards 
size-based eviction (not reference-based), an updated Guava-like API, and no 
time pressure, allowing more exploration of data structures and algorithms.
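To make "a concurrent LRU" and "expiration in O(1)" concrete, here is a minimal, single-threaded sketch. It is not Guava's or Caffeine's code (both are concurrent and far more involved); it assumes a fixed expire-after-access duration and a monotonic ticker, and the class `LruExpiringCache` and its methods are hypothetical names. The key observation is that when every access moves an entry to the tail of an access-ordered list, expired entries always sit at the head, so expiration never needs a full sweep.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical, single-threaded sketch of a size-bounded LRU cache with a fixed
 * expire-after-access duration. Not Guava's or Caffeine's actual implementation.
 */
final class LruExpiringCache<K, V> {
  /** A cached value plus the ticker time of its most recent access. */
  private static final class Node<V> {
    final V value;
    long accessTime;

    Node(V value, long accessTime) {
      this.value = value;
      this.accessTime = accessTime;
    }
  }

  private final int maximumSize;
  private final long expireAfterAccess;
  private final LongSupplier ticker; // assumed monotonic, e.g. System::nanoTime
  // accessOrder = true: iteration runs from least to most recently accessed.
  private final LinkedHashMap<K, Node<V>> map = new LinkedHashMap<>(16, 0.75f, true);

  LruExpiringCache(int maximumSize, long expireAfterAccess, LongSupplier ticker) {
    this.maximumSize = maximumSize;
    this.expireAfterAccess = expireAfterAccess;
    this.ticker = ticker;
  }

  V getIfPresent(K key) {
    long now = ticker.getAsLong();
    expireEntries(now);
    Node<V> node = map.get(key); // a hit moves the entry to the tail (most recent)
    if (node == null) {
      return null;
    }
    node.accessTime = now;
    return node.value;
  }

  void put(K key, V value) {
    long now = ticker.getAsLong();
    expireEntries(now);
    map.put(key, new Node<>(value, now)); // inserts, or moves an existing key to the tail
    if (map.size() > maximumSize) {
      // The head of the access order is the least recently used entry.
      Iterator<K> lru = map.keySet().iterator();
      lru.next();
      lru.remove();
    }
  }

  int size() {
    return map.size();
  }

  /**
   * Every access moves its entry to the tail and stamps it with the current time, so
   * access times never decrease from head to tail. Expired entries therefore form a
   * prefix, and the scan stops at the first live one: amortized O(1) per operation,
   * since each entry is removed at most once.
   */
  private void expireEntries(long now) {
    Iterator<Node<V>> it = map.values().iterator();
    while (it.hasNext()) {
      if (now - it.next().accessTime < expireAfterAccess) {
        break;
      }
      it.remove();
    }
  }
}
```

With `maximumSize = 2`, reading `a` and then inserting `c` evicts `b` as the least recently used entry; advancing the ticker past the duration removes `a` from the head on the next operation while a more recently read entry survives. Making this thread-safe without a global lock is where the real engineering in Guava and Caffeine lies, and this sketch deliberately does not attempt it.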
   
   #### Compatibility
   The Guava adapters are supported by a port of Guava's unit tests, minus 
assertions that inspect internal state. It passes the behavioral checks as 
implemented, though the tests are not exhaustive.
   
   Caffeine's test suite is parameterized, with test methods declaring their 
specification constraints. This allows us to brute-force runs of every 
acceptable configuration, which helps detect features that interact badly. 
This also lets us run the tests against Guava through reverse adapters, 
thereby providing a reference implementation. In total, the test suite runs 
~6.8 million scenarios over 1.5 hours on CI.
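The idea of brute-forcing every acceptable configuration and checking results against a reference implementation can be sketched in plain Java. This is an illustration under stated assumptions: `CacheConfig`, `ConfigMatrix`, and the four configuration axes are hypothetical and are not Caffeine's actual test harness, which covers many more options. Each test declares a predicate over configurations, the helper enumerates the Cartesian product and keeps the combinations the predicate accepts, and a differential check reports the configurations where the candidate and the reference disagree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

/** One point in a hypothetical configuration space; the axes are illustrative only. */
final class CacheConfig {
  final int maximumSize; // -1 means unbounded
  final boolean weakKeys;
  final boolean softValues;
  final boolean weakValues;

  CacheConfig(int maximumSize, boolean weakKeys, boolean softValues, boolean weakValues) {
    this.maximumSize = maximumSize;
    this.weakKeys = weakKeys;
    this.softValues = softValues;
    this.weakValues = weakValues;
  }

  @Override
  public String toString() {
    return "CacheConfig[maximumSize=" + maximumSize + ", weakKeys=" + weakKeys
        + ", softValues=" + softValues + ", weakValues=" + weakValues + "]";
  }
}

/** Hypothetical helpers for brute-forcing a test over every acceptable configuration. */
final class ConfigMatrix {
  private static final int[] SIZES = {-1, 0, 1, 100};
  private static final boolean[] FLAGS = {false, true};

  /** Enumerates the Cartesian product of all axes, keeping combinations the test's spec accepts. */
  static List<CacheConfig> acceptable(Predicate<CacheConfig> spec) {
    List<CacheConfig> configs = new ArrayList<>();
    for (int size : SIZES) {
      for (boolean weakKeys : FLAGS) {
        for (boolean softValues : FLAGS) {
          for (boolean weakValues : FLAGS) {
            if (softValues && weakValues) {
              continue; // a value reference strength can only be set once
            }
            CacheConfig config = new CacheConfig(size, weakKeys, softValues, weakValues);
            if (spec.test(config)) {
              configs.add(config);
            }
          }
        }
      }
    }
    return configs;
  }

  /** Runs the same scenario against a candidate and a reference; returns the disagreements. */
  static <R> List<CacheConfig> mismatches(List<CacheConfig> configs,
      Function<CacheConfig, R> candidate, Function<CacheConfig, R> reference) {
    List<CacheConfig> failures = new ArrayList<>();
    for (CacheConfig config : configs) {
      if (!Objects.equals(candidate.apply(config), reference.apply(config))) {
        failures.add(config);
      }
    }
    return failures;
  }
}
```

With these four axes, `acceptable(c -> true)` yields 24 configurations (4 sizes × 2 key strengths × 3 value strengths), and a test that only applies to bounded caches, `acceptable(c -> c.maximumSize >= 0)`, yields 18. In a real suite, `candidate` and `reference` would each build a cache for the given configuration, replay a scenario, and return an observable result; per the comment, Guava reached through reverse adapters plays the reference role.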
   
   There are small differences. Some are bugs found in Guava, not all of which 
are fixed. Others are improvements, such as linearizability and a different 
eviction policy. The most important changes for migration are discussed on the 
[wiki](https://github.com/ben-manes/caffeine/wiki/Guava).
   
   Tests and static analyzers don't catch all mistakes, but there has been 
enough usage to uncover and fix many issues. At this point the Guava team does 
not actively maintain its cache and recommends Caffeine, which they prefer to 
use internally. I haven't been at Google since Guava's heyday, so I had little 
influence on that switch.
   
   #### Recommendations
   That's not my place. I can only say that the library tries to be a modern 
version of Guava's cache, is actively maintained, and has had enough bugs 
reported by users and fixed to be a mature codebase. Overall, the design 
differences mean that Caffeine is much faster, but the tradeoffs mean Guava 
might be slightly better in a few narrow cases. What makes sense for this 
project and its maintainability is your team's decision.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.