ben-manes commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-896644338
The choice of what is best for this project is yours, and Guava's is certainly a fine cache. If there is no performance, feature, or stability gain, then leaving it as is would be a reasonable, conservative choice. I'll provide some background and project information, but I can't speak to what is best for your project.

#### History

I coauthored Guava's cache, where Charles and I tried to port algorithmic ideas from my earlier CLHM project into Bob Lee's MapMaker class. This was challenging because Bob made design decisions that I disagreed with, as he was very enthusiastic about soft references as an ideal caching strategy and optimized for that (hence replacing the decorator-based ReferenceMap in Google Collections). While this appears optimal in a microbenchmark, the GC impact made it unusable in practice (e.g. AdWords' SRE team vocally complained). We introduced a concurrent LRU, rewrote expiration to O(1), and designed an API with feedback from Josh Bloch. There were major performance flaws from the original structure that I couldn't fix, but this was in the Java 5 time period (mostly Java 4 idiomatically) so it was still quite good. The API, feature set, and best-in-class performance on low-core servers were a step forward. It replaced all of the ad hoc Java caches at Google, which was the main goal for Guava as a standard internal library.

By Java 8 everyone who worked on it had moved on, ConcurrentHashMap had been completely rewritten, and the language had evolved. Caffeine is a complete rewrite with a design biased towards size eviction (not reference eviction), an updated Guava-like API, and no time pressure, which allowed for more exploration of data structures & algorithms.

#### Compatibility

The Guava adapters are supported by a port of Guava's unit tests, minus assertions that inspect internal state. They pass the behavioral checks as implemented, though the tests are not exhaustive.
Caffeine's test suite is parameterized, with test methods declaring their specification constraints. This allows us to brute-force runs of every acceptable configuration, which helps detect if features interact badly. This also lets us run the tests against Guava through reverse adapters, thereby providing a reference implementation. In sum, the entire test suite runs ~6.8 million scenarios over 1.5 hours on CI.

There are small differences. Some are bugs found in Guava, not all of which are fixed. Others are improvements, such as linearizability and a different eviction policy. The most important changes for migration are discussed on the [wiki](https://github.com/ben-manes/caffeine/wiki/Guava).

Tests and static analyzers don't catch all mistakes. There has been enough usage to uncover and fix many issues. At this point the Guava team does not actively maintain its cache and recommends Caffeine, which they prefer to use internally. Since I haven't been there since Guava's heyday, I had little influence on that switch.

#### Recommendations

That's not my place. I can only say that the library tries to be a modern version of Guava's cache, is actively maintained, and has seen enough usage for bugs to be found and fixed, resulting in a mature codebase. The design differences overall mean that Caffeine is much faster, but tradeoffs mean Guava might be slightly better in a few narrow cases. What makes sense for this project and its maintainability is your team's decision.
