[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944245#comment-16944245 ]

ASF subversion and git services commented on SOLR-8241:
-------------------------------------------------------

Commit ae80c181d80aad422faf7fdfb8a1c699a59d49d6 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ae80c18 ]

SOLR-8241: Add CaffeineCache, an efficient implementation of SolrCache.

> Evaluate W-TinyLfu cache
> ------------------------
>
>                 Key: SOLR-8241
>                 URL: https://issues.apache.org/jira/browse/SOLR-8241
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Ben Manes
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: master (9.0)
>
>         Attachments: EvictionBenchmark.png, GetPutBenchmark.png, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, solr_caffeine.patch.gz, solr_jmh_results.json
>
> SOLR-2906 introduced an LFU cache, and the in-progress SOLR-3393 makes it O(1). The discussions seem to indicate that the higher hit rate (vs. LRU) is offset by the slower performance of the implementation. An original goal appeared to be to introduce ARC, a patented algorithm that uses ghost entries to retain history information.
>
> My analysis of Window TinyLfu indicates that it may be a better option. It uses a frequency sketch to compactly estimate an entry's popularity, and it uses LRU to capture recency while operating in O(1) time. On the available academic traces the policy provides a near-optimal hit rate regardless of the workload.
>
> I'm getting ready to release the policy in Caffeine, which Solr already has a dependency on. But the code is fairly straightforward, and a port into Solr's caches instead is a pragmatic alternative. More interesting is what the impact would be on Solr's workloads, and feedback on the policy's design.
>
> https://github.com/ben-manes/caffeine/wiki/Efficiency

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
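The issue description mentions that W-TinyLfu admits entries based on a compact frequency sketch. As a rough illustration only (not Caffeine's actual implementation, which uses 4-bit counters, a doorkeeper, and periodic aging), the admission idea can be sketched with a Count-Min-style estimator: a candidate replaces the would-be victim only if it is historically more popular. All class and method names below are hypothetical.

```java
import java.util.Random;

// Illustrative Count-Min-style frequency sketch for the TinyLFU admission idea.
public class TinyLfuSketch {
    private final int[][] counters;  // one row of counters per hash function
    private final int[] seeds;       // one hash seed per row
    private final int width;

    public TinyLfuSketch(int width, int depth) {
        this.width = width;
        this.counters = new int[depth][width];
        this.seeds = new int[depth];
        Random r = new Random(42);
        for (int i = 0; i < depth; i++) seeds[i] = r.nextInt();
    }

    private int index(Object key, int row) {
        int h = key.hashCode() * 0x9E3779B9 + seeds[row];
        return Math.abs(h % width);  // |h % width| < width, so abs is safe here
    }

    /** Record one access to key by bumping one counter in every row. */
    public void increment(Object key) {
        for (int row = 0; row < counters.length; row++) {
            counters[row][index(key, row)]++;
        }
    }

    /** Estimated frequency: the minimum over all rows (Count-Min estimate). */
    public int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < counters.length; row++) {
            min = Math.min(min, counters[row][index(key, row)]);
        }
        return min;
    }

    /** TinyLFU admission: admit the candidate only if it is more popular than the victim. */
    public boolean admit(Object candidate, Object victim) {
        return estimate(candidate) > estimate(victim);
    }

    public static void main(String[] args) {
        TinyLfuSketch sketch = new TinyLfuSketch(1024, 4);
        for (int i = 0; i < 10; i++) sketch.increment("hot");
        sketch.increment("cold");
        System.out.println(sketch.admit("hot", "cold"));
        System.out.println(sketch.admit("cold", "hot"));
    }
}
```

The sketch is why the policy is cheap: history for evicted ("ghost") entries costs a few counters rather than retained map entries, which is the property the description contrasts with ARC.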
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943830#comment-16943830 ]

ASF subversion and git services commented on SOLR-8241:
-------------------------------------------------------

Commit a0396da64b5874886a801f22b7cb81e11ed9642a in lucene-solr's branch refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a0396da ]

SOLR-8241: Fix an NPE.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943774#comment-16943774 ]

Chris M. Hostetter commented on SOLR-8241:
------------------------------------------

this seems to have broken {{SolrInfoBeanTest.testCallMBeanInfo}} regardless of seed (at least on linux)...

From jenkins: thetaphi_Lucene-Solr-master-Linux_24858.log.txt
{noformat}
   [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=SolrInfoBeanTest -Dtests.method=testCallMBeanInfo -Dtests.seed=A6CF2477E5B0DBBA -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=kk-KZ -Dtests.timezone=Africa/Ndjamena -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.21s J0 | SolrInfoBeanTest.testCallMBeanInfo <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: org.apache.solr.search.CaffeineCache
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([A6CF2477E5B0DBBA:59A9A94B8EC8A6A4]:0)
   [junit4]    >        at org.apache.solr.SolrInfoBeanTest.testCallMBeanInfo(SolrInfoBeanTest.java:73)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:830)
{noformat}
...jenkins found that failure on java13; i can reproduce it (again, with any seed) on java11.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943595#comment-16943595 ]

ASF subversion and git services commented on SOLR-8241:
-------------------------------------------------------

Commit 8007ac0cb0c88838ba6e58e56e2bc23374c15dc4 in lucene-solr's branch refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8007ac0 ]

SOLR-8241: Add CaffeineCache, an efficient implementation of SolrCache.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943283#comment-16943283 ]

Ben Manes commented on SOLR-8241:
---------------------------------

Thanks [~ab], [~dsmiley], [~elyograg]!
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943141#comment-16943141 ]

David Smiley commented on SOLR-8241:
------------------------------------

Woohoo! Thanks [~ab], and thanks [~ben.manes] for your extreme persistence. Better late than never. I'd hope to see this as the default in Solr configs in 9.0.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943041#comment-16943041 ]

Andrzej Bialecki commented on SOLR-8241:
----------------------------------------

Updated patch:
* reduced contention in stats counting by using LongAdder-s instead of AtomicLong-s.
* added an option to set a maxRamMB limit (it's an either/or with the maxSize limit). I'm not sure I did the right thing when changing the value of this option - basically, if the existing cache was not weighted, {{setMaxRamMB}} rebuilds the cache instead of just changing the policy limits.
* added a unit test for changing the limits on a live cache.

If this patch looks more or less ok I'll add the RefGuide changes and commit it shortly (hopefully in time for 8.3 :) )
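The LongAdder change in the patch above targets contended stats counters on the cache hot path. As a minimal stdlib-only sketch (the class and field names are hypothetical, not Solr's actual stats code): {{LongAdder}} spreads concurrent increments across internal cells and folds them on read, whereas an {{AtomicLong}} forces every thread to CAS on one word.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical hit/miss stats holder illustrating LongAdder-based counting.
public class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    public void recordHit()  { hits.increment(); }   // cheap under contention
    public void recordMiss() { misses.increment(); }

    public long hitCount()  { return hits.sum(); }   // sum() folds the cells
    public long missCount() { return misses.sum(); }

    public static void main(String[] args) {
        CacheStats stats = new CacheStats();
        // Many threads incrementing concurrently, as on a cache lookup path.
        IntStream.range(0, 1000).parallel().forEach(i -> {
            if (i % 4 == 0) stats.recordMiss(); else stats.recordHit();
        });
        System.out.println(stats.hitCount() + stats.missCount()); // 1000
    }
}
```

The trade-off is that {{sum()}} is not an atomic snapshot, which is generally acceptable for monitoring-style statistics like hit ratios.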
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937851#comment-16937851 ]

Shawn Heisey commented on SOLR-8241:
------------------------------------

bq. Why does Solr have both limits enabled at once?

If I understand it all correctly, and it's always possible that I don't, I think the idea is to allow a cache up to N entries, and if a smaller number of entries consumes a specific byte size in heap requirements, to use the smaller number instead.

I personally wouldn't have a problem with throwing an error if both limits are specified, but that opinion might go against what others think.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937836#comment-16937836 ]

Ben Manes commented on SOLR-8241:
---------------------------------

You're right, only one size threshold is supported. Why does Solr have both limits enabled at once?

Internally {{maximumSize}} uses a {{Weigher}} that returns {{1}}. That weight is normally stored on the entry, but since it's a known constant, we can codegen a version to drop that field. This gives a number-of-items limit using the same logic required to support a limit with variably sized entries.
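The dual-limit semantics under discussion (evict whenever either the entry count or the total heap weight exceeds its cap, whichever is hit first) can be sketched with the JDK alone. This is an illustrative toy, not Solr's cache or Caffeine's {{Weigher}} machinery; the class name and the byte-estimating weigher are made up for the example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Toy LRU cache enforcing both a max-entries and a max-total-weight limit,
// mirroring the combined maxSize / maxRamMB semantics of Solr's older caches.
public class DualLimitLruCache<K, V> {
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true); // access order
    private final ToLongFunction<V> weigher;  // e.g. estimated bytes per value
    private final int maxEntries;
    private final long maxWeight;
    private long totalWeight;

    public DualLimitLruCache(int maxEntries, long maxWeight, ToLongFunction<V> weigher) {
        this.maxEntries = maxEntries;
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    public synchronized V get(K key) {
        return map.get(key);  // access-order map moves the entry to MRU position
    }

    public synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) totalWeight -= weigher.applyAsLong(old);
        totalWeight += weigher.applyAsLong(value);
        // Evict LRU entries until BOTH limits are satisfied again,
        // i.e. eviction triggers on whichever condition is hit first.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while ((map.size() > maxEntries || totalWeight > maxWeight) && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            totalWeight -= weigher.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    public synchronized int size() { return map.size(); }
}
```

In Caffeine the same count-or-weight pair cannot be expressed as one builder configuration (hence the {{requireState}} check quoted earlier), which is why Solr's CaffeineCache ended up treating maxSize and maxRamMB as an either/or.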
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937764#comment-16937764 ]

Andrzej Bialecki commented on SOLR-8241:
----------------------------------------

[~ben.manes] Existing Solr cache implementations allow using a combination of limits on maximum size (number of items) and maximum heap size (number of bytes), with entries being force-evicted whenever either condition is met first. I can see how to use {{Weigher}} to implement the latter, but I also spotted this in {{Caffeine.weigher(...)}}:

{code}
requireState(!strictParsing || this.maximumSize == UNSET_INT,
    "weigher can not be combined with maximum size", this.maximumSize);
{code}

This would suggest that it's not possible to implement this combination of max size / max total weight limits?
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937177#comment-16937177 ]

Ben Manes commented on SOLR-8241:
---------------------------------

Please take a look at {{Weigher}} and {{maximumWeight}} when adding that feature.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937171#comment-16937171 ]

David Smiley commented on SOLR-8241:
------------------------------------

I briefly looked at the patch. I suppose the main feature not present here is an object-size-based cache?
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937158#comment-16937158 ]

Ben Manes commented on SOLR-8241:
---------------------------------

I apologize for accidentally providing a partial patch instead of including the added files. I attached a full patch to the github issue in case it helps later on.

I do not have a small search workload that might represent Solr's. Here are a few small traces at a cache size of 512; I don't know why SolrLru and SolrLfu are identical. This shows Caffeine does better in some workloads, at least. I cannot run a large trace due to the slow eviction behavior of Solr's caches.

||Trace||Caffeine||SolrLru||SolrLfu||
|OLTP|32.80 %|23.68 %|23.68 %|
|multi3|42.76 %|32.77 %|32.77 %|
|gli|34.10 %|0.98 %|0.98 %|
|web07|49.26 %|45.73 %|45.73 %|
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937135#comment-16937135 ] Ben Manes commented on SOLR-8241: - haha, 5 ops/s. That correlates to the S3 not completing in over 12 hours and having to stop it. The eviction time for a large cache (400k) is horrible, making it impossible to run a large trace. For small traces I can offer numbers if interested. > Evaluate W-TinyLfu cache > > > Key: SOLR-8241 > URL: https://issues.apache.org/jira/browse/SOLR-8241 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Ben Manes >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Attachments: EvictionBenchmark.png, GetPutBenchmark.png, > SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, > SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, > solr_caffeine.patch.gz, solr_jmh_results.json > > > SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). > The discussions seem to indicate that the higher hit rate (vs LRU) is offset > by the slower performance of the implementation. An original goal appeared to > be to introduce ARC, a patented algorithm that uses ghost entries to retain > history information. > My analysis of Window TinyLfu indicates that it may be a better option. It > uses a frequency sketch to compactly estimate an entry's popularity. It uses > LRU to capture recency and operate in O(1) time. When using available > academic traces the policy provides a near optimal hit rate regardless of the > workload. > I'm getting ready to release the policy in Caffeine, which Solr already has a > dependency on. But, the code is fairly straightforward and a port into Solr's > caches instead is a pragmatic alternative. More interesting is what the > impact would be in Solr's workloads and feedback on the policy's design. 
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937130#comment-16937130 ] Andrzej Bialecki commented on SOLR-8241: - [~ben.manes] I was able to run the JMH GetPutBenchmark and EvictionBenchmark using your (partial) patch from #350. I tested Solr's LFUCache and FastLRUCache, Caffeine, Guava and ConcurrentLinkedHashMap. For the Solr caches I got results similar to yours (similar hardware?), and this time Caffeine was by far the best-performing cache in these benchmarks. The {{EvictionBenchmark}} results were interesting, too: because of the high cost of eviction in the Solr caches (when using {{lowWaterMark = upperWaterMark - 1}} to force eviction of one entry at a time) their eviction performance was abysmal. See the attached screenshots.
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936270#comment-16936270 ] Ben Manes commented on SOLR-8241: - I copied the {{ConcurrentLRUCache}} and {{ConcurrentLFUCache}} classes for use by JMH without their dependencies, which had caused issues with the zip archive. This multithreaded workload is 100% cache hits for gets and puts, using a Zipf distribution, on a 4-core (8 HT) laptop. The caches were set to use the async eviction thread. The performance of both caches suffers significantly from using {{AtomicLong}} for the statistics, because it creates a lot of contention on a single field and throttles throughput. Ideally, using {{LongAdder}} would reduce this impact, e.g. to a 10% overhead. However, that cannot be done here because both caches depend on the result of the increment, which is not available from a striped counter.
{code:java}
// Lru
if (islive) {
  e.lastAccessed = stats.accessCounter.incrementAndGet();
}

// Lfu
if (islive) {
  e.lastAccessed = stats.accessCounter.incrementAndGet();
  e.hits.incrementAndGet();
}
{code}
This benchmark stresses critical sections, so a single contention point may reduce throughput more than the same work spread across multiple points. This is probably why the LFU cache is faster than the LRU one on reads: it has more counters to contend on.
||Benchmark||Cache Type||Score||
|GetPutBenchmark.read_only|Solr_Lfu|42,338,073 ops/s|
|GetPutBenchmark.read_only|Solr_Lru|23,938,078 ops/s|
|GetPutBenchmark.readwrite|Solr_Lfu|19,422,639 ops/s|
|GetPutBenchmark.readwrite|Solr_Lru|24,814,577 ops/s|
|GetPutBenchmark.write_only|Solr_Lfu|7,068,798 ops/s|
|GetPutBenchmark.write_only|Solr_Lru|8,387,552 ops/s|
I did not run with Caffeine, but my published [desktop benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#desktop-class] were run on this laptop and showed significantly higher throughput (e.g. 150M reads/s).
I am *still* waiting for the S3 trace to complete...
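The {{AtomicLong}} vs {{LongAdder}} tradeoff above can be shown with a small JDK-only sketch (illustrative names, not Solr code): both counters total correctly under concurrency, but only {{AtomicLong}} hands back the post-increment value that {{lastAccessed}} needs as a recency stamp.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Contrasts AtomicLong (one contended field, but returns the post-increment
// value) with LongAdder (striped cells, less contention, but sum-only).
public class CounterContention {

    static long[] run(int threads, int increments) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();
        Runnable work = () -> {
            for (int i = 0; i < increments; i++) {
                // Every thread CASes the same field; the returned stamp is what
                // e.lastAccessed = accessCounter.incrementAndGet() relies on.
                long stamp = atomic.incrementAndGet();
                // Striped increment: cheap under contention, but there is no
                // per-increment "value after my add" to use as a stamp.
                adder.increment();
            }
        };
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) { ts[i] = new Thread(work); ts[i].start(); }
        for (Thread t : ts) t.join();
        return new long[] { atomic.get(), adder.sum() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = run(4, 100_000);
        System.out.println(totals[0] + " " + totals[1]);  // both total 400000
    }
}
```

Both totals are exact; the difference is only in scalability of the hot path and in whether an ordering stamp is available per increment.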
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936214#comment-16936214 ] Ben Manes commented on SOLR-8241: - [~elyograg], [~dsmiley], [~ab] Do you think this is a valid setting for the Solr LRU / LFU caches when evaluating their hit rates? It disables the async behavior and keeps the cache bounded to the maximum size, for a fair comparison of eviction policies. However, running a large trace, like Search S3, seems to take forever.
{code:java}
cache = new ConcurrentLRUCache<>(maximumSize, maximumSize - 1);
{code}
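For a hit-rate-only baseline, a strict LRU with exactly that one-entry-at-a-time eviction can also be had from the JDK alone - a sketch for trace replay (not the Solr implementation, and not thread-safe):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Strict LRU: at most `maximumSize` entries, exactly one eviction per
// overflowing put -- the behavior the lowWaterMark = maximumSize - 1
// setting is meant to force.
public class StrictLru<K, V> extends LinkedHashMap<K, V> {
    private final int maximumSize;

    public StrictLru(int maximumSize) {
        super(16, 0.75f, true);  // accessOrder = true -> LRU iteration order
        this.maximumSize = maximumSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maximumSize;  // evict only the least-recently-used entry
    }

    public static void main(String[] args) {
        StrictLru<Integer, Integer> cache = new StrictLru<>(2);
        cache.put(1, 1);
        cache.put(2, 2);
        cache.get(1);      // touch 1, so 2 becomes least recently used
        cache.put(3, 3);   // evicts 2
        System.out.println(cache.keySet());  // [1, 3]
    }
}
```

Because the bound is enforced synchronously there is no async pruning to skew the comparison; the cost is that eviction happens on the caller's thread.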
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936175#comment-16936175 ] David Smiley commented on SOLR-8241: Even if _hypothetically_ these competing cache implementations were about the same, I'd rather outright remove our special caches so that we can use another tested/maintained cache. It has always annoyed me a little that our solrconfig.xml actually refers to the cache implementations as well. That's minutia alright! There may be a sunk-cost fallacy in our minds behind continuing to maintain our caches.
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936170#comment-16936170 ] Ben Manes commented on SOLR-8241: - I'm sorry you ran into issues running JMH. This seems to be a bug in their plugin, and I added a workaround to the issue. FastLRUCache is an unbounded {{ConcurrentHashMap}} that uses a background thread to prune it when it exceeds a threshold. This has the following tradeoffs:
* The read/write throughput will match {{ConcurrentHashMap}}, making it close to the ideal performance.
* The cache may have runaway memory growth under high load, when the cleaner thread cannot keep up.
* The cleanup takes {{O(n lg n)}} time, which can be expensive when the system is already under load.
{{Caffeine}} is designed to optimize system performance, rather than just get/put throughput. If we exceed the throughput requirements, we can sacrifice a little of it to improve other characteristics without impacting real-world performance. This includes a best-in-class hit rate, no runaway growth, {{O(1)}} costs, and many more features. {{FastLRUCache}} may beat {{Caffeine}} by a nanosecond or two per operation on a cache hit. However, the miss penalty (I/O, deserialization, higher GC pressure) means that it has lower system performance. We can run some simulations of trace files to show the hit rate differences. Assuming a strict LRU, we can see in a [search trace|https://github.com/ben-manes/caffeine/wiki/Efficiency#search] that Caffeine's hit rate is significantly higher. Let me know how I can help with the evaluation. I'd gladly write some integrations into my tooling with your guidance.
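A toy, JDK-only version of such a hit-rate simulation (a synthetic Zipf key stream standing in for a real trace file; all names hypothetical) replays requests through a strict LRU and reports the hit rate:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Replays a synthetic Zipf-distributed key trace through a strict LRU
// and reports the hit rate -- a toy stand-in for real trace simulation.
public class HitRateSim {

    static double lruHitRate(int keySpace, int cacheSize, int requests, long seed) {
        LinkedHashMap<Integer, Boolean> cache =
            new LinkedHashMap<>(16, 0.75f, true) {  // access order -> LRU
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> e) {
                    return size() > cacheSize;
                }
            };
        // Cumulative Zipf(1.0) distribution over the key space: P(k) ~ 1/k.
        double[] cdf = new double[keySpace];
        double sum = 0;
        for (int k = 0; k < keySpace; k++) { sum += 1.0 / (k + 1); cdf[k] = sum; }

        Random rnd = new Random(seed);
        long hits = 0;
        for (int i = 0; i < requests; i++) {
            double u = rnd.nextDouble() * sum;
            int key = 0;
            while (cdf[key] < u) key++;  // linear scan is fine for a toy key space
            if (cache.get(key) != null) hits++;
            else cache.put(key, Boolean.TRUE);
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        // 1000 keys, cache holds 100: the LRU hit rate lands near the mass of
        // the hottest keys; a policy with better frequency awareness would do better.
        System.out.println(lruHitRate(1000, 100, 100_000, 42));
    }
}
```

Swapping the eviction policy while keeping the same seeded trace is what makes the hit-rate comparison apples-to-apples.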
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935988#comment-16935988 ] Shawn Heisey commented on SOLR-8241: I can only comment on LFUCache, as that's the one that I wrote. LFUCache uses a naive, "computer science student" implementation. Evicting entries from the cache can be expected to be slow if the cache is large, so I can only recommend its use when the cache size is small. I would expect test differences between LRUCache/FastLRUCache and the Caffeine cache -- the author has said Caffeine uses an LFU algorithm, so it would likely behave similarly to LFUCache. Overall, for most installations LFU should provide better cache hit ratios than LRU. Test scenarios, which often do not behave like real-world setups, may show the opposite.
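The slow eviction described above boils down to a full scan for the minimum hit count on every eviction. A JDK-only sketch of that O(n) step (illustrative only, not the actual LFUCache code):

```java
import java.util.HashMap;
import java.util.Map;

// Naive LFU: each eviction scans every entry for the smallest hit count,
// so eviction is O(n) -- tolerable for small caches, painful for large ones.
public class NaiveLfu<K> {
    private final Map<K, Long> hits = new HashMap<>();
    private final int maximumSize;

    public NaiveLfu(int maximumSize) { this.maximumSize = maximumSize; }

    public void access(K key) {
        if (!hits.containsKey(key) && hits.size() >= maximumSize) {
            evictLeastFrequent();
        }
        hits.merge(key, 1L, Long::sum);  // record the hit
    }

    private void evictLeastFrequent() {
        K victim = null;
        long min = Long.MAX_VALUE;
        for (Map.Entry<K, Long> e : hits.entrySet()) {  // the O(n) scan
            if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
        }
        hits.remove(victim);
    }

    public boolean contains(K key) { return hits.containsKey(key); }
}
```

An O(1) LFU (as in SOLR-3393, or TinyLFU's frequency sketch) avoids this scan by keeping entries grouped or summarized by frequency instead of searching for the minimum each time.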
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935967#comment-16935967 ] Andrzej Bialecki commented on SOLR-8241: - Updated patch with an extended benchmark in {{TestFastLRUCache}}. I attached the full results of the benchmark (rather puzzling), but here are some highlights:
* LRUCache is on average the weakest contender, usually being the slowest and having the lowest hit ratio.
* The Caffeine cache is not a clear winner, though - I'd say on average it's on par with FastLRUCache, both in speed and in hit ratio.
* Its detailed behavior differs from that of LRUCache and FastLRUCache in specific test scenarios, sometimes greatly, but I couldn't find a consistent pattern or a trend where Caffeine would consistently outperform FastLRUCache.
Based on these results I can't clearly recommend Caffeine over FastLRUCache for a specific scenario (e.g. "as the number of threads grows, you should use ...", or "for larger caches you should use ...", or "for fast caching you should use ..."). I tried running the JMH benchmarks in Caffeine, but after struggling with build issues ([#350|https://github.com/ben-manes/caffeine/issues/350]) I gave up for now.
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933299#comment-16933299 ] Andrzej Bialecki commented on SOLR-8241: - Updated patch for current master:
* Updated the Caffeine version to 2.8.0, which contains important fixes and improvements, among others an adaptive eviction policy ([https://github.com/ben-manes/caffeine/issues/106]).
* Added ramBytesUsed accounting. There may be a cleaner way to do it using a Weigher, but I need to read more about it - also, I'm not sure whether that would make it more difficult to eventually implement the maxRamMB limit (not implemented yet in this patch).
* I'm curious to test the performance of this cache compared to other Solr caches, but I think I'm going to use the JMH benchmark in Caffeine for that - I don't want to bring too many dependencies into Solr.