[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944245#comment-16944245
 ] 

ASF subversion and git services commented on SOLR-8241:
---

Commit ae80c181d80aad422faf7fdfb8a1c699a59d49d6 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ae80c18 ]

SOLR-8241: Add CaffeineCache, an efficient implementation of SolrCache.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and to operate in O(1) time. When using available 
> academic traces, the policy provides a near-optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. That said, the code is fairly straightforward, so porting it into 
> Solr's caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency
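The policy described above pairs a small LRU admission window with a frequency sketch that decides whether a newcomer may displace an existing entry. Below is a deliberately simplified Java sketch of only the frequency-sketch and admission part. It assumes a count-min sketch with four hashed rows, saturating counters capped at 15 (to mimic 4-bit counters), and halving every counter after {{sampleSize}} increments as the aging step. The class name, hashing constants, and parameters are hypothetical; this is not Caffeine's {{FrequencySketch}} code, and the window and main LRU regions are omitted.

```java
/**
 * Simplified TinyLFU-style frequency sketch: a count-min sketch with periodic aging.
 * Illustrative only; Caffeine's real sketch packs 4-bit counters into longs.
 */
final class SimpleFrequencySketch {
  private static final int MAX_COUNT = 15; // mimic a saturating 4-bit counter
  private static final long[] SEEDS = {
      0x9E3779B97F4A7C15L, 0xC2B2AE3D27D4EB4FL, 0x165667B19E3779F9L, 0xD6E8FEB86659FD93L};

  private final int[][] table; // one row of counters per hash seed
  private final int mask;
  private final int sampleSize;
  private int additions;

  SimpleFrequencySketch(int width, int sampleSize) {
    if (width <= 0 || Integer.bitCount(width) != 1 || sampleSize <= 0) {
      throw new IllegalArgumentException("width must be a power of two and sampleSize > 0");
    }
    this.table = new int[SEEDS.length][width];
    this.mask = width - 1;
    this.sampleSize = sampleSize;
  }

  /** Picks a counter slot for the key in the given row (multiplicative hash with an odd seed). */
  private int index(Object key, int row) {
    long h = (key.hashCode() + SEEDS[row]) * SEEDS[row];
    h ^= h >>> 32;
    return (int) h & mask;
  }

  /** Records one access, then ages the whole sketch once sampleSize accesses have accumulated. */
  void increment(Object key) {
    for (int row = 0; row < table.length; row++) {
      int i = index(key, row);
      if (table[row][i] < MAX_COUNT) {
        table[row][i]++;
      }
    }
    if (++additions >= sampleSize) {
      reset();
    }
  }

  /** Estimated popularity: the minimum over rows (may overestimate, never underestimates). */
  int frequency(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < table.length; row++) {
      min = Math.min(min, table[row][index(key, row)]);
    }
    return min;
  }

  /** Aging: halve every counter so that old popularity decays. */
  private void reset() {
    for (int[] row : table) {
      for (int i = 0; i < row.length; i++) {
        row[i] >>>= 1;
      }
    }
    additions /= 2;
  }

  /** TinyLFU admission: keep the candidate only if it looks more popular than the victim. */
  boolean admit(Object candidate, Object victim) {
    return frequency(candidate) > frequency(victim);
  }
}
```

In a full W-TinyLFU cache, roughly speaking, an entry leaving the LRU window is compared with the main region's eviction victim via something like {{admit}}, and the less popular of the two is discarded.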



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943830#comment-16943830
 ] 

ASF subversion and git services commented on SOLR-8241:
---

Commit a0396da64b5874886a801f22b7cb81e11ed9642a in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a0396da ]

SOLR-8241: Fix an NPE.



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-03 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943774#comment-16943774
 ] 

Chris M. Hostetter commented on SOLR-8241:
--

This seems to have broken {{SolrInfoBeanTest.testCallMBeanInfo}} regardless of 
seed (at least on Linux)...

From Jenkins: thetaphi_Lucene-Solr-master-Linux_24858.log.txt
{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=SolrInfoBeanTest -Dtests.method=testCallMBeanInfo -Dtests.seed=A6CF2477E5B0DBBA -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=kk-KZ -Dtests.timezone=Africa/Ndjamena -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.21s J0 | SolrInfoBeanTest.testCallMBeanInfo <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: org.apache.solr.search.CaffeineCache
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([A6CF2477E5B0DBBA:59A9A94B8EC8A6A4]:0)
   [junit4]    >    at org.apache.solr.SolrInfoBeanTest.testCallMBeanInfo(SolrInfoBeanTest.java:73)
   [junit4]    >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]    >    at java.base/java.lang.Thread.run(Thread.java:830)
{noformat}

...Jenkins found that failure on Java 13; I can reproduce it (again, with any 
seed) on Java 11.



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943595#comment-16943595
 ] 

ASF subversion and git services commented on SOLR-8241:
---

Commit 8007ac0cb0c88838ba6e58e56e2bc23374c15dc4 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8007ac0 ]

SOLR-8241: Add CaffeineCache, an efficient implementation of SolrCache.



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943283#comment-16943283
 ] 

Ben Manes commented on SOLR-8241:
-

Thanks [~ab], [~dsmiley], [~elyograg]!


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread David Wayne Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943141#comment-16943141
 ] 

David Wayne Smiley commented on SOLR-8241:
--

Woohoo!  Thanks [~ab], and thanks [~ben.manes] for your extreme persistence.  Better 
late than never.  I'd hope to see this as the default in Solr configs in 9.0.


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943041#comment-16943041
 ] 

Andrzej Bialecki commented on SOLR-8241:


Updated patch:
* reduced contention in stats counting by using {{LongAdder}}s instead of 
{{AtomicLong}}s.
* added an option to set a maxRamMB limit (it's either-or with the maxSize 
limit). I'm not sure I did the right thing when changing the value of this 
option: if the existing cache was not weighted, {{setMaxRamMB}} rebuilds the 
cache instead of just changing the policy limits.
* added a unit test for changing the limits on a live cache.

If this patch looks more or less OK, I'll add the RefGuide changes and commit it 
shortly (hopefully in time for 8.3 :) )
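The first bullet swaps {{AtomicLong}} for {{LongAdder}} in the stats counters. A minimal sketch of that pattern follows, assuming a hypothetical stats holder (the class and method names are illustrative, not CaffeineCache's). {{LongAdder}} spreads concurrent increments across internal cells and only combines them when {{sum()}} is called, which is why it contends less than a single {{AtomicLong}} under heavy write load; the trade-off is that {{sum()}} is not an atomic snapshot.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical cache-statistics holder; names are illustrative, not CaffeineCache's. */
final class CacheStatsSketch {
  // Each LongAdder spreads concurrent increments over internal cells instead of
  // having every thread CAS on one shared AtomicLong.
  private final LongAdder lookups = new LongAdder();
  private final LongAdder hits = new LongAdder();
  private final LongAdder evictions = new LongAdder();

  void recordHit() {
    lookups.increment();
    hits.increment();
  }

  void recordMiss() {
    lookups.increment();
  }

  void recordEviction() {
    evictions.increment();
  }

  long lookupCount() { return lookups.sum(); }

  long hitCount() { return hits.sum(); }

  long evictionCount() { return evictions.sum(); }

  /**
   * Hit ratio in [0, 1]. sum() is not an atomic snapshot, so under concurrent updates
   * the two reads can be slightly out of step; the result is clamped to 1.0.
   */
  double hitRatio() {
    long l = lookups.sum();
    if (l == 0) {
      return 0.0;
    }
    return Math.min(1.0, (double) hits.sum() / l);
  }
}
```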


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Shawn Heisey (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937851#comment-16937851
 ] 

Shawn Heisey commented on SOLR-8241:


bq. Why does Solr have both limits enabled at once?

If I understand it all correctly, and it's always possible that I don't, I 
think the idea is to allow a cache of up to N entries, but if a smaller number 
of entries already reaches a specified heap size in bytes, to cap the cache at 
that smaller number instead.

I personally wouldn't have a problem with throwing an error if both limits are 
specified, but that opinion might go against what others think.
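That reading of the two limits (at most N entries, but fewer if those entries already reach the heap budget) can be sketched in plain Java as an access-ordered {{LinkedHashMap}} that evicts least-recently-used entries until both limits hold. The per-value size estimator passed in is an assumption standing in for whatever RAM accounting Solr does; this is a hypothetical class, not Solr's LRUCache code, and it is not thread-safe.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical LRU map bounded by BOTH an entry count and an estimated byte size. */
final class DualLimitLruCache<K, V> {
  // accessOrder = true: iteration runs from least- to most-recently accessed entry.
  private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
  private final int maxEntries;
  private final long maxBytes;
  private final ToLongFunction<? super V> sizeOf; // assumed RAM estimator
  private long bytes;

  DualLimitLruCache(int maxEntries, long maxBytes, ToLongFunction<? super V> sizeOf) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
    this.sizeOf = sizeOf;
  }

  V get(K key) {
    return map.get(key); // a hit also marks the entry as most recently used
  }

  void put(K key, V value) {
    V old = map.put(key, value);
    if (old != null) {
      bytes -= sizeOf.applyAsLong(old);
    }
    bytes += sizeOf.applyAsLong(value);
    // Evict from the LRU end while EITHER limit is exceeded.
    Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
    while ((map.size() > maxEntries || bytes > maxBytes) && it.hasNext()) {
      bytes -= sizeOf.applyAsLong(it.next().getValue());
      it.remove();
    }
  }

  int size() { return map.size(); }

  long ramBytes() { return bytes; }
}
```

Note that a single entry larger than the whole byte budget is evicted immediately by this sketch; a real cache might reject it up front instead.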



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937836#comment-16937836
 ] 

Ben Manes commented on SOLR-8241:
-

You're right, only one size threshold is supported. Why does Solr have both 
limits enabled at once?

Internally {{maximumSize}} uses a {{Weigher}} that returns {{1}}. That weight is 
normally stored on the entry, but since it's a known constant, we can codegen a 
version that drops the field. This gives an item-count limit using the same 
logic required to support a limit with variably sized entries.
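The idea that an item-count bound is just a weight bound where every entry weighs 1 can be shown with a single weight-bounded LRU. In this sketch each entry stores its own weight (the general, variably sized case), and a hypothetical {{countBounded}} factory supplies a weigher that always returns 1. It is illustrative plain Java, not Caffeine's code-generated node classes or its {{Weigher}} interface.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.ToIntBiFunction;

/** Hypothetical LRU bounded by total weight; a count bound is the weight-1 special case. */
final class WeightBoundedLru<K, V> {
  /** Each entry carries its own weight, as in the general (variably sized) case. */
  private static final class Node<T> {
    final T value;
    final int weight;

    Node(T value, int weight) {
      this.value = value;
      this.weight = weight;
    }
  }

  private final LinkedHashMap<K, Node<V>> map = new LinkedHashMap<>(16, 0.75f, true);
  private final long maxWeight;
  private final ToIntBiFunction<? super K, ? super V> weigher;
  private long totalWeight;

  WeightBoundedLru(long maxWeight, ToIntBiFunction<? super K, ? super V> weigher) {
    this.maxWeight = maxWeight;
    this.weigher = weigher;
  }

  /** Hypothetical factory: an entry-count limit is a weight limit where every entry weighs 1. */
  static <K, V> WeightBoundedLru<K, V> countBounded(long maxEntries) {
    return new WeightBoundedLru<K, V>(maxEntries, (k, v) -> 1);
  }

  void put(K key, V value) {
    int w = weigher.applyAsInt(key, value);
    if (w < 0) {
      throw new IllegalArgumentException("negative weight");
    }
    Node<V> old = map.put(key, new Node<>(value, w));
    if (old != null) {
      totalWeight -= old.weight;
    }
    totalWeight += w;
    // Evict least-recently-used entries until the total weight fits again.
    Iterator<Node<V>> it = map.values().iterator();
    while (totalWeight > maxWeight && it.hasNext()) {
      totalWeight -= it.next().weight;
      it.remove();
    }
  }

  V get(K key) {
    Node<V> n = map.get(key);
    return n == null ? null : n.value;
  }

  int size() { return map.size(); }

  long weightedSize() { return totalWeight; }
}
```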


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937764#comment-16937764
 ] 

Andrzej Bialecki  commented on SOLR-8241:
-

[~ben.manes] Existing Solr cache implementations allow combining a limit on 
maximum size (number of items) with a limit on maximum heap size (number of 
bytes), with entries being forcibly evicted when whichever limit is reached 
first. I can see how to use {{Weigher}} to implement the latter, but I also 
spotted this in {{Caffeine.weigher(...)}}:
{code}
requireState(!strictParsing || this.maximumSize == UNSET_INT,
    "weigher can not be combined with maximum size", this.maximumSize);
{code}
This would suggest that it's not possible to implement this combination of max 
size / max total weight limits?
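If only one of the two bounds can be configured on a single builder, one option is to validate the settings up front and pick a single eviction mode, which matches the either-or approach in Andrzej's 2019-10-02 patch update. The sketch below uses hypothetical names ({{CacheLimitConfig}}, {{parse}}) and assumes {{maxRamMB}} is converted to bytes as maxRamMB * 1024 * 1024; it is not Solr's parsing code.

```java
/**
 * Hypothetical config check: allow at most one of maxSize / maxRamMB, since an
 * entry-count bound and a weigher-based (RAM) bound cannot both be set on one builder.
 */
final class CacheLimitConfig {
  enum Mode { ENTRY_COUNT, RAM_BYTES }

  final Mode mode;
  final long maximum; // entries or bytes, depending on mode

  private CacheLimitConfig(Mode mode, long maximum) {
    if (maximum <= 0) {
      throw new IllegalArgumentException("limit must be positive: " + maximum);
    }
    this.mode = mode;
    this.maximum = maximum;
  }

  static CacheLimitConfig parse(Long maxSize, Long maxRamMB, long defaultMaxSize) {
    if (maxSize != null && maxRamMB != null) {
      // Fail fast instead of silently preferring one limit over the other.
      throw new IllegalArgumentException("maxSize and maxRamMB are mutually exclusive");
    }
    if (maxRamMB != null) {
      return new CacheLimitConfig(Mode.RAM_BYTES, maxRamMB * 1024L * 1024L);
    }
    return new CacheLimitConfig(Mode.ENTRY_COUNT, maxSize != null ? maxSize : defaultMaxSize);
  }
}
```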


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-24 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937177#comment-16937177
 ] 

Ben Manes commented on SOLR-8241:
-

Please take a look at {{Weigher}} and {{maximumWeight}} when adding that feature.


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937171#comment-16937171
 ] 

David Smiley commented on SOLR-8241:


I briefly looked at the patch.  I suppose the main feature not present here is 
an object-size-based cache?


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-24 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937158#comment-16937158
 ] 

Ben Manes commented on SOLR-8241:
-

I apologize for accidentally providing a partial patch that was missing the 
added files. I attached a full patch to the GitHub issue in case it helps later 
on.

I do not have a small search workload that might represent Solr's. Below are 
hit rates for a few small traces at a maximum size of 512; I don't know why 
SolrLru and SolrLfu are identical. This shows that Caffeine does better in some 
workloads, at least. I cannot run a large trace due to the slow eviction 
behavior of Solr's caches.

||Trace||Caffeine||SolrLru||SolrLfu||
|OLTP|32.80 %|23.68 %|23.68 %|
|multi3|42.76 %|32.77 %|32.77 %|
|gli|34.10 %|0.98 %|0.98 %|
|web07|49.26 %|45.73 %|45.73 %|
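Hit-rate figures like those in the table come from replaying a trace of keys through a cache of a given size and counting hits. The following minimal sketch does that for a plain LRU built on {{LinkedHashMap}}. It does not reproduce Caffeine's simulator or the OLTP, multi3, gli, and web07 traces; the skewed synthetic trace generator is a hypothetical stand-in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical trace-replay harness that reports the hit rate of a bounded LRU. */
final class HitRateSimulator {
  static double lruHitRate(long[] trace, int maximumSize) {
    // Access-ordered LinkedHashMap that drops its eldest entry once over capacity.
    Map<Long, Boolean> lru = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
        return size() > maximumSize;
      }
    };
    long hits = 0;
    for (long key : trace) {
      if (lru.get(key) != null) {
        hits++; // get() also refreshes the entry's recency
      } else {
        lru.put(key, Boolean.TRUE); // miss: admit the key, possibly evicting the LRU entry
      }
    }
    return trace.length == 0 ? 0.0 : (double) hits / trace.length;
  }

  /** Hypothetical skewed trace: 80% of requests hit 100 hot keys, 20% hit 100,000 cold keys. */
  static long[] syntheticTrace(int length, long seed) {
    Random rnd = new Random(seed);
    long[] trace = new long[length];
    for (int i = 0; i < length; i++) {
      trace[i] = rnd.nextDouble() < 0.8 ? rnd.nextInt(100) : 100 + rnd.nextInt(100_000);
    }
    return trace;
  }
}
```

For example, {{lruHitRate(syntheticTrace(100_000, 42L), 512)}} replays a synthetic trace at the same cache size of 512 mentioned for the table.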


[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-24 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937135#comment-16937135
 ] 

Ben Manes commented on SOLR-8241:
-

haha, 5 ops/s. That correlates with the S3 trace not completing in over 12 hours, 
forcing me to stop it. The eviction time for a large cache (400k) is horrible, 
making it impossible to run a large trace. For small traces I can offer numbers 
if you're interested.




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-24 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937130#comment-16937130
 ] 

Andrzej Bialecki  commented on SOLR-8241:
-

[~ben.manes] I was able to run JMH GetPutBenchmark and EvictionBenchmark using 
your (partial) patch from #350. I tested Solr LFUCache, FastLRUCache, Caffeine, 
Guava and ConcurrentLinkedHashMap. I got results similar to yours for the Solr 
caches (similar hardware?), and this time Caffeine was by far the best-performing 
cache in these benchmarks.

{{EvictionBenchmark}} results were interesting, too: due to the high cost of 
eviction in Solr caches (when using {{lowWaterMark = upperWaterMark - 1}} to 
force the eviction of one entry at a time), their eviction performance was just 
abysmal.

See the attached screenshots.




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936270#comment-16936270
 ] 

Ben Manes commented on SOLR-8241:
-

I copied the {{ConcurrentLRUCache}} and {{ConcurrentLFUCache}} classes for use 
by JMH without their dependencies, which caused issues with the zip archive. 
This multithreaded workload performs gets and puts with a 100% hit rate, using a 
Zipf distribution, on a 4-core (8 HT) laptop. The caches were set to use the 
async eviction thread.

The performance of the caches is hurt significantly by using {{AtomicLong}} for 
the statistics. This throttles throughput because it creates a lot of contention 
on that field. Switching to {{LongAdder}} would ideally reduce this impact, e.g. 
to 10% overhead. However, this cannot be done because both caches depend on the 
value returned by the increment, which a striped counter does not provide.
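{code:java}
// ConcurrentLRUCache.get(): every hit while the cache is live
// increments one shared AtomicLong
if (islive) {
  e.lastAccessed = stats.accessCounter.incrementAndGet();
}

// ConcurrentLFUCache.get(): the same shared counter, plus a per-entry hit counter
if (islive) {
  e.lastAccessed = stats.accessCounter.incrementAndGet();
  e.hits.incrementAndGet();
}
{code}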
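To illustrate why {{LongAdder}} cannot simply replace the shared counter above, here is a small contrast sketch. The class and method names are hypothetical. {{AtomicLong.incrementAndGet()}} returns the new value, which the caches store as an ordering stamp. {{LongAdder.increment()}} returns nothing, and {{sum()}} is not an atomic snapshot under concurrent updates, so it cannot hand out a unique stamp per access.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class CounterContrast {
  // One shared cell: every caller contends on it, but each call returns
  // a distinct value usable as a lastAccessed "timestamp".
  static long stampWithAtomic(AtomicLong accessCounter) {
    return accessCounter.incrementAndGet();
  }

  // Striped cells: far less contention, but increment() returns void,
  // so there is no per-call value to store on the entry.
  static void countWithAdder(LongAdder hits) {
    hits.increment();
  }

  public static void main(String[] args) {
    AtomicLong accessCounter = new AtomicLong();
    long first = stampWithAtomic(accessCounter);
    long second = stampWithAtomic(accessCounter);
    System.out.println(first + " " + second); // 1 2

    LongAdder hits = new LongAdder();
    countWithAdder(hits);
    countWithAdder(hits);
    System.out.println(hits.sum()); // 2
  }
}
```

A {{LongAdder}} would suit a pure statistic such as a hit count. The recency stamp is the part that needs the return value.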
This benchmark stresses critical sections, so a single contention point may 
reduce throughput more than contention spread across multiple points. This is 
probably why the LFU cache is faster than the LRU one in the read-only case: it 
has more counters to contend on.
||Benchmark||Cache Type||Score||
|GetPutBenchmark.read_only|Solr_Lfu|42,338,073 ops/s|
|GetPutBenchmark.read_only|Solr_Lru|23,938,078 ops/s|
|GetPutBenchmark.readwrite|Solr_Lfu|19,422,639 ops/s|
|GetPutBenchmark.readwrite|Solr_Lru|24,814,577 ops/s|
|GetPutBenchmark.write_only|Solr_Lfu|7,068,798 ops/s|
|GetPutBenchmark.write_only|Solr_Lru|8,387,552 ops/s|

I did not run with Caffeine, but my published [desktop 
benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#desktop-class] 
were run on this laptop. Those showed significantly higher throughput (e.g. 150M 
reads/s).

I am *still* waiting for the S3 trace to complete...




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936214#comment-16936214
 ] 

Ben Manes commented on SOLR-8241:
-

[~elyograg], [~dsmiley], [~ab]

Do you think this is a valid setting for the Solr LRU / LFU caches when 
evaluating their hit rates? It disables the async behavior and keeps the cache 
bounded at the maximum size, for a fair comparison of eviction policies. However, 
running a large trace, like Search S3, seems to take forever.
{code:java}
// upperWaterMark = maximumSize, lowerWaterMark = maximumSize - 1
cache = new ConcurrentLRUCache<>(maximumSize, maximumSize - 1);
{code}
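To see why this setting makes large traces so slow, here is a hypothetical, simplified watermark cache. It is not Solr's ConcurrentLRUCache, only a sketch of the same idea: an unbounded map plus a synchronous sweep that sorts every entry by last access once the size exceeds {{upperWaterMark}}. With the lower watermark just one below the upper, the sweep (O(n log n)) runs every couple of inserts in this sketch, while evicting only a handful of entries each time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watermark LRU sketch; not Solr's implementation. */
class WatermarkLruSketch<K, V> {
  private static final class Node<T> {
    final T value;
    volatile long lastAccessed;

    Node(T value, long lastAccessed) {
      this.value = value;
      this.lastAccessed = lastAccessed;
    }
  }

  private final Map<K, Node<V>> map = new ConcurrentHashMap<>();
  private final AtomicLong accessCounter = new AtomicLong();
  private final int upperWaterMark;
  private final int lowerWaterMark;
  private int fullSorts; // number of O(n log n) sweeps performed

  WatermarkLruSketch(int upperWaterMark, int lowerWaterMark) {
    this.upperWaterMark = upperWaterMark;
    this.lowerWaterMark = lowerWaterMark;
  }

  V get(K key) {
    Node<V> node = map.get(key);
    if (node == null) {
      return null;
    }
    node.lastAccessed = accessCounter.incrementAndGet();
    return node.value;
  }

  void put(K key, V value) {
    map.put(key, new Node<>(value, accessCounter.incrementAndGet()));
    if (map.size() > upperWaterMark) {
      sweep();
    }
  }

  int size() {
    return map.size();
  }

  synchronized int fullSorts() {
    return fullSorts;
  }

  // Copies and sorts all n entries by last access, then drops the oldest
  // until the size is back down to lowerWaterMark.
  private synchronized void sweep() {
    fullSorts++;
    List<Map.Entry<K, Node<V>>> entries = new ArrayList<>(map.entrySet());
    entries.sort(Comparator.comparingLong((Map.Entry<K, Node<V>> e) -> e.getValue().lastAccessed));
    int toRemove = entries.size() - lowerWaterMark;
    for (int i = 0; i < toRemove; i++) {
      map.remove(entries.get(i).getKey());
    }
  }
}
```

For example, {{new WatermarkLruSketch<>(3, 2)}} sorts the whole map on the 4th insert and again on every second insert after that. With a cache of hundreds of thousands of entries, each of those sorts touches every entry.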




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936175#comment-16936175
 ] 

David Smiley commented on SOLR-8241:


Even if _hypothetically_ these competing cache implementations were about the 
same, I'd rather outright remove our special caches so that we can use another 
tested/maintained cache.  It's always annoyed me a little that our 
solrconfig.xml actually refers to the cache implementations as well.  That's 
minutiae, all right!  There may be a sunk-cost fallacy in our minds behind 
continuing to maintain our caches.




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936170#comment-16936170
 ] 

Ben Manes commented on SOLR-8241:
-

I'm sorry you ran into issues running JMH. This seems to be a bug in their 
plugin and I added a workaround to the issue.

FastLRUCache is an unbounded {{ConcurrentHashMap}} which uses a background 
thread to prune it when it exceeds a threshold. This has the following 
tradeoffs:
 * The read/write throughput will match {{ConcurrentHashMap}}, making it close 
to the ideal performance.
 * The cache may have runaway memory growth under high load when the cleaner 
thread cannot keep up. 
 * The cleanup takes {{O(n lg n)}} time, which could be expensive when the 
system is already under load. 
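The runaway-growth trade-off of background pruning can be shown with a hypothetical, deterministic sketch. This is not Solr's FastLRUCache. The cleaner thread is replaced by an explicit {{runCleanup()}} call, and eviction order is arbitrary rather than LRU. The sketch only demonstrates that {{put()}} never evicts inline, so the map overshoots its threshold for as long as cleanup is pending.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical deferred-pruning sketch; not Solr's implementation. */
class DeferredPruneSketch<K, V> {
  private final Map<K, V> map = new ConcurrentHashMap<>();
  private final AtomicBoolean cleanupRequested = new AtomicBoolean();
  private final int threshold;

  DeferredPruneSketch(int threshold) {
    this.threshold = threshold;
  }

  void put(K key, V value) {
    map.put(key, value);
    if (map.size() > threshold) {
      // A real cache would signal a background thread here and return.
      cleanupRequested.set(true);
    }
  }

  boolean cleanupPending() {
    return cleanupRequested.get();
  }

  int size() {
    return map.size();
  }

  // Stand-in for the cleaner thread: trims back down to the threshold.
  // Removal while iterating is safe for ConcurrentHashMap's key set.
  void runCleanup() {
    if (!cleanupRequested.getAndSet(false)) {
      return;
    }
    for (K key : map.keySet()) {
      if (map.size() <= threshold) {
        break;
      }
      map.remove(key);
    }
  }
}
```

Under heavy write load, the gap between {{put()}} and the cleaner's next run is where the memory overshoot comes from.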

{{Caffeine}} is designed to optimize system performance rather than just 
get/put throughput. If we can exceed the throughput requirements, we can 
sacrifice a little of it to improve other characteristics without impacting 
real-world performance. This includes a best-in-class hit rate, no runaway 
growth, {{O(1)}} costs, and many more features. 

{{FastLRUCache}} may beat {{Caffeine}} by a nanosecond or two per operation on 
a cache hit. However, the miss penalty (I/O, deserialization, higher GC) means 
that it has lower overall system performance. We can run some simulations of 
trace files to show the hit rate differences. Assuming a strict LRU, we can see 
that in a [search 
trace|https://github.com/ben-manes/caffeine/wiki/Efficiency#search] Caffeine's 
hit rate is significantly higher.
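The hit-time versus hit-rate argument can be made concrete with the usual average-cost model: average cost = hit time + miss rate * miss penalty. The numbers below are hypothetical and chosen only to show the shape of the trade-off. They are not measurements from this issue.

```java
/** Back-of-the-envelope average lookup cost; all inputs are hypothetical. */
class AccessCostModel {
  static double avgCostNanos(double hitRate, double hitNanos, double missPenaltyNanos) {
    return hitNanos + (1.0 - hitRate) * missPenaltyNanos;
  }

  public static void main(String[] args) {
    double missPenalty = 1_000_000; // hypothetical 1 ms to load or recompute an entry
    // Cache A: 2 ns faster per hit, lower hit rate (hypothetical values)
    double a = avgCostNanos(0.40, 50, missPenalty);
    // Cache B: slightly slower hits, higher hit rate (hypothetical values)
    double b = avgCostNanos(0.45, 52, missPenalty);
    System.out.printf("A: %.0f ns, B: %.0f ns%n", a, b); // A: 600050 ns, B: 550052 ns
  }
}
```

With any realistic miss penalty, the miss term dominates. A few nanoseconds per hit matter far less than a few points of hit rate.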

Let me know how I can help with the evaluation. I'd gladly write some 
integrations into my tooling with your guidance.




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread Shawn Heisey (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935988#comment-16935988
 ] 

Shawn Heisey commented on SOLR-8241:


I can only comment about LFUCache, as that's the one that I wrote.  LFUCache is 
written using a naive "computer science student" implementation.  Evicting 
entries from the cache can be expected to be slow if the cache is large.  
Because of this, I can only recommend its use when the cache size is small.

I would expect test results to differ between LRUCache/FastLRUCache and the 
Caffeine cache, since the author has said it uses an LFU algorithm.  Caffeine 
would likely behave similarly to LFUCache.

Overall, for most installations, LFU should provide better cache hit ratios 
than LRU.  Test scenarios, which often do not behave like real-world setups, may 
show the opposite.
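Here is a toy comparison to illustrate how LFU can beat LRU. Popular keys are interleaved with one-off requests, similar to a scan. LRU lets the one-off keys push the popular ones out, while a frequency-based policy keeps them. Both policies below are hypothetical minimal sketches, not Solr's LRUCache or LFUCache. The naive LFU's O(n) victim scan also shows why a simple implementation gets slow as the cache grows. On traces dominated by recency the result can flip.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruVsLfuDemo {
  // LRU: LinkedHashMap in access order, evicting the eldest entry.
  static int lruHits(List<String> trace, int capacity) {
    Map<String, Boolean> cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > capacity;
      }
    };
    int hits = 0;
    for (String key : trace) {
      if (cache.get(key) != null) {
        hits++;
      } else {
        cache.put(key, Boolean.TRUE);
      }
    }
    return hits;
  }

  // Naive LFU: per-key counts; evicts the least-frequently-used key via an O(n) scan.
  static int lfuHits(List<String> trace, int capacity) {
    Map<String, Integer> freq = new HashMap<>();
    int hits = 0;
    for (String key : trace) {
      Integer count = freq.get(key);
      if (count != null) {
        hits++;
        freq.put(key, count + 1);
        continue;
      }
      if (freq.size() >= capacity) {
        String victim = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
          if (e.getValue() < min) {
            min = e.getValue();
            victim = e.getKey();
          }
        }
        freq.remove(victim);
      }
      freq.put(key, 1);
    }
    return hits;
  }
}
```

For the trace h1, h2, h1, h2, s1, s2, s3, h1, h2, s4, s5, h1, h2 with capacity 3, this LRU sketch gets 2 hits and the LFU sketch gets 6.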





[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935967#comment-16935967
 ] 

Andrzej Bialecki  commented on SOLR-8241:
-

Updated patch with an extended benchmark in {{TestFastLRUCache}}. I attached 
the full results of the benchmark (rather puzzling), but here are some 
highlights:
 * LRUCache is on average the weakest contender, usually being the slowest and 
having the lowest hit ratio.
 * Caffeine cache is not a clear winner, though. I'd say that on average it's on 
par with FastLRUCache in both speed and hit ratio.
 * Its detailed behavior differs from that of LRUCache and FastLRUCache in 
specific test scenarios, sometimes greatly, but I couldn't find a consistent 
pattern or trend where Caffeine would consistently outperform FastLRUCache. 
Based on these results I can't clearly recommend Caffeine over FastLRUCache for 
a specific scenario (e.g. "as the number of threads grows, you should use ...", 
or "for larger caches you should use ...", or "for fast caching you should use 
...").

I tried running JMH benchmarks in Caffeine but after struggling with build 
issues ([#350|https://github.com/ben-manes/caffeine/issues/350]) I gave up for 
now.




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-19 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933299#comment-16933299
 ] 

Andrzej Bialecki  commented on SOLR-8241:
-

Updated patch for current master:
 * updated Caffeine version to 2.8.0, which contains important fixes and 
improvements, among others an adaptive eviction policy 
([https://github.com/ben-manes/caffeine/issues/106]) 
 * added ramBytesUsed accounting. It's possible that there's a cleaner way to 
do it using a Weigher, but I need to read more about it. Also, I'm not sure 
whether it would make it more difficult to eventually implement a maxRamMB limit 
(not implemented yet in this patch)
 * I'm curious to test the performance of this cache compared to other Solr 
caches, but I think I'm going to use the JMH benchmark in Caffeine for this, 
since I don't want to bring too many dependencies into Solr.
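Here is a minimal sketch of what ramBytesUsed accounting combined with a maxRamMB-style budget could look like. The class, the constructor, and the {{maxRamBytes}} parameter are hypothetical. It is single-threaded and is neither Caffeine's Weigher API nor the code in the patch. A caller-supplied weigher estimates each entry's size, the cache keeps a running total, and least-recently-used entries are evicted while the total is over budget.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongBiFunction;

/** Hypothetical RAM-bounded LRU sketch; single-threaded. */
class RamBoundedLruSketch<K, V> {
  private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true); // access order
  private final ToLongBiFunction<K, V> weigher;
  private final long maxRamBytes;
  private long ramBytesUsed;

  RamBoundedLruSketch(long maxRamBytes, ToLongBiFunction<K, V> weigher) {
    this.maxRamBytes = maxRamBytes;
    this.weigher = weigher;
  }

  void put(K key, V value) {
    V previous = map.put(key, value);
    if (previous != null) {
      ramBytesUsed -= weigher.applyAsLong(key, previous); // replaced value no longer counts
    }
    ramBytesUsed += weigher.applyAsLong(key, value);
    // Iteration starts at the least recently used entry.
    Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
    while (ramBytesUsed > maxRamBytes && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      ramBytesUsed -= weigher.applyAsLong(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }

  V get(K key) {
    return map.get(key);
  }

  long ramBytesUsed() {
    return ramBytesUsed;
  }

  int size() {
    return map.size();
  }
}
```

Because the same weigher drives both the reported total and the eviction decision, the reported ramBytesUsed and the enforced limit cannot drift apart.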
