[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710613#comment-16710613 ]

Ben Manes commented on SOLR-8241:
----------------------------------

Another year, another ping! Do you think that you'll have some time over the holidays or in 2019 to revisit this?

> Evaluate W-TinyLfu cache
> ------------------------
>
>                 Key: SOLR-8241
>                 URL: https://issues.apache.org/jira/browse/SOLR-8241
>             Project: Solr
>          Issue Type: Wish
>          Components: search
>            Reporter: Ben Manes
>            Priority: Minor
>         Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, proposal.patch
>
> SOLR-2906 introduced an LFU cache and the in-progress SOLR-3393 makes it O(1). The discussions seem to indicate that the higher hit rate (vs. LRU) is offset by the slower performance of the implementation. An original goal appeared to be to introduce ARC, a patented algorithm that uses ghost entries to retain history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It uses a frequency sketch to compactly estimate an entry's popularity. It uses LRU to capture recency and operates in O(1) time. When using available academic traces, the policy provides a near-optimal hit rate regardless of the workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a dependency on. But the code is fairly straightforward, and a port into Solr's caches instead is a pragmatic alternative. More interesting is what the impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
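The "frequency sketch" mentioned in the description can be illustrated with a minimal count-min-style sketch. This is a simplified sketch, not Caffeine's actual implementation (which, roughly speaking, packs small counters into longs and periodically ages them); the class and method names here are hypothetical:

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified count-min-style frequency sketch: each key hashes to one
// counter per row, and the estimate is the minimum across rows, which
// bounds over-counting from collisions. Counters only ever inflate, so
// the estimate never undercounts the true frequency.
final class FrequencySketch {
    private final int[][] table; // depth rows x width counters
    private final int[] seeds;   // one hash seed per row

    FrequencySketch(int depth, int width) {
        table = new int[depth][width];
        seeds = new int[depth];
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < depth; i++) {
            seeds[i] = random.nextInt() | 1; // odd multiplier for mixing
        }
    }

    private int index(int row, Object key) {
        int h = key.hashCode() * seeds[row];
        h ^= h >>> 16; // spread the high bits into the low bits
        return Math.floorMod(h, table[row].length);
    }

    // Record one occurrence of the key.
    void increment(Object key) {
        for (int i = 0; i < table.length; i++) {
            table[i][index(i, key)]++;
        }
    }

    // Estimate the key's popularity.
    int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < table.length; i++) {
            min = Math.min(min, table[i][index(i, key)]);
        }
        return min;
    }
}
```

An admission policy can then compare the frequency of a candidate entry against the frequency of the eviction victim and keep whichever is more popular.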
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Manes updated SOLR-8241:
----------------------------
    Priority: Major  (was: Minor)
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Manes updated SOLR-8241:
----------------------------
    Issue Type: Improvement  (was: Wish)
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307000#comment-16307000 ]

Ben Manes commented on SOLR-8241:
----------------------------------

Shawn, is this issue something you'd be interested in finalizing in the new year? If not, what are the next steps to resolve it?
[jira] [Commented] (SOLR-10553) Caffeine causes SIGSEGV in Solr tests
[ https://issues.apache.org/jira/browse/SOLR-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115873#comment-16115873 ]

Ben Manes commented on SOLR-10553:
----------------------------------

In this case Caffeine uses an embedded queue from JCTools, so [~nitsanw] may have some insights. Both libraries are eagerly awaiting Java 9 to replace their Unsafe usages. This is the first report I've seen of invalid offsets, so details on the JVM & hardware would be helpful, since I do not know how to reproduce it.

> Caffeine causes SIGSEGV in Solr tests
> -------------------------------------
>
>                 Key: SOLR-10553
>                 URL: https://issues.apache.org/jira/browse/SOLR-10553
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Uwe Schindler
>            Priority: Critical
>
> While running the Solr tests, the JVM often crashes with SIGSEGV. The reason is Caffeine's usage of Unsafe. Please open an issue on this lib or remove this library. I looked at Caffeine's usage of Unsafe: IT IS TOTALLY USELESS!
> See this log: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/3347/artifact/solr/build/solr-core/test/J2/hs_err_pid17108.log
> {noformat}
> Current thread (0xbf7af000): JavaThread "Thread-4" [_thread_in_Java, id=17265, stack(0xc342e000,0xc347f000)]
> Stack: [0xc342e000,0xc347f000], sp=0xc347dba0, free space=318k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 5565% c1 com.github.benmanes.caffeine.cache.BaseMpscLinkedArrayQueue.poll()Ljava/lang/Object; (108 bytes) @ 0xe7ae7848 [0xe7ae6e20+0x0a28]
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.drainWriteBuffer()V+21
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(Ljava/lang/Runnable;)V+10
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(Ljava/lang/Runnable;)V+11
> j com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run()V+5
> j org.apache.solr.store.blockcache.BlockCache$$Lambda$278.execute(Ljava/lang/Runnable;)V+1
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers()V+54
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleAfterWrite()V+44
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.afterWrite(Lcom/github/benmanes/caffeine/cache/Node;Ljava/lang/Runnable;J)V+47
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.put(Ljava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object;+209
> j com.github.benmanes.caffeine.cache.BoundedLocalCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+5
> j com.github.benmanes.caffeine.cache.LocalManualCache.put(Ljava/lang/Object;Ljava/lang/Object;)V+8
> j org.apache.solr.store.blockcache.BlockCache.store(Lorg/apache/solr/store/blockcache/BlockCacheKey;I[BII)Z+176
> j org.apache.solr.store.blockcache.BlockCacheTest$1.test()V+331
> j org.apache.solr.store.blockcache.BlockCacheTest$1.test(I)V+8
> j org.apache.solr.store.blockcache.BlockCacheTest$1.run()V+47
> v ~StubRoutines::call_stub
> V [libjvm.so+0x65f759] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x319
> V [libjvm.so+0x910889] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*), JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x19
> V [libjvm.so+0x65e093] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x163
> V [libjvm.so+0x6ee089] thread_entry(JavaThread*, Thread*)+0x89
> V [libjvm.so+0xa747d4] JavaThread::thread_main_inner()+0xf4
> V [libjvm.so+0x912e5c] thread_native_entry(Thread*)+0x10c
> C [libpthread.so.0+0x6f72] start_thread+0xd2
> C [libc.so.6+0xee2ae] clone+0x5e
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x8d0ffbb0
> Register to memory mapping:
> EAX=0x0040 is an unknown value
> EBX=0x is pointing into the stack for thread: 0xbf7ddc00
> ECX=0x is pointing into the stack for thread: 0xbf7ddc00
> EDX=0xc9a00918 is pointing into object: 0xc9a004f8
> [error occurred during error reporting (printing register info), id 0xb]
> Registers:
> EAX=0x0040, EBX=0x, ECX=0x, EDX=0xc9a00918
> ESP=0xc347dba0, EBP=0xc347dcf0, ESI=0xc9c81ee8, EDI=0xc347dcc8
> EIP=0xe7ae7848, EFLAGS=0x00010246, CR2=0x8d0ffbb0
> Top of Stack: (sp=0xc347dba0)
> 0xc347dba0: 0038 bebd16c0 c347dbf8 f702a6f9
> 0xc347dbb0: e6c031c0 0028 0004 c347dc24
> Instructions: (pc=0xe7ae7848)
> 0xe7ae7828: f0 58 0f be b6 e4 01 00 00 83 fe 00 8b bc 24 f8
> 0xe7ae7838: 00 00 00 8b b4 24 98 00 00 00 0f 85 cc 06 00 00
> 0xe7ae7848: 89 0c 3e 8d 34 3e 8b fe 33 f9 c1 ef 14 83 ff 00
> 0xe7ae7858: 0f 85 cf 06 00 00 f0 83 44 24
> {noformat}
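The Java 9 replacement for Unsafe alluded to in the comment above is the VarHandle API. A minimal, hypothetical sketch of what such a migration looks like (the class and field names here are illustrative, not Caffeine's or JCTools' actual code):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Instead of Unsafe.putOrderedLong with a raw field offset, a VarHandle
// names the field symbolically and offers the same relaxed access modes
// (setRelease is the "ordered store" that lazySet/putOrdered provided).
final class ReleaseCounter {
    private volatile long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                .findVarHandle(ReleaseCounter.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Ordered store without a full volatile fence.
    void lazyIncrement() {
        VALUE.setRelease(this, value + 1);
    }

    long get() {
        return (long) VALUE.getAcquire(this);
    }
}
```

Unlike Unsafe offsets, a mistyped field name fails at class initialization rather than corrupting memory at runtime, which is exactly the class of SIGSEGV this issue reports.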
[jira] [Commented] (SOLR-10205) Evaluate and reduce BlockCache store failures
[ https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902487#comment-15902487 ]

Ben Manes commented on SOLR-10205:
----------------------------------

For writes you might prefer to use an atomic computation instead of a racy get-compute-put. The stampeding writers will cause a storm of removal notifications indicating the value was replaced. I think that would result in more frequently needing to free and acquire slots in the bank. This would reduce I/O costs as well, of course. Caffeine performs this by using a lock-free lookup that falls back to a computeIfAbsent, so that a hit won't thrash on locks if the entry is present.

> Evaluate and reduce BlockCache store failures
> ---------------------------------------------
>
>                 Key: SOLR-10205
>                 URL: https://issues.apache.org/jira/browse/SOLR-10205
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>             Fix For: 6.5, master (7.0)
>
>         Attachments: cache_performance_test.txt, SOLR-10205.patch, SOLR-10205.patch, SOLR-10205.patch
>
> The BlockCache is written such that requests to cache a block (BlockCache.store call) can fail, making caching less effective. We should evaluate the impact of this storage failure and potentially reduce the number of storage failures.
> The implementation reserves a single block of memory. In store, a block of memory is allocated, and then a pointer is inserted into the underlying map. A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even under low load), one can fail. This is made worse by the fact that concurrent maps typically tend to amortize the cost of eviction over many keys (i.e. the actual size of the map can grow beyond the configured maximum number of entries... both the older ConcurrentLinkedHashMap and the newer Caffeine do this). When this is the case, store() won't be able to find a free block of memory, even if there aren't any other concurrently operating stores.
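The racy get-compute-put versus atomic computation distinction from the comment above can be sketched with the JDK's ConcurrentHashMap (Caffeine's Cache.get(key, mappingFunction) behaves analogously). The class, field, and loader names here are hypothetical stand-ins:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicLoadExample {
    static final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    // Racy: two threads can both miss, both load, and one put() replaces the
    // other's value -- firing a replacement notification and churning slots.
    static byte[] racyGet(String key) {
        byte[] value = cache.get(key);
        if (value == null) {
            value = load(key);
            cache.put(key, value); // may stomp a concurrent writer's value
        }
        return value;
    }

    // Atomic: the mapping function runs at most once per key; concurrent
    // callers block briefly and then all see the single inserted value.
    // The plain get() first keeps hits lock-free.
    static byte[] atomicGet(String key) {
        byte[] value = cache.get(key); // lock-free fast path on a hit
        return (value != null)
            ? value
            : cache.computeIfAbsent(key, AtomicLoadExample::load);
    }

    static byte[] load(String key) {
        loads.incrementAndGet();
        return new byte[128]; // stand-in for reading a block from disk
    }
}
```

With the atomic variant, stampeding writers collapse into one load and one insertion, so no replacement notifications are generated and no extra slots are consumed.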
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873511#comment-15873511 ]

Ben Manes commented on SOLR-10141:
----------------------------------

Released 2.4.0.

> Caffeine cache causes BlockCache corruption
> -------------------------------------------
>
>                 Key: SOLR-10141
>                 URL: https://issues.apache.org/jira/browse/SOLR-10141
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>         Attachments: SOLR-10141.patch, Solr10141Test.java
>
> After fixing the race conditions in the BlockCache itself (SOLR-10121), the concurrency test passes with the previous implementation using ConcurrentLinkedHashMap and fails with Caffeine.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873361#comment-15873361 ]

Ben Manes commented on SOLR-10141:
----------------------------------

That makes sense. If it's a fallback for when an empty slot can't be acquired, it may be preferable to always calling cleanUp(). But a stress test would be necessary to verify that, as the spin time might be too small for it to help.

In most traces frequency dominates over recency, so most insertions are pollutants. The impact of a failed insertion might not have been negative, as a popular item would make its way in. Then the failing one-hit wonders wouldn't have disrupted the LRU as much. That's less meaningful with Caffeine, since we switched to TinyLFU.

As an aside, I'd appreciate help in moving SOLR-8241 forward. It's been approved but backlogged, as the committer has not had the time to actively participate in Solr. But if that's crossing territories or you feel uncomfortable due to this bug, I understand.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873334#comment-15873334 ]

Ben Manes commented on SOLR-10141:
----------------------------------

If you wish to ensure a very strict bounding by throttling writers, that would do the job. I'm not sure it's needed except in your tests, as in practice the assumption is that cleanup happens in a timely enough manner. The cache uses a bounded write buffer to provide some slack, minimize the response latencies for writers, and defer the cleanup to the executor (scheduled as immediate). This allows the cache to temporarily exceed the high-water mark but catch up quickly. In general a high write rate on a cache is actually 2-3 inserts/sec, there's memory headroom for GC, and the server isn't CPU bound. If instead we ensured a strict bound, then we'd need a global lock to throttle writers, which limits concurrency. So it's a trade-off that works for most usages. CLHM uses the same design, so I wonder if only your tests are affected and it is okay in practice.

CLHM uses an unbounded write buffer, whereas in Caffeine it's bounded to provide some back pressure when full. Being full is very rare, so this is mostly to replace linked lists with a growable ring buffer. The slack is probably excessive, as I didn't have a good sizing parameter (max ~= 128 x ncpu). The cleanUp() call forces the caller to block and do the maintenance itself, rather than relying on the async processing (which may be in flight or triggered by a subsequent operation). You can get a sense of this write-ahead log design from this [slide deck|https://docs.google.com/presentation/d/1NlDxyXsUG1qlVHMl4vsUUBQfAJ2c2NsFPNPr2qymIBs].

I'm not sure what, or if, I can do anything regarding your size concern. But I'll wait to release 2.4 until you're satisfied that we've resolved all the issues.
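The write-buffer design described in the comment above, with writers helping out when the buffer is full, can be sketched roughly as follows. This is a hypothetical simplification (Caffeine's actual buffer is a growable MPSC ring buffer, not an ArrayBlockingQueue, and its drain is state-machine driven):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;

// Deferred-maintenance sketch: writers append a cheap policy-update task to
// a bounded buffer and return immediately; a drainer replays the log against
// the eviction policy. A full buffer is the back-pressure case: the writer
// pays the cost of draining synchronously instead of blocking indefinitely.
final class WriteBufferSketch {
    private final ArrayBlockingQueue<Runnable> writeBuffer;
    private final Executor executor;
    private final Object drainLock = new Object();

    WriteBufferSketch(int capacity, Executor executor) {
        this.writeBuffer = new ArrayBlockingQueue<>(capacity);
        this.executor = executor;
    }

    void afterWrite(Runnable policyUpdate) {
        if (writeBuffer.offer(policyUpdate)) {
            executor.execute(this::drain); // async cleanup, scheduled as immediate
        } else {
            drain();                       // buffer full: apply back pressure
            policyUpdate.run();
        }
    }

    void drain() {
        synchronized (drainLock) {         // one drainer at a time
            Runnable task;
            while ((task = writeBuffer.poll()) != null) {
                task.run();
            }
        }
    }
}
```

The trade-off discussed in the comment is visible here: between an offer and its drain, the cache may briefly exceed its high-water mark, but no global lock serializes writers on the hot path.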
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873011#comment-15873011 ]

Ben Manes commented on SOLR-10141:
----------------------------------

[Pull Request|https://github.com/ben-manes/caffeine/pull/144] with the fix and your test case.
[jira] [Comment Edited] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872983#comment-15872983 ]

Ben Manes edited comment on SOLR-10141 at 2/18/17 5:08 AM:
-----------------------------------------------------------

Thanks!!! I think I found the bug. It now passes your test case.

The problem was due to put() stampeding over the value during the eviction. The [eviction routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725] performed the following:
# Read the key, value, etc.
# Conditionally removed the entry in a computeIfPresent() block
#* resurrected it if a race occurred (e.g. it was thought expired, but was newly accessed)
# Marked the entry as "dead" (using a synchronized (entry) block)
# Notified the listener

This failed because [putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521] can perform its update outside of a hash table lock (e.g. a computation). It synchronizes on the entry to update it, checking first whether it is still alive. This resulted in a race where the entry was removed from the hash table, the value was updated, and the entry was marked as dead. When the listener was notified, it received the wrong value.

The solution I have now is to expand the synchronized block on eviction. This passes your test and should be cheap. I'd like to review it a little more and incorporate your test into my suite.

This is an excellent find. I've stared at the code many times, and the race seems obvious in hindsight.
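The fix described above -- widening the synchronized block so that removal, dead-marking, and capturing the value happen as one atomic step -- can be sketched like this. This is a heavily simplified, hypothetical model (the real logic lives in BoundedLocalCache and involves per-entry state machines):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Entries carry an alive/dead flag. An updater must hold the entry lock and
// check liveness before writing; the evictor must remove the mapping, mark
// the entry dead, and read the value inside that same lock. Otherwise a
// racing put() can stomp the value between removal and listener notification,
// which was the bug's symptom.
final class EvictionRaceSketch<K, V> {
    static final class Entry<V> {
        V value;
        boolean dead;
        Entry(V value) { this.value = value; }
    }

    private final ConcurrentHashMap<K, Entry<V>> data = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> removalListener;

    EvictionRaceSketch(BiConsumer<K, V> removalListener) {
        this.removalListener = removalListener;
    }

    void put(K key, V value) {
        Entry<V> entry = data.get(key);
        if (entry != null) {
            synchronized (entry) {
                if (!entry.dead) {      // only update a live entry
                    entry.value = value;
                    return;
                }
            }
        }
        data.put(key, new Entry<>(value)); // dead or absent: insert fresh
    }

    void evict(K key) {
        Entry<V> entry = data.get(key);
        if (entry == null) return;
        V removed;
        synchronized (entry) {          // remove, mark dead, and capture the
            data.remove(key, entry);    // value in one atomic step so a racing
            entry.dead = true;          // put() cannot change it before the
            removed = entry.value;      // listener observes it
        }
        removalListener.accept(key, removed);
    }
}
```

With the narrower lock of the original code, a put() could slip in after removal but before dead-marking, so the listener saw the new value while believing it had evicted the old one.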
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872969#comment-15872969 ]

Ben Manes commented on SOLR-10141:
----------------------------------

Thanks! I'm resolving some issues with the latest error-prone (static analyzer) and will then dig into it.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872943#comment-15872943 ]

Ben Manes commented on SOLR-10141:
----------------------------------

Can you provide me with the latest version of a self-contained test? If I can reproduce and debug it, I'll have a fix over the weekend.

v2 introduced a new eviction policy that takes frequency into account. Eviction should be rapid, so these remaining issues are surprising. I've tried to be diligent about testing, so I will investigate.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872244#comment-15872244 ]

Ben Manes commented on SOLR-8241:
----------------------------------

[~Timothy055], Solr master is now on 2.3.5 (to upgrade its usage in the block cache).
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872221#comment-15872221 ]

Ben Manes commented on SOLR-10141:
----------------------------------

Thanks [~ysee...@gmail.com]. Sorry about any frustration this caused.
[jira] [Updated] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-10141:
Attachment: Solr10141Test.java
I updated the test to use Awaitility to avoid race conditions when asserting the counts. This allowed me to enable the FJP executor so that the listener and eviction occur asynchronously. The test passes against master; I have not tested against 1.0.1, which Solr still uses (please upgrade!).
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868205#comment-15868205 ] Ben Manes commented on SOLR-10141:
I ran your test against master and it doesn't fail. Can you please try Caffeine 2.3.5? The only change needed is that the RemovalListener is now lambda-friendly.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868168#comment-15868168 ] Ben Manes commented on SOLR-10141:
Oh, also: older JDK 8 versions had a bug in the ForkJoinPool that caused it to drop tasks. That's also a possibility at play.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868161#comment-15868161 ] Ben Manes commented on SOLR-10141:
I plan on porting the test to Caffeine's suite and checking against 2.x. Just waiting for my train to start.
[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption
[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868110#comment-15868110 ] Ben Manes commented on SOLR-10141:
It may be the ForkJoinPool retrying a task if it is slow to complete. If so, we might need to add a guard to ignore multiple attempts. I can help when you have a test case to investigate with.
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864352#comment-15864352 ] Ben Manes commented on SOLR-10121:
Can you try a local hack of changing Caffeine versions and, if it fails, try reverting back to CLHM? Both should be easy changes that could help us isolate this. Also note that CLHM ran the eviction listener on the same thread, whereas Caffeine delegates it to the executor. If there is a race due to that, you could use `executor(Runnable::run)` in the builder.

> BlockCache corruption with high concurrency
>
> Key: SOLR-10121
> URL: https://issues.apache.org/jira/browse/SOLR-10121
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
>
> Improving the tests of the BlockCache in SOLR-10116 uncovered a corruption bug (either that or the test is flawed... TBD).
> The failing test is TestBlockCache.testBlockCacheConcurrent()
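The `executor(Runnable::run)` suggestion works because `Runnable::run`, viewed as an `Executor`, runs each submitted task on the calling thread, so a listener dispatched through it completes before `execute()` returns. A minimal pure-JDK sketch of that behavior (no Caffeine dependency; the class and method names here are illustrative):

```java
import java.util.concurrent.Executor;

public class SameThreadExecutorDemo {
    // Submits a task to the executor and reports which thread executed it.
    // With a same-thread executor the result is already set on return.
    static String runOn(Executor executor) {
        final String[] seen = new String[1];
        executor.execute(() -> seen[0] = Thread.currentThread().getName());
        return seen[0]; // non-null only if the task already ran synchronously
    }

    public static void main(String[] args) {
        // Runnable::run satisfies Executor's single abstract method, so each
        // task is invoked directly on the caller's thread.
        Executor sameThread = Runnable::run;
        System.out.println("executed on: " + runOn(sameThread));
    }
}
```

This removes the asynchrony that distinguishes Caffeine's default (ForkJoinPool) dispatch from CLHM's caller-thread listener, which is why it is useful for isolating the race.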
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863924#comment-15863924 ] Ben Manes commented on SOLR-10121:
Yes, a write should constitute a publication. Caffeine decorates a ConcurrentHashMap but does bypass it at times. By default, eviction is asynchronous, delegating to the ForkJoinPool commonPool, but it can be configured to use the caller thread instead; that might be useful for testing. Solr uses an old version of Caffeine. A patch was reviewed and approved in SOLR-8241, but it needs someone to merge it. I'm not aware of a visibility bug in any release, but staying current would be helpful, as I have fixed bugs since that version.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839058#comment-15839058 ] Ben Manes commented on SOLR-8241:
[~elyograg]: Solr 6.4.0 was just released. Do you think we can make a commitment to resolve this for 6.5.0? We've iterated on the patch for about a year now.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802076#comment-15802076 ] Ben Manes commented on SOLR-8241:
I think the tests all passed last I checked with this new SolrCache, but I don't think we had made it the default yet, so that might be a premature statement. If you want to upgrade only the 1.x usage, that would be a safe change to extract from this patch (a minor API tweak). If anything, the later versions also have fewer bugs. I'd love to see this patch land.
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-8241:
Attachment: SOLR-8241.patch
Rebased and updated to v2.3.4. Any remaining tasks?
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-8241:
Attachment: proposal.patch
I have some basic tests ported (testSimple, testTimeDecay). The first performs access operations and the second ensures frequency is taken into account. The changes also add cumulative stats by aggregating during warm() (this was simpler than the init approach, since Caffeine's stats object is immutable). A minor change is renaming the class to TinyLfuCache, to emphasize the policy over the library; that conforms with the HBase and Accumulo integrations and matches the existing naming convention.
This version of the patch requires changes in Caffeine 2.3.4-SNAPSHOT. I improved the hot iteration order, which previously returned entries in warm, hot, cold order. Given real-world cache sizes it might not have made a difference, but it was a required improvement for the tests. So I'm adding this version as a proposal and can cut a release when you're ready for integration.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558709#comment-15558709 ] Ben Manes commented on SOLR-8241:
I think there is a small bug in the "hottest" ordering provided by Caffeine, so the warmed-up cache doesn't contain the desired entries. I believe this is a simple mistake of concatenating two lists in the wrong order, so that it chooses a luke-warm entry instead. I'm not sure how to verify my changes with a custom jar in Ivy, though.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558521#comment-15558521 ] Ben Manes commented on SOLR-8241:
I can take a stab at tests, but it's unclear what to include other than basic operations. Otherwise I'd defer to the library for deeper testing, e.g. scan resistance and efficiency; in those areas, writing tests is for the author to have assurance that the library does what it claims. I'd prefer if someone obtained production traces instead, which I think would show an interesting hit-rate curve for how the policies stack up.
I'm pretty sure the current warming, which populates with the hottest entries first, should be good enough. Since reads dominate writes, the hot entries will quickly have a high frequency by the time an eviction is triggered. We could also give the first few hot entries a small bump, by adding a few accesses, for an extra nudge.
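The warming strategy discussed above can be sketched as "transfer the top-N entries by observed frequency, hottest first". This is a plain-JDK illustration with hypothetical names, not Solr's actual warm() code:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WarmHottest {
    // Returns the top-n keys ordered by observed frequency, hottest first.
    // Mimics warming a fresh cache from the old cache's "hottest" snapshot.
    static <K> List<K> hottest(Map<K, Integer> frequency, int n) {
        return frequency.entrySet().stream()
            .sorted(Map.Entry.<K, Integer>comparingByValue().reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical query keys with their access counts.
        Map<String, Integer> freq = Map.of("a", 9, "b", 1, "c", 5, "d", 7);
        System.out.println(hottest(freq, 2)); // the two hottest keys
    }
}
```

Inserting the hottest entries first matters because, as noted above, they then accumulate frequency in the new sketch before the first eviction is triggered.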
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521377#comment-15521377 ] Ben Manes commented on SOLR-8241:
I took a look to refresh myself on LFUCache and decay. I don't think there is an issue, because TinyLFU has similar logic to age the frequencies asynchronously. It observes a sample of 10 * maximum size and then halves the counters. The difference is that the counters are stored in an array, are 4-bit, and represent all items (not just those currently residing in the cache). This extended history, and using frequency for admission (rather than eviction), is what allows the policy to have a superior hit rate and be amortized O(1).
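The aging scheme described above (saturating 4-bit counters, halved after a sample of 10x the maximum size) can be shown with a toy sketch. Caffeine's real sketch is a hashed, Count-Min-style array; this simplified version uses direct slots and hypothetical names:

```java
public class TinyLfuSketch {
    // Simplified, non-hashed frequency sketch: one 4-bit counter per slot,
    // aged by halving all counters once the sample period elapses.
    private final int[] counters;
    private final int sampleSize;
    private int observed;

    TinyLfuSketch(int maximumSize) {
        this.counters = new int[maximumSize * 4]; // spare slots retain history
        this.sampleSize = 10 * maximumSize;       // sample is 10x the capacity
    }

    void increment(int slot) {
        if (counters[slot] < 15) {       // 4-bit counters saturate at 15
            counters[slot]++;
        }
        if (++observed >= sampleSize) {  // aging: halve every counter
            for (int i = 0; i < counters.length; i++) {
                counters[i] >>>= 1;
            }
            observed /= 2;
        }
    }

    int frequency(int slot) {
        return counters[slot];
    }
}
```

Halving preserves the relative ordering of frequencies while letting stale popularity decay, which is what makes the history usable for admission decisions.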
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517521#comment-15517521 ] Ben Manes commented on SOLR-8241:
The cache does provide basic snapshot features ordered by the policy (hot/cold, young/old). You might be able to change perspectives by having the old searchers use a snapshot and rewarming the cache instance. I do think it will be okay to recreate and warm, just not optimal. It looks like my patch does try to transfer over the hottest entries, so it's probably alright.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517433#comment-15517433 ] Ben Manes commented on SOLR-8241:
Can you explain why a new instance is required and the entire cache swapped? There is an open issue for supporting bulk refresh, but it's been low on my list of priorities. Not sure if that would have worked for this rewarming process.
[jira] [Issue Comment Deleted] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-8241:
Comment: was deleted (the deleted text was a duplicate posting of the "Expiration is tricky" comment that follows).
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517094#comment-15517094 ] Ben Manes commented on SOLR-8241:
Expiration is tricky because it means the data is no longer valid and should not be consumed. The middle ground here is refreshAfterWrite, which serves stale entries and tries to asynchronously reload the value. That covers the common case by not penalizing active entries with eviction, while letting inactive ones expire.
That probably isn't enough, and it's impossible to cover all use-cases. So instead it's more of a data structure intended (hopefully) to be malleable enough for custom workarounds. The CacheWriter can be used to create a victim cache, which a CacheLoader could retrieve from. So you could let expired entries populate the victim cache and be promoted back into the main cache, sometimes within the same atomic operation. Then a rewarming could clear the victim cache when it's done, as its contents are unnecessary. Something like this might be workable.
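The victim-cache pattern sketched above can be illustrated with two plain maps: evicted or expired entries are parked in a victim map (the role a CacheWriter would play), and loads consult it before recomputing (the CacheLoader's role). This is a toy, single-threaded sketch with illustrative names, not Caffeine's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class VictimCacheDemo {
    final Map<String, String> primary = new HashMap<>();
    final Map<String, String> victim = new HashMap<>();

    // On eviction/expiry, park the entry instead of dropping it
    // (what a CacheWriter hook would do in the real integration).
    void evict(String key) {
        String value = primary.remove(key);
        if (value != null) {
            victim.put(key, value);
        }
    }

    // On a miss, promote from the victim cache if possible; otherwise
    // fall back to the loader (the CacheLoader's role).
    String get(String key, Function<String, String> loader) {
        return primary.computeIfAbsent(key, k -> {
            String parked = victim.remove(k);
            return (parked != null) ? parked : loader.apply(k);
        });
    }
}
```

A rewarming pass would then simply call `victim.clear()` once the new searcher is warmed, matching the "clear the victim when done" step described above.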
[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let it's keysToRelease or names maps grow indefinitely.
[ https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447579#comment-15447579 ] Ben Manes commented on SOLR-9284: - [~michael.sun]: If you upgrade to Caffeine 2.x then it will take [advantage|https://github.com/ben-manes/caffeine/wiki/Efficiency] of frequency in addition to recency. A patch is available in [SOLR-8241|https://issues.apache.org/jira/browse/SOLR-8241], but it's been stalled because Shawn hasn't had the bandwidth to drive the changes forward. > The HDFS BlockDirectoryCache should not let it's keysToRelease or names maps > grow indefinitely. > --- > > Key: SOLR-9284 > URL: https://issues.apache.org/jira/browse/SOLR-9284 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: hdfs >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 6.2, master (7.0) > > Attachments: SOLR-9284.patch, SOLR-9284.patch
[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let it's keysToRelease or names maps grow indefinitely.
[ https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437826#comment-15437826 ] Ben Manes commented on SOLR-9284: - Hopefully I didn't break this behavior when upgrading from ConcurrentLinkedHashMap (Caffeine's predecessor). That code used an eviction listener, so I think it was a direct translation. Can you take a look and see whether the prior version was more correct? Note that the cache, in its current form, only evicts after the maximum-size threshold is crossed. Guava, however, may evict earlier because it is split into multiple segments that are operated on exclusively during a write. I kept that wording in the JavaDoc to provide flexibility, just in case.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381077#comment-15381077 ] Ben Manes commented on SOLR-8241: - Can we try to move this forward again? Thanks!
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238184#comment-15238184 ] Ben Manes commented on SOLR-8241: - Thanks for the information. I definitely meant that it would be a new issue if we were happy with the results here. It makes sense that Lucene wouldn't want dependencies, and that a different expert would be needed to review. As those caches are synchronous I can easily port the code over (it's the concurrency that's hard). We'll revisit that if we have a positive experience here, as I think this is the more critical cache for Solr. Thanks a lot, Shawn, for pushing this forward and for all your help thus far.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237765#comment-15237765 ] Ben Manes commented on SOLR-8241: - There are some other caches that might be worth migrating as well (e.g. [LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java], [LRUHashMap|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/LRUHashMap.java], [NameIntCacheLRU|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/NameIntCacheLRU.java]). It might be good to follow up after this patch lands and see which other caches benefit from being migrated.
[jira] [Commented] (SOLR-8906) Make transient core cache pluggable.
[ https://issues.apache.org/jira/browse/SOLR-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222663#comment-15222663 ] Ben Manes commented on SOLR-8906: - TinyLFU is scan resistant (see the [Glimpse trace|https://github.com/ben-manes/caffeine/wiki/Efficiency#glimpse]). For implementation details, a nice overview is provided in the [HighScalability article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]. > Make transient core cache pluggable. > > > Key: SOLR-8906 > URL: https://issues.apache.org/jira/browse/SOLR-8906 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson > > The current Lazy Core stuff is pretty deeply intertwined in CoreContainer. > Adding and removing active cores is based on a simple LRU mechanism, but > keeping the right cores in the right internal structures involves a lot of > attention to locking various objects to update internal structures. This > makes it difficult/dangerous to use any other caching algorithms. > Any single age-out algorithm will have non-optimal access patterns, so making > this pluggable would allow better algorithms to be substituted in those cases. > If we ever extend transient cores to SolrCloud, we need to have load/unload > decisions that are cloud-aware rather than entirely local, so in that sense > this would lay some groundwork if we ever want to go there. > So I'm going to try to hack together a PoC. Any ideas on the most sensible > pattern for this gratefully received.
[jira] [Commented] (SOLR-8906) Make transient core cache pluggable.
[ https://issues.apache.org/jira/browse/SOLR-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222638#comment-15222638 ] Ben Manes commented on SOLR-8906: - Not sure if it helps, but there's discussion of using TinyLFU instead of LRU / LFU for the SolrCache ([SOLR-8241|https://issues.apache.org/jira/browse/SOLR-8241]). That library could be used here too, instead of LRU, to evict based on both recency and frequency. From my reading of [transientCores|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCores.java#L76], that appears to be a simple migration.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178872#comment-15178872 ] Ben Manes commented on SOLR-8241: - Percentile stats are best obtained via the metrics library. The stats Caffeine provides are monotonically increasing over the lifetime of the cache, which lets a metrics reporter easily calculate percentiles over a time window. The only native time statistic is the load time (the cost of computing an entry on a miss), because it adds to the user-facing latency. All cache operations are O(1) and designed for concurrency, so broadly tracking time would be prohibitively expensive given how slow the native time methods are. From benchmarks, I think the cache offers enough headroom not to be a bottleneck, so tracking the hit rate and minimizing the miss penalty are probably the more interesting areas to monitor. I'm not sure what my next steps are to assist here, so let me know if I can be of further help.
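Because the counters are monotonic, a metrics reporter can derive a windowed hit rate by differencing two snapshots. A minimal sketch, with a hypothetical `Snapshot` class standing in for the cache's lifetime stats object:

```java
// Derive a windowed hit rate from two monotonic snapshots, as a metrics
// reporter would when periodically polling the cache's lifetime counters.
final class Snapshot {
    final long hitCount;
    final long missCount;

    Snapshot(long hitCount, long missCount) {
        this.hitCount = hitCount;
        this.missCount = missCount;
    }

    /** Hit rate over the interval between an earlier snapshot and this one. */
    double intervalHitRate(Snapshot earlier) {
        long hits = hitCount - earlier.hitCount;
        long misses = missCount - earlier.missCount;
        long requests = hits + misses;
        return (requests == 0) ? 1.0 : (double) hits / requests;
    }
}
```

The same differencing works for any of the monotonic counters, which is why the cache itself never needs to track windows.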
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178389#comment-15178389 ] Ben Manes commented on SOLR-8241: - Using the metrics library should be really easy. There are two simple implementation approaches: 1. Use the same approach as [Guava metrics|http://antrix.net/posts/2014/codahale-metrics-guava-cache], which polls the cache's stats. Caffeine is the next generation, so it has a nearly identical API. 2. Use a custom [StatsCounter|http://static.javadoc.io/com.github.ben-manes.caffeine/caffeine/2.2.2/com/github/benmanes/caffeine/cache/stats/StatsCounter.html] with {{Caffeine.recordStats(statsCounter)}} that records directly into the metrics. This rejected feature [request|https://github.com/google/guava/issues/2209#issuecomment-153290342] shows an example of that, though I'd return a {{disabledStatsCounter()}} instead of throwing an exception if polled. The only annoyance is that neither Guava nor Caffeine bothered to include a {{put}} statistic. That was partially an oversight and partially because we really wanted everyone to load through the cache (put is often an anti-pattern due to races). I forgot to add it in v2, and because it is an API change, semver would require that it wait for v3, or maybe we can use a [default method|https://blog.idrsolutions.com/2015/01/java-8-default-methods-explained-5-minutes/] hack to sneak it into v2.
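Approach 2 can be sketched with a cut-down stand-in for the StatsCounter interface. The real interface has more methods (e.g. for evictions and load timings); `SimpleStatsCounter`, and the use of `LongAdder` in place of a metrics-library meter, are assumptions for illustration:

```java
import java.util.concurrent.atomic.LongAdder;

// Cut-down stand-in for a StatsCounter: the cache records hits and misses
// directly into metrics-style counters, instead of being polled for a snapshot.
interface SimpleStatsCounter {
    void recordHits(int count);
    void recordMisses(int count);
}

final class MetricsStatsCounter implements SimpleStatsCounter {
    final LongAdder hits = new LongAdder();    // would be a metrics Meter
    final LongAdder misses = new LongAdder();  // would be a metrics Meter

    @Override public void recordHits(int count) { hits.add(count); }
    @Override public void recordMisses(int count) { misses.add(count); }
}
```

The counter is handed to the cache builder once, so every lookup updates the metrics registry directly with no polling thread.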
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178106#comment-15178106 ] Ben Manes commented on SOLR-8241: - I see that [YCSB|https://github.com/brianfrankcooper/YCSB] includes Solr as a backend. It is a popular benchmark, though it is oriented toward comparing key-value queries. Still, it might be an easy way to see the performance and cache-efficiency impact of this proposal.
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-8241: Attachment: SOLR-8241.patch
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136695#comment-15136695 ] Ben Manes commented on SOLR-8241: - Attached a patch that adds a new SolrCache implementation based on Caffeine (version 2.1.0). It was based on LruCache, trimmed extensively to match the requirements on the SolrCaching wiki page. It passes the "ant precommit" check, but due to a lack of familiarity with Solr I didn't run the server to test it. Given the simplicity of the change, this should be a good prototype to start from. Hopefully there isn't much work required to complete this task and see whether the cache is beneficial. Based on my limited understanding of Solr's existing caches, I expect the new one to be both faster and to have a higher hit rate. Shawn, can you please take a look? Thanks!
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137321#comment-15137321 ] Ben Manes commented on SOLR-8241: - I only used the LruCache as a template and removed much of it, though looking at LfuCache it might have been easier to work with, since mine was trimmed to look very similar. LFU is substantially better than LRU for search-engine workloads like Solr's. I don't have any Solr-specific metrics to offer, but the search-engine traces I do have are very promising: LFU is superior to LRU, and TinyLFU is a substantial further improvement. If the impact were not so significant, I would not be advocating this change.
WebSearch1 @ 4M (U. of Mass.)
* Lru: 21.6%
* Lfu: 27.1%
* W-TinyLfu: 41.5%
* Optimal: 57.8%
S3 @ 400K (ARC paper)
* Lru: 12.0%
* Lfu: 29.4%
* W-TinyLfu: 43.0%
* Optimal: 60.0%
Yes, this is Java 8 only. The interface of RemovalListener was changed from v1.x to v2.x to be friendlier to lambda syntax for the builder's type inference. Please read this [short article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html], which describes the policy and concurrency mechanism. That should provide enough context to judge this change without taking a deep dive into the library's implementation. The patch to Solr's code is quite small.
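The "frequency sketch" mentioned in the description is, at heart, a count-min sketch: a few rows of counters indexed by independent hashes, where the minimum across rows bounds the error from collisions. The toy Java version below illustrates only the principle; Caffeine's real implementation additionally packs 4-bit saturating counters and periodically halves them to age out stale history:

```java
import java.util.Random;

// Toy count-min sketch: estimates an item's popularity in O(1) time and
// constant space. Hash collisions can over-count, never under-count, and
// taking the minimum across rows keeps that error small.
final class FrequencySketch {
    private final int[][] table;
    private final int[] seeds;

    FrequencySketch(int depth, int width) {
        table = new int[depth][width];
        seeds = new int[depth];
        Random random = new Random(42);  // fixed seed keeps the sketch deterministic
        for (int i = 0; i < depth; i++) {
            seeds[i] = random.nextInt() | 1;  // odd multiplier per row
        }
    }

    private int index(int row, Object item) {
        int hash = item.hashCode() * seeds[row];
        return Math.abs(hash % table[row].length);
    }

    void increment(Object item) {
        for (int row = 0; row < table.length; row++) {
            table[row][index(row, item)]++;
        }
    }

    /** Estimated frequency: the minimum across rows bounds collision error. */
    int frequency(Object item) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][index(row, item)]);
        }
        return min;
    }
}
```

This is why the history footprint stays tiny: popularity is approximated in a fixed-size array rather than tracked per ghost entry as in ARC.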
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069335#comment-15069335 ] Ben Manes commented on SOLR-8241: - [Benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks] of Caffeine show that the cache is ~33% as fast as an unbounded ConcurrentHashMap. As an earlier version is already a dependency, the easiest proof-of-concept would be an adapter into a Solr [Cache|https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/Cache.java]. If the results are attractive, the next decision can be whether to use Caffeine or to incorporate its ideas into a Solr cache instead. LRU and LFU only retain information about the current working set. That turns out to be a limitation: by capturing more history, a significantly better prediction (and hit rate) can be achieved. How that history is stored and used is where many newer policies differ (ARC, LIRS, 2Q, etc.). Regardless, they outperform LRU / LFU, sometimes by very wide margins, which makes choosing one very attractive. In the case of TinyLFU, it is very easy to adapt onto an existing policy, as it works by filtering at admission rather than reordering eviction. The [paper|http://arxiv.org/pdf/1512.00727.pdf] is a bit long, but a good read. The simulation code is very simple, though Caffeine's version isn't, due to tackling the concurrency aspect as well.
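The admission filtering described above reduces to a single comparison at eviction time: the arriving candidate replaces the eviction policy's chosen victim only if the frequency sketch estimates the candidate to be more popular. A minimal sketch, with a hypothetical `TinyLfuAdmittor` wrapping a frequency-estimate function:

```java
import java.util.Map;
import java.util.function.ToIntFunction;

// TinyLFU-style admission: on a capacity miss, compare the candidate's
// estimated frequency against the entry the eviction policy would discard.
// The candidate is admitted only if judged more valuable; otherwise the
// newcomer itself is rejected and the cache's contents are left untouched.
final class TinyLfuAdmittor<K> {
    private final ToIntFunction<K> estimate;  // frequency-sketch lookup

    TinyLfuAdmittor(ToIntFunction<K> estimate) {
        this.estimate = estimate;
    }

    /** True if the candidate should replace the victim chosen by eviction. */
    boolean admit(K candidate, K victim) {
        return estimate.applyAsInt(candidate) > estimate.applyAsInt(victim);
    }
}
```

Because rejection happens before the candidate displaces anything, a one-hit wonder from a scan never evicts a frequently used entry, which is the source of the policy's scan resistance.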
[jira] [Created] (SOLR-8241) Evaluate W-TinyLfu cache
Ben Manes created SOLR-8241: --- Summary: Evaluate W-TinyLfu cache Key: SOLR-8241 URL: https://issues.apache.org/jira/browse/SOLR-8241 Project: Solr Issue Type: Wish Components: search Reporter: Ben Manes Priority: Minor SOLR-2906 introduced an LFU cache and the in-progress SOLR-3393 makes it O(1). The discussions seem to indicate that the higher hit rate is offset by the slower performance of the implementation. An original goal appeared to be to introduce ARC, a patented algorithm that uses ghost entries to retain history information. My analysis of Window TinyLfu indicates that it may be a better option. It uses a frequency sketch to compactly estimate an entry's popularity and an LRU to capture recency, operating in O(1) time. On the available academic traces the policy provides a near-optimal hit rate regardless of the workload. I'm getting ready to release the policy in Caffeine, which Solr already has a dependency on. But the code is fairly straightforward, and a port into Solr's caches instead is a pragmatic alternative. More interesting is what the impact would be on Solr's workloads, and feedback on the policy's design. https://github.com/ben-manes/caffeine/wiki/Efficiency
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-8241: Description: SOLR-2906 introduced an LFU cache and the in-progress SOLR-3393 makes it O(1). The discussions seem to indicate that the higher hit rate (vs LRU) is offset by the slower performance of the implementation. An original goal appeared to be to introduce ARC, a patented algorithm that uses ghost entries to retain history information. My analysis of Window TinyLfu indicates that it may be a better option. It uses a frequency sketch to compactly estimate an entry's popularity and an LRU to capture recency, operating in O(1) time. On the available academic traces the policy provides a near-optimal hit rate regardless of the workload. I'm getting ready to release the policy in Caffeine, which Solr already has a dependency on. But the code is fairly straightforward, and a port into Solr's caches instead is a pragmatic alternative. More interesting is what the impact would be on Solr's workloads, and feedback on the policy's design. https://github.com/ben-manes/caffeine/wiki/Efficiency (was: the same text, without the "(vs LRU)" comparison)
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: SOLR-7355.patch Java 8: ConcurrentLinkedHashMap -> Caffeine --- Key: SOLR-7355 URL: https://issues.apache.org/jira/browse/SOLR-7355 Project: Solr Issue Type: Task Reporter: Ben Manes Priority: Minor Attachments: SOLR-7355.patch When Solr transitions to requiring Java 8, please upgrade to Caffeine. The performance should be roughly the same, the per-instance memory usage should be smaller, and Solr may opt in to some of the additional features. The only drawback is that the jar is larger due to code generation, though that may be trimmed over time and is usually not a concern for server-side applications. ConcurrentLinkedHashMap changes will continue to be minimal, even more so now, and driven by requests from Java 6 users unable to upgrade. Caffeine is ideally the upgrade path for Guava cache users too, since the Guava cache cannot be significantly modified due to Android. Caffeine: https://github.com/ben-manes/caffeine Benchmarks: https://github.com/ben-manes/caffeine/wiki/Benchmarks ConcurrentLinkedHashMap: https://code.google.com/p/concurrentlinkedhashmap
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: (was: SOLR-7355.patch)
[jira] [Commented] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485238#comment-14485238 ] Ben Manes commented on SOLR-7355: - I saw that too and assumed python3 had to be available, so I installed it with brew. The precommit was successful for me, but against the patch that I accidentally didn't upload.
[jira] [Commented] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485216#comment-14485216 ] Ben Manes commented on SOLR-7355: - Thanks! I thought I had fixed that in the last update (accidentally two license files), but looking at it I guess I uploaded the stale patch.
[jira] [Commented] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485245#comment-14485245 ] Ben Manes commented on SOLR-7355: - Odd that the patch doesn't show that this file was moved: solr/licenses/caffeine-NOTICE.txt was moved from solr/licenses/concurrentlinkedhashmap-lru-NOTICE.txt. The CLHM notice was empty, which is acceptable for Caffeine as well.
[jira] [Created] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
Ben Manes created SOLR-7355: --- Summary: Java 8: ConcurrentLinkedHashMap -> Caffeine Key: SOLR-7355 URL: https://issues.apache.org/jira/browse/SOLR-7355 Project: Solr Issue Type: Task Reporter: Ben Manes Priority: Minor
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: SOLR-7355.patch
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: (was: SOLR-7355.patch)
[jira] [Commented] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484300#comment-14484300 ] Ben Manes commented on SOLR-7355: - This ticket is not an attempt to advocate a competing project, but rather a notification that ConcurrentLinkedHashMap has reached its end of life and will only receive maintenance updates. Similarly, Guava's cache is in maintenance mode: performance improvements are not accepted for lack of an owner, and Guava stays on JDK 6 due to Android. As the author or co-author of all three caching libraries, this is the upgrade path that I recommend. Solr can instead choose to stay as is or migrate to an alternative library. Does Solr have performance tests? The aforementioned benchmarks show that Caffeine has performance equivalent to CLHM v1.4.2. However, Solr uses the older v1.2 dependency, which typically performs reasonably well under application load but much worse in a micro-benchmark. That's due to the original focus on avoiding lock contention (Java 5 synchronizers were slow) and later on resolving GC pressure under synthetic stress tests. The evolutionary improvements to Caffeine's algorithms mostly reduce memory usage and have only a subtle impact on the performance profile. Caffeine executes over 1.5 million unit tests, and the changes pass Solr's test suite. Any latent bugs will be addressed upon discovery.
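For context on what these libraries replace, the JDK's simplest bounded LRU is a LinkedHashMap in access order behind a single lock. The sketch below is a hedged illustration of that baseline (the class name is invented): every read takes the lock to update recency order, which is the contention problem that concurrent caches like ConcurrentLinkedHashMap and Caffeine are designed to avoid.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU: a LinkedHashMap in access order, synchronized.
// Every get() reorders the entry under the single map lock.
final class SynchronizedLru<K, V> {
  private final Map<K, V> map;

  SynchronizedLru(int maxSize) {
    map = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry
      }
    });
  }

  V get(K key) { return map.get(key); }
  void put(K key, V value) { map.put(key, value); }
  int size() { return map.size(); }
}
```

This is correct but serializes all readers; the benchmark differences discussed in the comment above are largely about removing that single lock from the hot path.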
[jira] [Commented] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484418#comment-14484418 ] Ben Manes commented on SOLR-7355: - * The license remains Apache 2. * Performance should be the same or higher. * A grep for Google's cache shows it being used primarily for memoization. * Solr appears to use many different caching implementations, so it's hard, as an outsider, to discern which are performance critical.
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: SOLR-7355.patch
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: SOLR-7355.patch
[jira] [Updated] (SOLR-7355) Java 8: ConcurrentLinkedHashMap -> Caffeine
[ https://issues.apache.org/jira/browse/SOLR-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated SOLR-7355: Attachment: (was: SOLR-7355.patch)
[jira] Commented: (SOLR-1082) Refactor caching layer to be JCache compliant (jsr-107). In particular, consider using ehcache implementation
[ https://issues.apache.org/jira/browse/SOLR-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689359#action_12689359 ] Ben Manes commented on SOLR-1082: - Yes, the LRU implementation in my CLHM is less than ideal, as I had originally intended to use a back-tracking algorithm but I just didn't trust it. This is why I use the Second Chance policy in our production environment, as it provides LRU-like efficiency without any bad tricks. I consider the watermark approach taken by ConcurrentLRUCache to be an equally bad trick, because it plays Russian roulette over which caller takes the hit. So I was pounding my head trying to figure out how to do it elegantly, experimenting along the way, and hence the warning in the JavaDoc. If you look at the Google project page for the CLHM (see JavaDoc), you'll see that I've posted a document with a fairly nice design that should resolve your concerns. There are a few optimizations that I should update the document with, and I am toying with a rework that may prove to be lock-free. Of course, finding the actual bandwidth to implement the algorithm has been challenging. But it's a hobby project, or at least that's my excuse! :-) Since Greg Luck found enough value in the current, though flawed, implementation, he adopted it. Just as I'm sure Solr found enough value in ConcurrentLRUCache: neither design is perfect, but both are good enough for now. Hopefully I'll find some time shortly to continue working on my project, and Ehcache can adopt a better version in a later release. Refactor caching layer to be JCache compliant (jsr-107). In particular, consider using ehcache implementation - Key: SOLR-1082 URL: https://issues.apache.org/jira/browse/SOLR-1082 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Kaktu Chakarabati Overhaul the caching layer to be compliant with the upcoming JCache API (JSR-107). Specifically, I've been experimenting with ehcache (http://ehcache.sourceforge.net/ , Apache OS license) and it seems to be a very comprehensive implementation, as well as fully compliant with the JCache API. I think the benefits are numerous: ehcache itself seems to be a very mature implementation, supporting most classical cache schemes as well as some interesting distributed cache options (and of course, performance-wise it's very attractive in terms of reported multi-CPU scaling and some of the benchmark figures they show). Further, abstracting the caches behind the JCache API would probably make it easier in the future to swap the whole caching layer for other implementations that will probably crop up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
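The Second Chance policy mentioned in the comment above is commonly implemented as a Clock: each entry carries a reference bit that is set on access, and a rotating hand evicts the first entry whose bit is clear, clearing bits as it passes. The sketch below is a hypothetical, single-threaded illustration (invented names, O(n) ring maintenance), not code from CLHM, Ehcache, or Solr.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Second Chance (Clock) eviction: LRU-like efficiency without
// reordering a linked list on every access.
final class ClockCache<K, V> {
  private static final class Entry<K, V> {
    final K key; V value; boolean referenced;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private final int capacity;
  private final Map<K, Entry<K, V>> index = new HashMap<>();
  private final List<Entry<K, V>> ring = new ArrayList<>();
  private int hand; // position of the clock hand in the ring

  ClockCache(int capacity) { this.capacity = capacity; }

  V get(K key) {
    Entry<K, V> e = index.get(key);
    if (e == null) return null;
    e.referenced = true; // grant a second chance on the next sweep
    return e.value;
  }

  void put(K key, V value) {
    Entry<K, V> e = index.get(key);
    if (e != null) { e.value = value; e.referenced = true; return; }
    if (index.size() >= capacity) evictOne();
    Entry<K, V> fresh = new Entry<>(key, value);
    index.put(key, fresh);
    ring.add(fresh);
  }

  private void evictOne() {
    while (true) {
      if (hand >= ring.size()) hand = 0;
      Entry<K, V> e = ring.get(hand);
      if (e.referenced) {
        e.referenced = false; // clear the bit and move the hand on
        hand++;
      } else {
        ring.remove(hand);    // evict; the hand now points at the successor
        index.remove(e.key);
        return;
      }
    }
  }
}
```

Because a hit only sets a flag, readers never contend on eviction-order bookkeeping; the cost is that eviction approximates, rather than exactly tracks, recency.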