[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876872#comment-15876872
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit d8799bc475ca5d384ec49ecf2726aec58e37447b in lucene-solr's branch 
refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d8799bc ]

SOLR-10141: Upgrade to Caffeine 2.4.0 to fix issues with removal listener


> Caffeine cache causes BlockCache corruption 
> 
>
> Key: SOLR-10141
> URL: https://issues.apache.org/jira/browse/SOLR-10141
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
> Attachments: SOLR-10141.patch, Solr10141Test.java
>
>
> After fixing the race conditions in the BlockCache itself (SOLR-10121), the 
> concurrency test passes with the previous implementation using 
> ConcurrentLinkedHashMap and fails with Caffeine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876847#comment-15876847
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit e9e02a2313518682690ca2933efd0b4db0b54b7c in lucene-solr's branch 
refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e9e02a2 ]

SOLR-10141: Upgrade to Caffeine 2.4.0 to fix issues with removal listener





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-18 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873511#comment-15873511
 ] 

Ben Manes commented on SOLR-10141:
--

Released 2.4.0




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-18 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873361#comment-15873361
 ] 

Ben Manes commented on SOLR-10141:
--

That makes sense. If it's a fallback when an empty slot can't be acquired, it 
may be preferable to always calling cleanUp(). But a stress test would be 
necessary to verify that, as the spin time might be too short for it to help.

In most traces frequency dominates over recency, so most insertions are 
pollutants. A failed insertion therefore might not have had a negative impact, 
since a popular item would still make its way in, and the failing one-hit 
wonders wouldn't have disrupted the LRU as much. That's less meaningful with 
Caffeine, since we switched to TinyLFU.
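To make the frequency-versus-recency point concrete, here is a toy sketch of a TinyLFU-style admission check: a small count-min sketch records accesses, and a new entry is admitted only if it has been seen more often than the entry it would evict. This is a simplified illustration under our own assumptions; `FrequencySketch` and its methods are hypothetical names, and it is not Caffeine's actual implementation (a production filter would also use small saturating counters and periodically age them).

```java
/**
 * Hypothetical, simplified count-min sketch used as a TinyLFU-style admission
 * filter. Saturating counters and periodic aging are not modelled here.
 */
final class FrequencySketch {
    private static final int DEPTH = 4;   // number of independent hash rows
    private final int[][] table;
    private final int width;

    FrequencySketch(int width) {
        this.width = width;
        this.table = new int[DEPTH][width];
    }

    /** Spreads the key's hash differently for each row. */
    private int index(Object key, int row) {
        int h = key.hashCode() * (0x9E3779B9 + row * 0x85EBCA6B);
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    /** Records one access to the key. */
    void increment(Object key) {
        for (int row = 0; row < DEPTH; row++) {
            table[row][index(key, row)]++;
        }
    }

    /** Estimated access count: the minimum across rows (can overestimate, never underestimates). */
    int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, table[row][index(key, row)]);
        }
        return min;
    }

    /** Admit the new entry only if it has been seen more often than the entry it would evict. */
    boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

In this model a one-hit wonder loses the comparison against a popular victim and is simply not inserted, which is why a rejected insertion need not hurt the hit rate.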

As an aside, I'd appreciate help in moving SOLR-8241 forward. It's been approved 
but has been backlogged because the committer hasn't had time to actively 
participate in Solr. But if that's crossing territories or you feel 
uncomfortable due to this bug, I understand.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873357#comment-15873357
 ] 

Yonik Seeley commented on SOLR-10141:
-

The size issue is only an issue for the BlockCache specifically (not for any 
other Solr caches).
Actually, the way the BlockCache is written, we are guaranteed to never have 
more than maxEntries... writers have to wait for an open slot (which opens up 
once the removal listener is called).  The writer spins a bit trying to find an 
open slot and fails if it can't.  Doing extra work via cache.cleanUp() if we 
don't see an empty slot is definitely better than failing to cache the entry.
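A minimal sketch of the store path described above, assuming a fixed pool of slots that are freed only when the removal listener runs. `SlotPool`, `tryClaim`, `claim`, and the `cleanUp` hook are hypothetical stand-ins (the hook plays the role of `cache.cleanUp()`); this is not the actual `BlockCache.store` code. A writer spins for a bounded number of scans, then forces pending maintenance once and retries before giving up, and failed stores are counted so they could be tracked as a metric.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical fixed pool of cache slots; a slot is freed only by the removal listener. */
final class SlotPool {
    private final AtomicIntegerArray inUse;              // 0 = free, 1 = claimed
    final AtomicLong storeFailures = new AtomicLong();   // candidate metric to track

    SlotPool(int slots) {
        this.inUse = new AtomicIntegerArray(slots);
    }

    /** One scan for a free slot, claimed with a CAS; returns -1 if none is free. */
    int tryClaim() {
        for (int i = 0; i < inUse.length(); i++) {
            if (inUse.compareAndSet(i, 0, 1)) {
                return i;
            }
        }
        return -1;
    }

    /** Invoked from the removal listener once the cache has evicted the slot's entry. */
    void release(int slot) {
        inUse.set(slot, 0);
    }

    /**
     * Spins up to {@code spins} scans; if no slot frees up, runs {@code cleanUp}
     * (standing in for cache.cleanUp()) and tries once more before giving up.
     */
    int claim(int spins, Runnable cleanUp) {
        for (int attempt = 0; attempt < spins; attempt++) {
            int slot = tryClaim();
            if (slot >= 0) {
                return slot;
            }
            Thread.yield();
        }
        cleanUp.run();   // force pending evictions, and hence listener calls, now
        int slot = tryClaim();
        if (slot < 0) {
            storeFailures.incrementAndGet();   // the block is simply not cached
        }
        return slot;
    }
}
```

The forced maintenance only happens on the slow path, so writers that find a free slot quickly never pay for it.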

I imagine the issue existed when CLHM was used as well.  The metric of store 
failures isn't currently tracked, and it only leads to a lower cache hit rate.  
I plan on starting to track it, and then to see how often it happens when we're 
actually caching real HDFS blocks.  That's a separate issue though.





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-18 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873334#comment-15873334
 ] 

Ben Manes commented on SOLR-10141:
--

If you wish to ensure a very strict bound by throttling writers, that would 
do the job. I'm not sure it's needed except in your tests, as in practice the 
assumption is that it's cleaned up in a timely enough manner.

The cache uses a bounded write buffer to provide some slack, minimize the 
response latencies for writers, and defer the cleanup to the executor 
(scheduled as immediate). This allows the cache to temporarily exceed the high 
water mark, but catch up quickly. In general a high write rate on a cache is 
actually 2-3 inserts/sec, there's memory headroom for GC, and the server isn't 
CPU-bound. If instead we ensured a strict bound, then we'd need a global lock 
to throttle writers, which limits concurrency. So it's a trade-off that works 
for most usages.

CLHM uses the same design, so I wonder if only your tests are affected and it 
is okay in practice. CLHM uses an unbounded write buffer, whereas in Caffeine 
it's bounded to provide some back pressure if full. Being full is very rare, so 
this is mostly to replace linked lists with a growable ring buffer. The slack 
is probably excessive, as I didn't have a good sizing parameter (max ~= 128 x 
ncpu). The cleanUp() call forces the caller to block and do the maintenance 
itself, rather than relying on the async processing (which may be in-flight or 
triggered on a subsequent operation). You can get a sense of this write-ahead 
log design from this [slide 
deck|https://docs.google.com/presentation/d/1NlDxyXsUG1qlVHMl4vsUUBQfAJ2c2NsFPNPr2qymIBs].
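The write-buffer trade-off can be illustrated with a single-threaded toy model, under our own simplifying assumptions: writes update the map immediately but are only recorded in a bounded buffer, and eviction runs when that buffer is drained. `BufferedLruModel` and its methods are hypothetical and are not Caffeine's API; the model omits concurrency and the async executor. It shows the size temporarily exceeding the maximum until maintenance runs, a full buffer forcing the writer to drain it (back pressure), and an explicit `cleanUp()` that makes the caller do the maintenance itself.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical single-threaded model of deferred, buffered eviction (not Caffeine's API). */
final class BufferedLruModel<K, V> {
    private final int maximumSize;
    private final int bufferCapacity;                       // the "slack"
    private final ArrayDeque<K> writeBuffer = new ArrayDeque<>();
    // accessOrder = true: iteration runs from least to most recently used
    private final LinkedHashMap<K, V> data = new LinkedHashMap<>(16, 0.75f, true);
    private final BiConsumer<K, V> removalListener;

    BufferedLruModel(int maximumSize, int bufferCapacity, BiConsumer<K, V> removalListener) {
        this.maximumSize = maximumSize;
        this.bufferCapacity = bufferCapacity;
        this.removalListener = removalListener;
    }

    /** The write is visible at once; the eviction work is only recorded. */
    void put(K key, V value) {
        data.put(key, value);
        if (writeBuffer.size() >= bufferCapacity) {
            cleanUp();   // back pressure: a full buffer makes the writer do the maintenance
        }
        writeBuffer.add(key);
    }

    /** Drains the buffer and evicts LRU entries until back within the bound. */
    void cleanUp() {
        writeBuffer.clear();
        Iterator<Map.Entry<K, V>> it = data.entrySet().iterator();
        while (data.size() > maximumSize && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            K k = eldest.getKey();
            V v = eldest.getValue();
            it.remove();
            removalListener.accept(k, v);   // listener runs only during maintenance
        }
    }

    int estimatedSize() {
        return data.size();
    }
}
```

The buffer capacity is the slack: a larger buffer means fewer forced drains on the write path, but a larger possible overshoot between drains.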

I'm not sure what, if anything, I can do regarding your size concern. But 
I'll hold off on releasing 2.4 until you're satisfied that we've resolved all 
the issues.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873210#comment-15873210
 ] 

Yonik Seeley commented on SOLR-10141:
-

Thanks Ben, I confirmed that this fixes the removalListener issue.

As far as the cache size issue, I've found that calling cache.cleanUp() after a 
put() seems to keep things under control.  Is there any other method I should 
look at?
{code}
if (cache.estimatedSize() > maxEntries) {
  // BlockCache *really* relies on having enough removalListeners called to get
  // back down to the configured maxEntries (otherwise the underlying direct
  // memory will be exhausted and the BlockCache.store will have to fail).
  cache.cleanUp();
}
{code}





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873011#comment-15873011
 ] 

Ben Manes commented on SOLR-10141:
--

[Pull Request|https://github.com/ben-manes/caffeine/pull/144] with the fix and 
your test case.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872983#comment-15872983
 ] 

Ben Manes commented on SOLR-10141:
--

Thanks!!! I think I found the bug. It now passes your test case.

The problem was due to put() stampeding over the value during the eviction. The 
[eviction routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725]
performed the following:
# Read the key, value, etc.
# Conditionally removed in a computeIfPresent() block
   - resurrected if a race occurred (e.g. was thought expired, but newly accessed)
# Mark the entry as "dead" (using a synchronized (entry) block)
# Notify the listener

This failed because 
[putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521]
can perform its update outside of a hash table lock (e.g. a computation). It 
synchronizes on the entry to update, checking first if it was still alive. This 
resulted in a race where the entry was removed from the hash table, the value 
was updated, and then the entry was marked as dead. When the listener was 
notified, it received the wrong value.

The solution I have now is to expand the synchronized block on eviction. This 
passes your test and should be cheap. I'd like to review it a little more and 
incorporate your test into my suite.
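A simplified sketch of why widening the synchronized block helps, assuming a per-entry monitor and an "alive" flag as described above. `Node`, `update`, `evictRacy`, and `evictAndCapture` are hypothetical names, loosely modelled on the description rather than taken from `BoundedLocalCache`. In the racy version the value is captured before the entry is marked dead, so an update landing in between is lost; in the fixed version the capture and the "dead" mark happen under the same monitor that writers use, so an update either lands first (and is what the listener sees) or finds the entry dead and fails.

```java
/** Hypothetical cache entry guarded by its own monitor, with an "alive" flag. */
final class Node<V> {
    private V value;
    private boolean alive = true;

    Node(V value) {
        this.value = value;
    }

    /** Writer path: update in place only while the entry is alive. */
    synchronized boolean update(V newValue) {
        if (!alive) {
            return false;   // the caller would have to insert a fresh entry instead
        }
        value = newValue;
        return true;
    }

    /**
     * Racy eviction: the value is captured before the entry is marked dead, so an
     * update that runs in between succeeds but is never reported to the listener.
     * The {@code gap} hook lets a test run that update deterministically.
     */
    V evictRacy(Runnable gap) {
        V captured;
        synchronized (this) {
            captured = value;
        }
        gap.run();               // a concurrent update() can land here
        synchronized (this) {
            alive = false;
        }
        return captured;         // value handed to the removal listener
    }

    /** Widened critical section: mark dead and capture the value atomically w.r.t. update(). */
    synchronized V evictAndCapture() {
        alive = false;
        return value;
    }
}
```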

This is an excellent find. I've stared at the code many times and the race 
seems obvious in hindsight.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872969#comment-15872969
 ] 

Ben Manes commented on SOLR-10141:
--

Thanks! I'm resolving some issues with the latest error-prone (static analyzer) 
and will then dig into it.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872965#comment-15872965
 ] 

Yonik Seeley commented on SOLR-10141:
-

I checked in the test (test method testCacheConcurrent) : 
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/test/org/apache/solr/store/blockcache/BlockCacheTest.java





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872943#comment-15872943
 ] 

Ben Manes commented on SOLR-10141:
--

Can you provide me with the latest version of a self-contained test? If I can 
reproduce and debug it, I'll have a fix over the weekend.

v2 introduced a new eviction policy that takes frequency into account. Eviction 
should be rapid, so the remaining issues are surprising. I've tried to be 
diligent about testing, so I will investigate.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872937#comment-15872937
 ] 

Yonik Seeley commented on SOLR-10141:
-

Well darn... it looked like things were fixed by the upgrade to 2.3.5, but then 
I looked a little closer.
I happened to notice that the hit rate was super high, even though I designed 
the test for a hit rate closer to 50% (maxEntries = maxBlocks/2).

When I set these parameters in the test:
{code}
// odds (1 in N) of the next block operation being on the same block as the
// previous operation... helps flush concurrency issues
final int readLastBlockOdds = 0;
// sometimes insert a new entry for the key even if one was found
final boolean updateAnyway = false;
{code}

Results in something like this:
{code}
Done! # of Elements = 200 inserts=17234 removals=17034 hits=9982766 maxObservedSize=401
{code}

So for 10M multi-threaded reads, our hit rate was 99.8%, which artificially 
lowers the rate at which we insert new entries, and hence doesn't exercise the 
concurrency as well, leading to a passing test most of the time.
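The figures in that output are internally consistent, and a quick check makes the arithmetic explicit. Assuming each lookup either hits or misses and, with `updateAnyway = false`, each miss triggers exactly one insert, hits plus inserts should equal the number of lookups (here 9,982,766 + 17,234 = 10,000,000), the hit rate is hits divided by lookups, and the final element count should equal inserts minus removals. `CacheRunStats` is a hypothetical helper, not part of the test.

```java
/** Hypothetical helper that re-derives the summary figures from the logged counters. */
final class CacheRunStats {
    final long inserts, removals, hits;

    CacheRunStats(long inserts, long removals, long hits) {
        this.inserts = inserts;
        this.removals = removals;
        this.hits = hits;
    }

    /** Assumes one insert per miss (updateAnyway = false), so lookups = hits + misses. */
    long lookups() {
        return hits + inserts;
    }

    double hitRate() {
        return (double) hits / lookups();
    }

    /** Entries still present at the end, if the removal listener saw every eviction. */
    long expectedElements() {
        return inserts - removals;
    }
}
```

For this run that gives 10,000,000 lookups, a hit rate of about 99.8%, and 200 remaining elements, matching `# of Elements = 200`.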

When I modified the test to increase the write concurrency again, accounting 
for a cache that is apparently too big:
{code}
// odds (1 in N) of the next block operation being on the same block as the
// previous operation... helps flush concurrency issues
final int readLastBlockOdds = 10;
// sometimes insert a new entry for the key even if one was found
final boolean updateAnyway = true;
{code}
The removal listener issues reappear:
{code}
WARNING: Exception thrown by removal listener
java.lang.RuntimeException: listener called more than once! k=103 v=org.apache.solr.store.blockcache.BlockCacheTest$Val@49dbc210 removalCause=SIZE
    at org.apache.solr.store.blockcache.BlockCacheTest.lambda$testCacheConcurrent$0(BlockCacheTest.java:250)
    at org.apache.solr.store.blockcache.BlockCacheTest$$Lambda$5/498475569.onRemoval(Unknown Source)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$notifyRemoval$1(BoundedLocalCache.java:286)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$12/1297599052.run(Unknown Source)
    at org.apache.solr.store.blockcache.BlockCacheTest$$Lambda$7/957914685.execute(Unknown Source)
{code}
Guarding against the removal listener being called more than once with the same 
entry also doesn't seem to work (same as before) since it then becomes apparent 
that some entries never get passed to the removal listener.

Even if the removal listener issues are fixed, the fact that the cache can be 
bigger than the configured size is a problem for us.  The map itself is not 
storing the data, only controlling access to direct memory, so timely removal 
(and a timely call to the removal listener) under heavy concurrency is 
critical.  Without that, the cache will cease to function as an LRU cache under 
load because we won't be able to find a free block in the direct memory to 
actually use.
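A minimal single-threaded sketch of that architecture, under our own assumptions: the map holds only slot indices into a preallocated direct buffer, and a slot becomes reusable only when the removal-listener stand-in runs. `DirectBlockStore`, `store`, `fetch`, and `onRemoval` are hypothetical names, not Solr's `BlockCache` classes. It shows why a late listener call makes `store` fail even though the cache logically has room.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the map only indexes slots in a preallocated direct buffer. */
final class DirectBlockStore {
    private final int blockSize;
    private final ByteBuffer memory;                             // the block data lives here
    private final boolean[] used;                                // slot occupancy
    private final Map<Long, Integer> slotOf = new HashMap<>();   // cache key -> slot

    DirectBlockStore(int slots, int blockSize) {
        this.blockSize = blockSize;
        this.memory = ByteBuffer.allocateDirect(slots * blockSize);
        this.used = new boolean[slots];
    }

    /** Copies the block into a free slot; fails if every slot is still held. */
    boolean store(long key, byte[] block) {
        if (block.length != blockSize) throw new IllegalArgumentException("wrong block size");
        if (slotOf.containsKey(key)) return true;   // already cached; keep the existing copy
        for (int slot = 0; slot < used.length; slot++) {
            if (!used[slot]) {
                used[slot] = true;
                ByteBuffer view = memory.duplicate();
                view.position(slot * blockSize);
                view.put(block, 0, blockSize);
                slotOf.put(key, slot);
                return true;
            }
        }
        return false;   // no free memory until a removal listener releases a slot
    }

    /** Returns a copy of the cached block, or null on a miss. */
    byte[] fetch(long key) {
        Integer slot = slotOf.get(key);
        if (slot == null) return null;
        byte[] out = new byte[blockSize];
        ByteBuffer view = memory.duplicate();
        view.position(slot * blockSize);
        view.get(out, 0, blockSize);
        return out;
    }

    /** Stand-in for the removal listener: only now does the slot become reusable. */
    void onRemoval(long key) {
        Integer slot = slotOf.remove(key);
        if (slot != null) used[slot] = false;
    }
}
```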

Even with only 2 threads, I see the cache going to at least double the 
configured maxEntries.  Is there a way to configure the size checking to be 
more strict?




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872906#comment-15872906
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit d810edf5e900bef32b10928d275a02c093d359b6 in lucene-solr's branch 
refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d810edf ]

SOLR-10141: add test for underlying cache





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872905#comment-15872905
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit 33e398c02115c57ea54bda5f6f612f1b06c1e771 in lucene-solr's branch 
refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=33e398c ]

SOLR-10141: add test for underlying cache





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872344#comment-15872344
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit be61c6634872435614ea4d59fd14df3426398116 in lucene-solr's branch 
refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=be61c66 ]

SOLR-10141: Upgrade to Caffeine 2.3.5 to fix issues with removal listener





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872221#comment-15872221
 ] 

Ben Manes commented on SOLR-10141:
--

Thanks [~ysee...@gmail.com]. Sorry about any frustrations this caused.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872211#comment-15872211
 ] 

ASF subversion and git services commented on SOLR-10141:


Commit 6804f3694210ac34728dd6f1a74736681dae2837 in lucene-solr's branch 
refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6804f36 ]

SOLR-10141: Upgrade to Caffeine 2.3.5 to fix issues with removal listener





[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868205#comment-15868205
 ] 

Ben Manes commented on SOLR-10141:
--

I ran your test against master and it doesn't fail. Can you please try 
Caffeine 2.3.5? The only code change needed is for the RemovalListener, which 
is now lambda friendly.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868168#comment-15868168
 ] 

Ben Manes commented on SOLR-10141:
--

Oh, also older JDK 8 versions had a bug in ForkJoinPool (FJP) that caused it to 
drop tasks. That's also a possibility at play.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868161#comment-15868161
 ] 

Ben Manes commented on SOLR-10141:
--

I plan on porting the test to Caffeine's suite and checking against 2.x. Just 
waiting for my train to start.




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868132#comment-15868132
 ] 

Yonik Seeley commented on SOLR-10141:
-

Adding a guard in the test code is easy enough (just check whether "live" has 
already been set to false), but that then leads to an additional problem: a 
memory leak, since size() != (adds - removes) at the end (i.e., the removal 
listener is not called for every item).

It looks like the removal listener is called the correct number of times, but 
not always with the correct value.  My guess is that it's somehow related to 
concurrent use of equal keys with different values.
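A single-threaded toy model of that accounting, assuming a hypothetical Value type with a "live" AtomicBoolean (loosely modeled on the test's guard) and a listener that is sometimes handed the new value instead of the replaced one. It does not reproduce the Caffeine race itself; it only shows why "right number of calls, wrong value" produces both symptoms at once:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical toy cache; not Solr or Caffeine code. */
public class RemovalAccountingDemo {
    /** Hypothetical cache value carrying the "live" flag used by the guard. */
    static final class Value {
        final AtomicBoolean live = new AtomicBoolean(true);
    }

    private final Map<String, Value> map = new HashMap<>();
    private final boolean buggyListener;
    private int adds;
    private int removes;

    RemovalAccountingDemo(boolean buggyListener) {
        this.buggyListener = buggyListener;
    }

    /** Guarded listener: only the first notification for a given value is counted. */
    private void onRemoval(Value v) {
        if (v.live.compareAndSet(true, false)) {
            removes++; // the real listener would release the value's block here
        }
        // otherwise: duplicate notification, silently ignored by the guard
    }

    void put(String key, Value value) {
        adds++;
        Value old = map.put(key, value);
        if (old != null) {
            // A correct listener is told about the replaced (old) value;
            // the simulated bug passes the new value instead.
            onRemoval(buggyListener ? value : old);
        }
    }

    void remove(String key) {
        Value old = map.remove(key);
        if (old != null) {
            onRemoval(old);
        }
    }

    /** adds - removes should equal size(); any surplus is leaked values. */
    int leaked() {
        return adds - removes - map.size();
    }

    /** Same key, two different values, then removal: two listener calls either way. */
    static int run(boolean buggy) {
        RemovalAccountingDemo cache = new RemovalAccountingDemo(buggy);
        cache.put("k", new Value()); // v1
        cache.put("k", new Value()); // v2 replaces v1
        cache.remove("k");           // v2 removed
        return cache.leaked();
    }

    public static void main(String[] args) {
        System.out.println("correct listener leaked: " + run(false)); // 0
        System.out.println("buggy listener leaked:   " + run(true));  // 1
    }
}
```

In the buggy run the listener still fires twice, but both times with v2: the guard swallows the second call and v1 is never released. Worse, v2 is released while it is still in the map, which is the path to handing its memory to another key.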




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868110#comment-15868110
 ] 

Ben Manes commented on SOLR-10141:
--

It may be ForkJoinPool (FJP) retrying a task that is slow to complete. If so, we 
might need to add a guard that ignores repeated attempts. I can help once you have 
a test case to investigate with.
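If the executor really did run the same notification task more than once, a task-level guard is one option. A minimal sketch, assuming a hypothetical RunOnce wrapper (not a Caffeine or JDK class) around the notification; submitting the same instance twice to a ForkJoinPool stands in for a retry:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RunOnceDemo {
    /** Hypothetical wrapper: the delegate runs at most once, however often run() is called. */
    static final class RunOnce implements Runnable {
        private final AtomicBoolean started = new AtomicBoolean();
        private final Runnable delegate;

        RunOnce(Runnable delegate) {
            this.delegate = delegate;
        }

        @Override
        public void run() {
            // Only the first caller wins the CAS; later attempts are no-ops.
            if (started.compareAndSet(false, true)) {
                delegate.run();
            }
        }
    }

    /** Submits one guarded notification twice and returns how many releases happened. */
    static int runTwice() {
        AtomicInteger releases = new AtomicInteger();
        Runnable notification = new RunOnce(releases::incrementAndGet);
        ForkJoinPool pool = new ForkJoinPool(2);
        pool.execute(notification);
        pool.execute(notification); // simulated retry of the same task
        pool.shutdown();            // previously submitted tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return releases.get();
    }

    public static void main(String[] args) {
        System.out.println("releases: " + runTwice()); // prints "releases: 1"
    }
}
```

The two executions may overlap on different worker threads; the compareAndSet makes exactly one of them run the delegate.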




[jira] [Commented] (SOLR-10141) Caffeine cache causes BlockCache corruption

2017-02-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868099#comment-15868099
 ] 

Yonik Seeley commented on SOLR-10141:
-

OK, so I finally tracked down the corruption failures with Caffeine to the 
removal listener being called more than once with the same value.
The first time, the underlying block is released and then presumably reused for 
a different key. The next time (which should never happen), the underlying 
block is unlocked again and can therefore be reused by yet another key, so we end 
up with multiple "live" keys pointing to the same underlying memory 
block (and corruption results).
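A toy model of that sequence, assuming a hypothetical BitSet-based block allocator (not Solr's actual BlockCache code) whose release() does not detect a second call for the same block:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical toy block allocator (not Solr's BlockCache) showing how a
 * duplicate release lets two live keys own the same memory block.
 */
public class DoubleReleaseDemo {
    private final int numBlocks;
    private final BitSet inUse; // bit i set => block i is owned by some key

    DoubleReleaseDemo(int numBlocks) {
        this.numBlocks = numBlocks;
        this.inUse = new BitSet(numBlocks);
    }

    /** Claims the lowest free block, or returns -1 if every block is taken. */
    int allocate() {
        int block = inUse.nextClearBit(0);
        if (block >= numBlocks) {
            return -1;
        }
        inUse.set(block);
        return block;
    }

    /** Frees a block unconditionally; a second release is not detected. */
    void release(int block) {
        inUse.clear(block);
    }

    /** Replays the sequence described above and returns key -> block ownership. */
    static Map<String, Integer> simulate() {
        DoubleReleaseDemo blocks = new DoubleReleaseDemo(4);
        Map<String, Integer> owners = new LinkedHashMap<>();

        owners.put("a", blocks.allocate()); // "a" gets block 0
        int aBlock = owners.remove("a");    // "a" is evicted...
        blocks.release(aBlock);             // ...first removal notification: correct
        owners.put("b", blocks.allocate()); // "b" reuses block 0
        blocks.release(aBlock);             // duplicate notification for "a"
        owners.put("c", blocks.allocate()); // "c" is also handed block 0
        return owners;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints {b=0, c=0}
    }
}
```

A check inside the allocator (only clear the bit if it is set) would not catch this: when the duplicate arrives, block 0 legitimately belongs to "b". The duplicate has to be rejected per entry, which is the idea behind checking the "live" flag.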
