[jira] [Commented] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-08 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902487#comment-15902487
 ] 

Ben Manes commented on SOLR-10205:
--

For writes, you might prefer an atomic computation over a racy 
get-compute-put. With the racy approach, stampeding writers cause a storm of 
removal notifications indicating that the value was replaced, which I think 
results in freeing and acquiring slots in the bank more often. An atomic 
computation would also reduce I/O costs, of course. Caffeine does this with a 
lock-free lookup that falls back to computeIfAbsent, so that a hit won't 
thrash on locks if the entry is present.
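
A minimal sketch of the difference, assuming a plain 
java.util.concurrent.ConcurrentHashMap stands in for the cache. This is not 
Caffeine's or Solr's code; the class, method, and key names are made up for 
illustration, and load() is only a placeholder for the expensive read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not part of Solr or Caffeine.
final class AtomicLoadSketch {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger();

    /** Racy get-compute-put: concurrent callers may all load and overwrite each other. */
    byte[] racyGet(String key) {
        byte[] value = map.get(key);
        if (value == null) {
            value = load(key);
            map.put(key, value); // may replace a value another thread just stored
        }
        return value;
    }

    /** Plain lookup first; only a miss falls back to the atomic computeIfAbsent. */
    byte[] atomicGet(String key) {
        byte[] value = map.get(key);
        return (value != null) ? value : map.computeIfAbsent(key, this::load);
    }

    private byte[] load(String key) {
        loads.incrementAndGet(); // stands in for the expensive read
        return new byte[8];
    }

    /** Calls atomicGet for one key from several threads and returns the number of loads. */
    static int loadsUnderContention(int threads) {
        AtomicLoadSketch cache = new AtomicLoadSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.atomicGet("block-0");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return cache.loads.get();
    }
}
```

With atomicGet, the map applies the loader at most once per key no matter how 
many threads miss at the same time, so no value is ever replaced. With racyGet, 
every thread that misses can load and then overwrite the others.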

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 6.5, master (7.0)
>
> Attachments: cache_performance_test.txt, SOLR-10205.patch, 
> SOLR-10205.patch, SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underlying map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.
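
The failure mode described above can be sketched with a single-threaded toy 
model. It is not Solr's BlockCache: the class name, the FIFO eviction order, 
and the explicit cleanUp() call are assumptions chosen to make the deferred 
eviction visible.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Single-threaded toy model of the store path; not Solr's BlockCache. */
final class ToyBlockCache {
    private final boolean[] used;   // one flag per block in the bank
    private final int maxEntries;   // map limit = blocks - reserved
    private final Map<String, Integer> map = new HashMap<>();   // key -> block index
    private final ArrayDeque<String> fifo = new ArrayDeque<>(); // eviction order
    int storeFails;

    ToyBlockCache(int numBlocks, int reserved) {
        this.used = new boolean[numBlocks];
        this.maxEntries = numBlocks - reserved;
    }

    /** Allocates a block, then inserts into the map; fails if no block is free. */
    boolean store(String key) {
        int block = findFreeBlock();
        if (block < 0) {
            storeFails++;
            return false;
        }
        used[block] = true;
        Integer replaced = map.put(key, block);
        if (replaced != null) {
            used[replaced] = false; // a replaced entry releases its block
        } else {
            fifo.add(key);
        }
        return true;
    }

    /** Deferred eviction: blocks are freed only when the map evicts entries. */
    void cleanUp() {
        while (map.size() > maxEntries) {
            used[map.remove(fifo.poll())] = false;
        }
    }

    private int findFreeBlock() {
        for (int i = 0; i < used.length; i++) {
            if (!used[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

In this model, 4 blocks with 1 reserved block allow exactly one store between 
two cleanUp() calls once the cache is full, and a second store in that window 
fails. With 2 reserved blocks, two stores succeed in the same window.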



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898670#comment-15898670
 ] 

ASF subversion and git services commented on SOLR-10205:


Commit f2da342c47f8588996c7a68433a4e11131e46ee2 in lucene-solr's branch 
refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f2da342 ]

SOLR-10205: BlockCache - use 4 reserved blocks, don't use executor in caffeine, 
call cleanUp








[jira] [Commented] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894229#comment-15894229
 ] 

Yonik Seeley commented on SOLR-10205:
-

Here are the results of some random test runs.

Each test run works by:
 # Building a random index
 # Running multiple query iterations.  Each iteration looks like:
## create a random list of queries
## shuffle a list of test configurations (in this case, different 
BlockCache configurations)
## run the set of queries with multiple query threads
## the results (like the time for the set of queries to run) are aggregated 
for each configuration across multiple iterations
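
The iteration loop above can be sketched as follows. This is not the actual 
test code: the class name, the generic configuration type, and the timeOneRun 
callback (which stands in for running the query set against one configuration) 
are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.ToLongFunction;

// Hypothetical harness sketch; the real test is part of the attached patch.
final class ShuffledRunSketch {
    /**
     * Runs every configuration once per iteration, in a freshly shuffled order,
     * and sums the measured time per configuration across all iterations.
     */
    static <C> Map<C, Long> run(List<C> configs, int iterations, Random random,
                                ToLongFunction<C> timeOneRun) {
        Map<C, Long> totals = new LinkedHashMap<>();
        for (C config : configs) {
            totals.put(config, 0L);
        }
        for (int i = 0; i < iterations; i++) {
            List<C> order = new ArrayList<>(configs);
            Collections.shuffle(order, random); // avoid a fixed run order per iteration
            for (C config : order) {
                totals.merge(config, timeOneRun.applyAsLong(config), Long::sum);
            }
        }
        return totals;
    }
}
```

Shuffling the order each iteration keeps any one configuration from always 
running first or last, for example against a cold or an already warm cache.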

Since these are random queries against a random index, results vary widely.
The number of iterations also varies widely (because sometimes running a single 
set of queries against a single configuration just once takes 5-10min).  The 
test times out after about 2 hrs.

The configurations for these first tests varied two parameters:
- when trying to find a free block, the number of times per loop we tried to 
call caffeine.cleanUp() if the map size >= maxEntries
- whether to use an executor for map cleanup (the default) or specify 
Runnable::run (current thread is used for any submitted tasks)
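
What passing Runnable::run as the executor means can be shown with the JDK 
alone. The sketch below does not use Caffeine (its builder takes the executor 
through Caffeine.newBuilder().executor(...)); the class and method names are 
made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; uses only java.util.concurrent.
final class SameThreadExecutorSketch {
    /** Submits one task to the executor and returns the name of the thread that ran it. */
    static String threadThatRuns(Executor executor) {
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        executor.execute(() -> ranOn.complete(Thread.currentThread().getName()));
        return ranOn.join();
    }

    /** Same check with a single background thread, shut down afterwards. */
    static String threadThatRunsInPool() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return threadThatRuns(pool);
        } finally {
            pool.shutdown();
        }
    }
}
```

With Runnable::run, execute() returns only after the task has finished on the 
calling thread, so cleanup work submitted this way is complete before the 
caller continues. With a pool, the task runs on another thread at some later 
point.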

Some selected results:
{code}
55.7/47.2 = 18%
TEST=( time=532066 storeFails=384058 misses=4212598 executor=true 
cleanupTries=3 reserved=1) 
{size=1026,lookups=1216192323,hits=1166321317,evictions=39359922,storeFails=3800330,hitratio_current=0.96,lookups_persec=184208.98,hits_persec=176268.53,evictions_persec=6116.551,storeFails_persec=752.1843,time_delta=10.093270267,buffercache.allocations=0.0,buffercache.lost=0.0}
   [junit4]   2> iter=50 RESULTS:
   [junit4]   2>( time=557072 storeFails=1473124 misses=5151913 
executor=true cleanupTries=0 reserved=1)
   [junit4]   2>( time=482894 storeFails=257414 misses=4090859 
executor=false cleanupTries=0 reserved=1)
   [junit4]   2>( time=536910 storeFails=460293 misses=4275501 
executor=true cleanupTries=1 reserved=1)
   [junit4]   2>( time=478086 storeFails=51078 misses=3934459 
executor=false cleanupTries=1 reserved=1)
   [junit4]   2>( time=529849 storeFails=418456 misses=4238267 
executor=true cleanupTries=2 reserved=1)
   [junit4]   2>( time=471511 storeFails=31507 misses=3918859 
executor=false cleanupTries=2 reserved=1)
   [junit4]   2>( time=532066 storeFails=384058 misses=4212598 
executor=true cleanupTries=3 reserved=1)
   [junit4]   2>( time=476136 storeFails=16950 misses=3900285 
executor=false cleanupTries=3 reserved=1)
   [junit4]   2>( time=532881 storeFails=358717 misses=4188116 
executor=true cleanupTries=4 reserved=1)
   [junit4]   2>( time=474018 storeFails=12556 misses=3893874 
executor=false cleanupTries=4 reserved=1)
   [junit4]   2>( time=528446 storeFails=324929 misses=4162481 
executor=true cleanupTries=5 reserved=1)
   [junit4]   2>( time=476876 storeFails=11248 misses=3901980 
executor=false cleanupTries=5 reserved=1)

32threads 65.7/55.8 = 17%
   [junit4]   2> ## Testing took 53.464 seconds. Total queries=1096 Total 
matches=124437449
   [junit4]   2> TEST=( time=657474 storeFails=9079465 misses=20447634 
executor=true cleanupTries=0 reserved=1) 
{size=1023,lookups=746158428,hits=537166437,evictions=163933036,storeFails=21041927,hitratio_current=0.67,lookups_persec=96833.06,hits_persec=64891.32,evictions_persec=15433.991,storeFails_persec=14261.533,time_delta=53.463745281,buffercache.allocations=0.0,buffercache.lost=0.0}
   [junit4]   2> iter=11 RESULTS:
   [junit4]   2>( time=657474 storeFails=9079465 misses=20447634 
executor=true cleanupTries=0 reserved=1)
   [junit4]   2>( time=571034 storeFails=2082850 misses=17633145 
executor=false cleanupTries=0 reserved=1)
   [junit4]   2>( time=617555 storeFails=2720476 misses=17615573 
executor=true cleanupTries=1 reserved=1)
   [junit4]   2>( time=558536 storeFails=316721 misses=16978923 
executor=false cleanupTries=1 reserved=1)
   [junit4]   2>( time=615668 storeFails=2258180 misses=17440287 
executor=true cleanupTries=2 reserved=1)
   [junit4]   2>( time=560400 storeFails=75888 misses=16881946 
executor=false cleanupTries=2 reserved=1)
   [junit4]   2>( time=612233 storeFails=1819890 misses=17284381 
executor=true cleanupTries=3 reserved=1)
   [junit4]   2>( time=556029 storeFails=21173 misses=16840613 
executor=false cleanupTries=3 reserved=1)
   [junit4]   2>( time=613037 storeFails=1476039 misses=17117779 
executor=true cleanupTries=4 reserved=1)
   [junit4]   2>( time=556052 storeFails=10474 misses=16881301 
executor=false cleanupTries=4 reserved=1)
   [junit4]   2>( time=611542 storeFails=1172765 misses=17019642 
executor=true cleanupTries=5 reserved=1)
   [junit4]   2>( time=554806 

[jira] [Commented] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891600#comment-15891600
 ] 

Yonik Seeley commented on SOLR-10205:
-

I also plan on trying out a higher number of reserved blocks (like 2 or 3) 
instead of the current 1.  This helps because, with a single reserved block, 
if 2 threads both try to cache blocks at the same time, one grabs the reserved 
block first and the other has to wait until an older entry is evicted from the 
map (which happens because the first thread inserts a new entry).



