[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: (was: SOLR-10205.patch)

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: cache_performance_test.txt, SOLR-10205.patch, 
> SOLR-10205.patch, SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: SOLR-10205.patch

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: cache_performance_test.txt, SOLR-10205.patch, 
> SOLR-10205.patch, SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: SOLR-10205.patch

Final patch - I plan on committing shortly.

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: cache_performance_test.txt, SOLR-10205.patch, 
> SOLR-10205.patch, SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: cache_performance_test.txt

Here's the results of testing with different numbers of reserved blocks (up to 
4) and different number of calls to cleanUp when the map size exceeds the 
number of blocks - reserved.

The speedups compared to trunk range from 11% to 68% for these artificial 
random tests.

Based on the results, I think the right balance is going with reserved blocks = 
4 and a single call to cleanUp in the outer loop of 
BlockCache.findEmptyLocation()

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: cache_performance_test.txt, SOLR-10205.patch, 
> SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: SOLR-10205.patch

OK, here's an update to the tests I'm using. Some changes:
- BlockCache now only tries calling cleanUp the next time through the loop 
(i.e. needs to fail to find a block first)
- only call cleanUp if map size >= number_of_blocks (as opposed to max 
configured size of the map... this makes a difference when reserved_blocks>1)
- test estimates number of queries to use in the next test round based on time 
so far
- test exits by itself after a given amount of time (1hr is what I'm testing 
with)

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: SOLR-10205.patch, SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

2017-03-01 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-10205:

Attachment: SOLR-10205.patch

Here's a snapshot of the modifications and tests I'm using to reduce store 
failures and evaluate the potential impact.

It's messy... yet another hacked up version of the HDFS performance test I used 
to uncover the blockcache corruption issues.   I don't plan on committing most 
of this.  It's just for evaluating what should be committed to reduce store 
failures.

> Evaluate and reduce BlockCache store failures
> -
>
> Key: SOLR-10205
> URL: https://issues.apache.org/jira/browse/SOLR-10205
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Attachments: SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block 
> (BlockCache.store call) can fail, making caching less effective.  We should 
> evaluate the impact of this storage failure and potentially reduce the number 
> of storage failures.
> The implementation reserves a single block of memory.  In store, a block of 
> memory is allocated, and then a pointer is inserted into the underling map.  
> A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even 
> under low load), one can fail.  This is made worse by the fact that 
> concurrent maps typically tend to amortize the cost of eviction over many 
> keys (i.e. the actual size of the map can grow beyond the configured maximum 
> number of entries... both the older ConcurrentLinkedHashMap and newer 
> Caffeine do this).  When this is the case, store() won't be able to find a 
> free block of memory, even if there aren't any other concurrently operating 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org