[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-16 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667956#comment-15667956
 ] 

Ishan Chattopadhyaya edited comment on SOLR-9506 at 11/16/16 1:52 PM:
--

Can we resolve this issue, since it seems it was released as part of 6.3.0? (-I 
will open another issue for the issue I wrote about two comments before- Added 
SOLR-9777).


was (Author: ichattopadhyaya):
Can we resolve this issue, since it seems it was released as part of 6.3.0? (I 
will open another issue for the issue I wrote about two comments before).

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585683#comment-15585683
 ] 

Yonik Seeley edited comment on SOLR-9506 at 10/18/16 3:02 PM:
--

bq. "Right... the core cache key does not change, even if there are deletes for 
the segment."

So the cache key ignores deleted documents, while the value being cached does 
not.  It's a fundamental mismatch.
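
The mismatch can be illustrated with a minimal sketch. Everything here is hypothetical (the Segment class, the coreKey field, and the cached live-document count are stand-ins, not Solr or Lucene APIs); it only assumes a cache key that stays the same across deletes and a cached value that depends on which documents are live.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class StaleCacheSketch {
  // Hypothetical segment: coreKey never changes, even when documents are deleted.
  static final class Segment {
    final Object coreKey = new Object();
    final Set<Long> liveVersions = new HashSet<>();
  }

  // Cache keyed only by the core key, so deletes cannot invalidate an entry.
  static final Map<Object, Integer> liveCountCache = new HashMap<>();

  // Returns the cached live-document count, computing it on first access only.
  static int cachedLiveCount(Segment s) {
    return liveCountCache.computeIfAbsent(s.coreKey, k -> s.liveVersions.size());
  }
}
```

If a document is removed from liveVersions after the first call to cachedLiveCount, later calls still return the old count, because the key they look up has not changed.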


was (Author: ysee...@gmail.com):
"Right... the core cache key does not change, even if there are deletes for the 
segment."

So the cache key ignores deleted documents, while the value being cached does 
not.  It's a fundamental mismatch.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint.






[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531954#comment-15531954
 ] 

Noble Paul edited comment on SOLR-9506 at 9/29/16 7:15 AM:
---

Few quick points

{code:java}
  // Map is not concurrent, but since computeIfAbsent is idempotent, it should
  // be alright for two threads to compute values for the same key.
  private final Map> perSegmentFingerprintCache = new WeakHashMap<>();
{code}

It's idempotent, but the map still has to be thread safe; the javadocs say that 
thread safety depends on the map implementation.

We really don't need to keep a cache per version. The reason is that we only 
give one version number, and only the latest segment will ever have to compute 
anything other than the full fingerprint. As soon as a new segment is added, 
everything other than the full fingerprint becomes useless. So the solution 
is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute every time. 
So, the cache should be

{code:java}
  private final Map 
perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
{code}
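
A minimal sketch of the rule described above, under stated assumptions: SegmentFingerprint, PerSegmentFingerprintCache, the Object key type, and the LongFunction callback are all hypothetical stand-ins rather than the types used in the patch. It caches only the full fingerprint (maxVersion equal to Long.MAX_VALUE) and recomputes for any bounded maxVersion.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.LongFunction;

// Hypothetical stand-in for a per-segment fingerprint; not Solr's IndexFingerprint.
final class SegmentFingerprint {
  final long maxVersionSpecified;
  final long versionsHash;

  SegmentFingerprint(long maxVersionSpecified, long versionsHash) {
    this.maxVersionSpecified = maxVersionSpecified;
    this.versionsHash = versionsHash;
  }
}

final class PerSegmentFingerprintCache {
  // synchronizedMap serializes every call on the wrapper, including
  // computeIfAbsent; WeakHashMap lets an entry be dropped once its key
  // is no longer strongly referenced elsewhere.
  private final Map<Object, SegmentFingerprint> cache =
      Collections.synchronizedMap(new WeakHashMap<>());

  SegmentFingerprint get(Object segmentKey, long maxVersion,
                         LongFunction<SegmentFingerprint> compute) {
    if (maxVersion != Long.MAX_VALUE) {
      // Bounded request: the result is not reusable, so recompute every time.
      return compute.apply(maxVersion);
    }
    // Full fingerprint: compute once per segment key and reuse afterwards.
    return cache.computeIfAbsent(segmentKey, k -> compute.apply(Long.MAX_VALUE));
  }
}
```

One consequence of this choice is that the compute callback runs while the wrapper's lock is held, so concurrent lookups for other segments wait until it finishes.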
 


was (Author: noble.paul):
Few quick points

{code:java}
  // Map is not concurrent, but since computeIfAbsent is idempotent, it should
  // be alright for two threads to compute values for the same key.
  private final Map> perSegmentFingerprintCache = new WeakHashMap<>();
{code}

The map has to be thread safe; the javadocs say that thread safety depends on 
the map implementation.

We really don't need to keep a cache per version. The reason is that we only 
give one version number, and only the latest segment will ever have to compute 
anything other than the full fingerprint. As soon as a new segment is added, 
everything other than the full fingerprint becomes useless. So the solution 
is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute every time. 
So, the cache should be

{code:java}
  private final Map 
perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
{code}
 

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint.






[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523490#comment-15523490
 ] 

Noble Paul edited comment on SOLR-9506 at 9/26/16 4:15 PM:
---

No, segments don't change. Do you mean fingerprints can change?




was (Author: noble.paul):
No, segments don't change.


> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint.






[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste edited comment on SOLR-9506 at 9/25/16 5:14 PM:
--

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute {{versionsInHash}} from the {{versionsInHash}} of individual 
segments. We cannot use the current {{versionsHash}} (unless we cache all the 
individual version numbers), as it is not additive.  Consider the following 
scenario:
*Leader segments, versions and versionsHash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync; however, using the current 
method there is no way to compute the cumulative {{versionHash}} and ensure 
that the leader's and the replica's values match. 
\\ \\Even if we decide not to cache {{IndexFingerprint}} per segment but just 
to parallelize the computation, I think we would still run into the issue 
mentioned above.

* I still need to figure out how to keep the cache in {{DefaultSolrCoreState}}, 
so that we can reuse the {{IndexFingerprint}} of individual segments when a new 
Searcher is opened.  
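
The scenario in the first point writes each segment's versionHash as a sum of per-version hashes. The following is a minimal sketch of that arithmetic only. It assumes a hypothetical hash function and plain wrapping long addition; neither is confirmed here to match how the current {{versionsHash}} is computed, and VersionHashSketch is an illustrative name.

```java
import java.util.stream.LongStream;

final class VersionHashSketch {
  // Hypothetical per-version hash (a 64-bit bit-mixing function); an
  // assumption for illustration, not necessarily the function Solr uses.
  static long hash(long v) {
    v ^= v >>> 33;
    v *= 0xff51afd7ed558ccdL;
    v ^= v >>> 33;
    v *= 0xc4ceb9fe1a85ec53L;
    v ^= v >>> 33;
    return v;
  }

  // Per-segment value as written in the scenario: hash(v1) + hash(v2) + ...
  // LongStream.sum() uses ordinary long addition, which wraps on overflow.
  static long segmentHash(long... versions) {
    return LongStream.of(versions).map(VersionHashSketch::hash).sum();
  }
}
```

Under these assumptions, segmentHash(100, 101, 102) + segmentHash(103, 104, 105) for the leader equals segmentHash(100, 101) + segmentHash(102, 103, 104, 105) for the replica, because wrapping addition is commutative and associative. The concern raised above applies whenever the real per-segment values are not combined as a plain sum of this kind.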


was (Author: praste):
POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute `versionsInHash` from the `versionsInHash` of individual 
segments. We cannot use the current `versionsHash` (unless we cache all the 
individual version numbers), as it is not additive.  Consider the following 
scenario:
*Leader segments, versions and hash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync; however, using the current 
method there is no way to compute the cumulative `versionHash` and ensure 
that the leader's and the replica's values match.

* I still need to figure out how to keep the cache in `DefaultSolrCoreState`, so 
that we can reuse the `IndexFingerprint` of individual segments when a new 
Searcher is opened.  

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint.


