[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667956#comment-15667956 ]

Ishan Chattopadhyaya edited comment on SOLR-9506 at 11/16/16 1:52 PM:
--

Can we resolve this issue, since it seems it was released as part of 6.3.0?
(-I will open another issue for the issue I wrote about two comments before- Added SOLR-9777).

was (Author: ichattopadhyaya):
Can we resolve this issue, since it seems it was released as part of 6.3.0?
(I will open another issue for the issue I wrote about two comments before).

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch,
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch,
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during
> high throughput indexing. If the fingerprint is cached per segment it will
> make it vastly more efficient to compute the fingerprint

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585683#comment-15585683 ]

Yonik Seeley edited comment on SOLR-9506 at 10/18/16 3:02 PM:
--

bq. "Right... the core cache key does not change, even if there are deletes for the segment."

So the cache key ignores deleted documents, while the value being cached does not. It's a fundamental mis-match.

was (Author: ysee...@gmail.com):
"Right... the core cache key does not change, even if there are deletes for the segment."
So the cache key ignores deleted documents, while the value being cached does not. It's a fundamental mis-match.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during
> high throughput indexing. If the fingerprint is cached per segment it will
> make it vastly more efficient to compute the fingerprint
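The mismatch described in this comment can be reproduced with a toy model. The sketch below is not Solr or Lucene code: the Segment class, its coreKey field, and the sum-based fingerprint are all hypothetical stand-ins, chosen only to show a cache whose key ignores deletes while the cached value depends on them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration only; none of these names come from Solr or Lucene. */
final class StaleFingerprintDemo {

    /** Hypothetical segment: a fixed core identity plus a mutable set of live versions. */
    static final class Segment {
        final String coreKey;
        final Set<Long> liveVersions = new TreeSet<>();

        Segment(String coreKey, long... versions) {
            this.coreKey = coreKey;
            for (long v : versions) {
                liveVersions.add(v);
            }
        }
    }

    /** Toy fingerprint: the sum of the live versions (a stand-in for a real hash). */
    static long fingerprint(Segment s) {
        long sum = 0;
        for (long v : s.liveVersions) {
            sum += v;
        }
        return sum;
    }

    // The key is the core identity only, so it does not change when documents are deleted.
    private final Map<String, Long> byCoreKey = new HashMap<>();

    /** Returns the cached value for the segment's core key, computing it on first use. */
    long cachedFingerprint(Segment s) {
        return byCoreKey.computeIfAbsent(s.coreKey, k -> fingerprint(s));
    }
}
```

In this model, caching a segment holding versions 100, 101 and 102 and then removing 101 from liveVersions leaves cachedFingerprint returning the old value while fingerprint returns the new one. A key that also reflected deletions would avoid that, at the cost of more cache misses.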
[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531954#comment-15531954 ]

Noble Paul edited comment on SOLR-9506 at 9/29/16 7:15 AM:
---

Few quick points

{code:java}
// Map is not concurrent, but since computeIfAbsent is idempotent, it should be alright for two threads to compute values for the same key.
private final Map> perSegmentFingerprintCache = new WeakHashMap<>();
{code}

It's idempotent, but the map has to be thread safe; the javadocs say that thread safety depends on the map implementation.

We really don't need to keep a cache per version. The reason is that we only give one version number, and only the latest segment will have to compute anything other than the full fingerprint. As soon as a new segment is added, everything other than the full fingerprint becomes useless. So the solution is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute every time.

So, the cache should be

{code:java}
private final Map perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
{code}

was (Author: noble.paul):
Few quick points

{code:java}
// Map is not concurrent, but since computeIfAbsent is idempotent, it should be alright for two threads to compute values for the same key.
private final Map > perSegmentFingerprintCache = new WeakHashMap<>();
{code}

The map has to be thread safe; the javadocs say that thread safety depends on the map implementation.

We really don't need to keep a cache per version. The reason is that we only give one version number, and only the latest segment will have to compute anything other than the full fingerprint. As soon as a new segment is added, everything other than the full fingerprint becomes useless. So the solution is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute every time.

So, the cache should be

{code:java}
private final Map perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
{code}

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during
> high throughput indexing. If the fingerprint is cached per segment it will
> make it vastly more efficient to compute the fingerprint
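The caching rule proposed in this comment (cache only the full fingerprint, i.e. when maxVersion is Long.MAX_VALUE, and recompute otherwise) could look roughly like the sketch below. It is a minimal illustration and not the patch: the class name SegmentFingerprintCache, the generic key and value types, and the compute callback are assumptions made here; only Collections.synchronizedMap and WeakHashMap come from the comment.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.BiFunction;

/** Hypothetical sketch; K stands in for a per-segment key, V for a fingerprint. */
final class SegmentFingerprintCache<K, V> {

    // synchronizedMap makes each individual map call thread safe. WeakHashMap lets an
    // entry be dropped once its key (the segment) is no longer strongly referenced.
    private final Map<K, V> cache = Collections.synchronizedMap(new WeakHashMap<>());

    /**
     * Returns the fingerprint for the segment. Only the "full" fingerprint
     * (maxVersion == Long.MAX_VALUE) is cached; any other maxVersion is recomputed.
     */
    V get(K segmentKey, long maxVersion, BiFunction<K, Long, V> compute) {
        if (maxVersion != Long.MAX_VALUE) {
            return compute.apply(segmentKey, maxVersion); // never cached
        }
        V cached = cache.get(segmentKey);
        if (cached == null) {
            // Two threads may both reach this point and compute the same value.
            // That is acceptable here because the computation is idempotent.
            cached = compute.apply(segmentKey, maxVersion);
            cache.put(segmentKey, cached);
        }
        return cached;
    }
}
```

The get/put pair is deliberately not atomic: it trades an occasional duplicate computation for not holding the map's lock while a fingerprint is being computed.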
[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523490#comment-15523490 ]

Noble Paul edited comment on SOLR-9506 at 9/26/16 4:15 PM:
---

no. segments don't change. you mean fingerprints can change?

was (Author: noble.paul):
no. segments don't change

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during
> high throughput indexing. If the fingerprint is cached per segment it will
> make it vastly more efficient to compute the fingerprint
[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130 ]

Pushkar Raste edited comment on SOLR-9506 at 9/25/16 5:14 PM:
--

POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve.

* How to compute {{versionsInHash}} from {{versionsInHash}} of individual segments. We cannot use the current {{versionsHash}} (unless we cache all the individual version numbers), as it is not additive. Consider the following scenario:
*Leader segments, versions and versionsHash*
*seg1*: versions: 100, 101, 102
 versionHash: hash(100) + hash(101) + hash(102)
*seg2*: versions: 103, 104, 105
 versionHash: hash(103) + hash(104) + hash(105)
\\
\\
*Replica segments, versions and hash*
*seg1*: versions: 100, 101
 versionHash: hash(100) + hash(101)
*seg2*: versions: 102, 103, 104, 105
 versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\
\\Leader and Replica are essentially in sync; however, using the current method there is no way to compute and ensure that the cumulative {{versionHash}} of leader and replica would match.
\\
\\Even if we decide not to cache {{IndexFingerprint}} per segment but just to parallelize the computation, I think we would still run into the issue mentioned above.
* I still need to figure out how to keep the cache in {{DefaultSolrCoreState}}, so that we can reuse the {{IndexFingerprint}} of individual segments when a new Searcher is opened.

was (Author: praste):
POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve.

* How to compute `versionsInHash` from `versionsInHash` of individual segments. We cannot use the current `versionsHash` (unless we cache all the individual version numbers), as it is not additive. Consider the following scenario:
*Leader segments, versions and hash*
*seg1*: versions: 100, 101, 102
 versionHash: hash(100) + hash(101) + hash(102)
*seg2*: versions: 103, 104, 105
 versionHash: hash(103) + hash(104) + hash(105)
\\
\\
*Replica segments, versions and hash*
*seg1*: versions: 100, 101
 versionHash: hash(100) + hash(101)
*seg2*: versions: 102, 103, 104, 105
 versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\
\\Leader and Replica are essentially in sync; however, using the current method there is no way to compute and ensure that the cumulative `versionHash` of leader and replica would match.
* I still need to figure out how to keep the cache in `DefaultSolrCoreState`, so that we can reuse the `IndexFingerprint` of individual segments when a new Searcher is opened.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. it is quite useless during
> high throughput indexing. If the fingerprint is cached per segment it will
> make it vastly more efficient to compute the fingerprint
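The leader/replica scenario in this comment can be checked numerically. The sketch below takes the scenario exactly as written, with each segment's value being a plain sum of per-version hashes, and compares the totals for the two segment layouts. It is an assumption-laden toy: the class VersionHashSum is hypothetical, and mix() is a stand-in (the MurmurHash3 64-bit finalizer), not necessarily the hash Solr uses. Under those assumptions the two totals come out equal, because long addition is commutative and associative even when it overflows. The sketch does not model deletes or maxVersion filtering, so any difficulty arising from those is outside what it shows.

```java
/** Hypothetical check of the scenario above; not Solr code. */
final class VersionHashSum {

    /** Stand-in 64-bit mixing function (MurmurHash3 finalizer), assumed for illustration. */
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Per-segment value: the wrapping sum of the hashes of the segment's versions. */
    static long segmentHash(long... versions) {
        long sum = 0;
        for (long v : versions) {
            sum += mix(v);
        }
        return sum;
    }

    /** Total for the leader layout: {100, 101, 102} and {103, 104, 105}. */
    static long leaderTotal() {
        return segmentHash(100L, 101L, 102L) + segmentHash(103L, 104L, 105L);
    }

    /** Total for the replica layout: {100, 101} and {102, 103, 104, 105}. */
    static long replicaTotal() {
        return segmentHash(100L, 101L) + segmentHash(102L, 103L, 104L, 105L);
    }
}
```

Comparing leaderTotal() with replicaTotal() is enough to see whether segment boundaries affect the sum in this simplified model.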