[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667956#comment-15667956 ] Ishan Chattopadhyaya commented on SOLR-9506: Can we resolve this issue, since it seems it was released as part of 6.3.0? (I will open another issue for the problem I wrote about two comments before.)

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during high throughput indexing. If the fingerprint is cached per segment it will make it vastly more efficient to compute the fingerprint

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667801#comment-15667801 ] Ishan Chattopadhyaya commented on SOLR-9506: I see. I saw it was unresolved, so I thought it hadn't made it into 6.3 yet. I'll check whether it made it into 6.3, and open a new ticket if that's the case.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667795#comment-15667795 ] Noble Paul commented on SOLR-9506: Ishan, I guess this is already fixed in 6.3, so we may need to open another ticket.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601820#comment-15601820 ] ASF subversion and git services commented on SOLR-9506: Commit 265d425b00181dd384fa963e46dc35b92b7e02c0 in lucene-solr's branch refs/heads/branch_6x from [~noble.paul] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=265d425 ] SOLR-9506: cache IndexFingerprint for each segment
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601675#comment-15601675 ] ASF subversion and git services commented on SOLR-9506: Commit 184b0f221559eaed5f273b1907e8af07bc95fec9 in lucene-solr's branch refs/heads/master from [~noble.paul] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=184b0f2 ] SOLR-9506: cache IndexFingerprint for each segment
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599826#comment-15599826 ] Pushkar Raste commented on SOLR-9506: Yeah, I looked into it. I will try that approach, if I can get to it before [~noble.paul] applies the patch.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596148#comment-15596148 ] Pushkar Raste commented on SOLR-9506: Don't use the patch for parallelized computation. Parallel streams use the JVM-wide shared fork-join pool (ForkJoinPool.commonPool()), so a bad actor can create havoc for everything else that uses that pool.
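A minimal sketch of one alternative, assuming the per-segment work can be expressed as independent tasks: submit one task per segment to a small, dedicated `ExecutorService`, so a slow or stuck task only occupies that pool and cannot starve other users of `ForkJoinPool.commonPool()`. The class name, `computeSegment`, and the toy summation standing in for real fingerprint work are all hypothetical; this is not what any SOLR-9506 patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run per-segment work on a bounded, dedicated pool
// rather than on ForkJoinPool.commonPool(), which parallelStream() shares JVM-wide.
class DedicatedPoolSketch {

    // Toy stand-in for real per-segment work: sums the segment's versions.
    static long computeSegment(long[] versions) {
        long sum = 0;
        for (long v : versions) {
            sum += v;
        }
        return sum;
    }

    // Submits one task per segment and combines the results in submission order.
    static long computeAll(List<long[]> segments, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] segment : segments) {
                futures.add(pool.submit(() -> computeSegment(segment)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // a failed task surfaces here as an ExecutionException
            }
            return total;
        } finally {
            pool.shutdown(); // the pool is private to this call, so nothing else is affected
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> segments = Arrays.asList(new long[] {1, 2, 3}, new long[] {10, 20});
        System.out.println(computeAll(segments, 2)); // prints 36
    }
}
```

Creating a pool per call keeps the sketch self-contained; a real searcher would more likely reuse one long-lived, bounded pool.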
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589649#comment-15589649 ] Pushkar Raste commented on SOLR-9506: [~noble.paul] and [~yo...@apache.org]: I was able to put together a test showing that the current implementation is broken. I will update the patch with the test and a fix by EOD today.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586530#comment-15586530 ] Keith Laban commented on SOLR-9506: How expensive would it be to check numDocs (#4 in Yonik's earlier comment)? I think this would be the most straightforward and understandable approach.
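A minimal sketch of how a numDocs check could guard a per-segment cache, under the assumption that deletes within one segment only ever accumulate, so any newly applied delete lowers that segment's live-doc count. The `NumDocsCheckedCache` and `Segment` names and the toy fingerprint (`31 * numDocs`) are hypothetical and only stand in for real per-segment state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a numDocs check: each cached per-segment value remembers
// the live-doc count it was computed from, and a mismatch forces recomputation.
// Assumes deletes within one segment only accumulate, so any new delete lowers numDocs.
class NumDocsCheckedCache {

    // Toy segment: a core key that never changes, plus a mutable live-doc count.
    static final class Segment {
        final Object coreKey;
        int numDocs;

        Segment(Object coreKey, int numDocs) {
            this.coreKey = coreKey;
            this.numDocs = numDocs;
        }
    }

    // Cached value tagged with the numDocs it was computed from.
    private static final class Entry {
        final int numDocs;
        final long fingerprint;

        Entry(int numDocs, long fingerprint) {
            this.numDocs = numDocs;
            this.fingerprint = fingerprint;
        }
    }

    private final Map<Object, Entry> cache = new HashMap<>();
    int computations = 0; // counts cache misses, for illustration

    // Toy fingerprint; a real one would walk the segment's live documents.
    private long compute(Segment s) {
        computations++;
        return 31L * s.numDocs;
    }

    long get(Segment s) {
        Entry e = cache.get(s.coreKey);
        if (e == null || e.numDocs != s.numDocs) { // missing, or deletes happened since caching
            e = new Entry(s.numDocs, compute(s));
            cache.put(s.coreKey, e);
        }
        return e.fingerprint;
    }

    public static void main(String[] args) {
        NumDocsCheckedCache cache = new NumDocsCheckedCache();
        Segment seg = new Segment("_0", 32);
        cache.get(seg);
        cache.get(seg);   // cache hit: numDocs unchanged
        seg.numDocs = 31; // simulate a committed delete in this segment
        cache.get(seg);   // numDocs changed, so the value is recomputed
        System.out.println(cache.computations); // prints 2
    }
}
```

The check costs one integer comparison per segment per lookup; its correctness rests entirely on the monotonic-deletes assumption stated above.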
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586419#comment-15586419 ] ASF subversion and git services commented on SOLR-9506: Commit ffa5c4ba2c2d6fa6bb943a70196aad0058333fa2 in lucene-solr's branch refs/heads/master from [~noble.paul] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ffa5c4b ] SOLR-9506: reverting the previous commit
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585918#comment-15585918 ] Yonik Seeley commented on SOLR-9506: Please do.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585801#comment-15585801 ] Noble Paul commented on SOLR-9506: If the above case fails, let's revert the commit and revisit the fingerprint computation.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585781#comment-15585781 ] ASF subversion and git services commented on SOLR-9506: Commit 9aa764a54f50eca5a8ef805bdb29e4ad90fcce5e in lucene-solr's branch refs/heads/master from [~noble.paul] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9aa764a ] * SOLR-9506: cache IndexFingerprint for each segment
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585759#comment-15585759 ] Yonik Seeley commented on SOLR-9506: The above manual test only exhibited this bad behavior after the commit today.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585756#comment-15585756 ] Yonik Seeley commented on SOLR-9506: Not sure I understand... are you suggesting a workaround in PeerSync (recoverWithReplicationOnly) to work around the correctness problem caused by this commit?
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585736#comment-15585736 ] Pushkar Raste commented on SOLR-9506: There is a lot of confusion going on here. Would the above test pass if we didn't cache the per-segment IndexFingerprint? If yes, then we should revert the commit; if not, we should open a new issue to fix the IndexFingerprint computation altogether.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585709#comment-15585709 ] Yonik Seeley commented on SOLR-9506: Pretty simple to try out:
{code}
bin/solr start -e techproducts

http://localhost:8983/solr/techproducts/query?q=*:*
"response":{"numFound":32,"start":0,"docs":[

http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":8803836617561505377,
    "numVersions":32,
    "numDocs":32,
    "maxDoc":32}}

curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" -d 'apple'

# this shows that the delete is visible
http://localhost:8983/solr/techproducts/query?q=*:*
"response":{"numFound":31,"start":0,"docs":[

# fingerprint returns the same thing
http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":8803836617561505377,
    "numVersions":32,
    "numDocs":32,
    "maxDoc":32}}

bin/solr stop -all
bin/solr start -e techproducts

# after a restart, fingerprint returns something different
http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":-131508374066080,
    "numVersions":31,
    "numDocs":31,
    "maxDoc":32}}
{code}
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585707#comment-15585707 ] Pushkar Raste commented on SOLR-9506: I think what Yonik is implying is that if, for some reason, a replica does not apply a delete properly, the index fingerprint would still check out, and that would be a problem. Considering the issues with {{PeerSync}}, should we add the option {{recoverWithReplicationOnly}}? For most setups, I doubt people would have hundreds of thousands of records in the updateLog, in which case almost no one is using {{PeerSync}} anyway.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585700#comment-15585700 ] Yonik Seeley commented on SOLR-9506: Yep.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585694#comment-15585694 ] Keith Laban commented on SOLR-9506: Are you implying that if you add a document, commit it, compute the index fingerprint, and cache the per-segment fingerprints, then delete that document, commit that change, and compute the fingerprint again using the cached segment fingerprints, you will end up with the same index fingerprint?
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585688#comment-15585688 ] Pushkar Raste commented on SOLR-9506: i.e. we really need to fix the IndexFingerprint computation, whether or not we cache. I will open a separate issue to fix it in that case.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585683#comment-15585683 ] Yonik Seeley commented on SOLR-9506: "Right... the core cache key does not change, even if there are deletes for the segment." So the cache key ignores deleted documents, while the value being cached does not. It's a fundamental mismatch.
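A toy model of the mismatch, assuming a segment can be reduced to a stable core key, an array of document versions, and a growing delete count. In Lucene 6.x, readers expose both a core cache key (`getCoreCacheKey()`) and a key that also changes with deletes (`getCombinedCoreAndDeletesKey()`); the attachment name SOLR-9506-combined-deletion-key.patch suggests the fix went in that direction, but this sketch does not reproduce it. All class and method names below are hypothetical, and the "fingerprint" is just the sum of live versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model: a cache keyed only on the core key keeps serving the fingerprint computed
// before a delete, while a key that also includes the deletion state picks up the change.
class CacheKeyMismatch {

    static final class Segment {
        final String coreKey;  // stays the same when documents are deleted
        final long[] versions; // one version per document
        final boolean[] deleted;
        int delCount = 0;

        Segment(String coreKey, long[] versions) {
            this.coreKey = coreKey;
            this.versions = versions;
            this.deleted = new boolean[versions.length];
        }

        void delete(int doc) {
            if (!deleted[doc]) {
                deleted[doc] = true;
                delCount++;
            }
        }
    }

    // Composite key: core identity plus how many deletes the segment has seen.
    static final class CoreAndDeletesKey {
        final String coreKey;
        final int delCount;

        CoreAndDeletesKey(String coreKey, int delCount) {
            this.coreKey = coreKey;
            this.delCount = delCount;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CoreAndDeletesKey)) return false;
            CoreAndDeletesKey k = (CoreAndDeletesKey) o;
            return coreKey.equals(k.coreKey) && delCount == k.delCount;
        }

        @Override
        public int hashCode() {
            return Objects.hash(coreKey, delCount);
        }
    }

    final Map<String, Long> byCore = new HashMap<>();
    final Map<CoreAndDeletesKey, Long> byCoreAndDeletes = new HashMap<>();

    // Toy fingerprint over live documents only: the sum of their versions.
    static long fingerprint(Segment s) {
        long sum = 0;
        for (int i = 0; i < s.versions.length; i++) {
            if (!s.deleted[i]) sum += s.versions[i];
        }
        return sum;
    }

    long cachedByCore(Segment s) {
        return byCore.computeIfAbsent(s.coreKey, k -> fingerprint(s));
    }

    long cachedByCoreAndDeletes(Segment s) {
        return byCoreAndDeletes.computeIfAbsent(
            new CoreAndDeletesKey(s.coreKey, s.delCount), k -> fingerprint(s));
    }

    public static void main(String[] args) {
        CacheKeyMismatch c = new CacheKeyMismatch();
        Segment seg = new Segment("_0", new long[] {100, 200, 300});
        c.cachedByCore(seg);
        c.cachedByCoreAndDeletes(seg);
        seg.delete(1); // delete the document with version 200
        System.out.println(c.cachedByCore(seg));           // prints 600 (stale)
        System.out.println(c.cachedByCoreAndDeletes(seg)); // prints 400
    }
}
```

This mirrors Yonik's manual test: after the delete is visible, the core-keyed cache still reports the pre-delete value.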
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585677#comment-15585677 ] Pushkar Raste commented on SOLR-9506: I don't see why caching the IndexFingerprint per segment and using it later would be different from computing the IndexFingerprint over the entire index by going through one segment at a time. I tried to come up with scenarios where the caching solution would fail and the original solution would not, but could not think of any.
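A simplified model of why merging per-segment values can match a whole-index pass, loosely following the field names in the getFingerprint output earlier in the thread. It assumes (not checked against Solr's IndexFingerprint source) that versionsHash is a wrapping sum of a 64-bit mix of each live version; MurmurHash3's fmix64 finalizer is used as the mix here. Every field combines with `max` or `+`, so merging is exact and order-independent, but only if each segment's value was computed from its current live documents, which is Yonik's point about deletes.

```java
// Hypothetical, simplified model of a version fingerprint. Each field merges with
// max or wrapping +, so per-segment results can be combined in any order.
class SegmentFingerprint {
    final long maxVersionEncountered;
    final long maxInHash;
    final long versionsHash;
    final long numVersions;

    SegmentFingerprint(long maxVersionEncountered, long maxInHash, long versionsHash, long numVersions) {
        this.maxVersionEncountered = maxVersionEncountered;
        this.maxInHash = maxInHash;
        this.versionsHash = versionsHash;
        this.numVersions = numVersions;
    }

    // MurmurHash3 64-bit finalizer, used here as the per-version mixing function.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Fingerprint over one segment: versions[i] is doc i's version, live[i] its liveness.
    static SegmentFingerprint compute(long[] versions, boolean[] live, long maxVersion) {
        long maxSeen = Long.MIN_VALUE, maxHashed = Long.MIN_VALUE, hash = 0, count = 0;
        for (int i = 0; i < versions.length; i++) {
            if (!live[i]) continue; // deleted documents never contribute
            long v = versions[i];
            maxSeen = Math.max(maxSeen, v);
            if (v <= maxVersion) {
                maxHashed = Math.max(maxHashed, v);
                hash += fmix64(v); // wrapping sum: order-independent
                count++;
            }
        }
        return new SegmentFingerprint(maxSeen, maxHashed, hash, count);
    }

    SegmentFingerprint combine(SegmentFingerprint o) {
        return new SegmentFingerprint(
            Math.max(maxVersionEncountered, o.maxVersionEncountered),
            Math.max(maxInHash, o.maxInHash),
            versionsHash + o.versionsHash,
            numVersions + o.numVersions);
    }

    boolean sameAs(SegmentFingerprint o) {
        return maxVersionEncountered == o.maxVersionEncountered
            && maxInHash == o.maxInHash
            && versionsHash == o.versionsHash
            && numVersions == o.numVersions;
    }

    public static void main(String[] args) {
        long max = 15;
        SegmentFingerprint a = compute(new long[] {5, 9, 12}, new boolean[] {true, false, true}, max);
        SegmentFingerprint b = compute(new long[] {7, 20}, new boolean[] {true, true}, max);
        // The same documents scanned as one index, segment by segment.
        SegmentFingerprint whole = compute(
            new long[] {5, 9, 12, 7, 20}, new boolean[] {true, false, true, true, true}, max);
        System.out.println(a.combine(b).sameAs(whole)); // prints true
    }
}
```

If segment `a` were served from a cache computed before version 9 was deleted, the merged result would include that version and no longer match `whole`.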
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585635#comment-15585635 ] Yonik Seeley commented on SOLR-9506: Hmmm, why was this committed? See my comments regarding deleted documents that were never addressed. What was committed will now result in incorrect fingerprints being returned.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585612#comment-15585612 ] Pushkar Raste commented on SOLR-9506: I did not upload the patch with parallelStream. In SolrIndexSearcher, where we compute and cache the per-segment IndexFingerprint, try switching from {{stream()}} to {{parallelStream()}} and you will see {{PeerSyncTest}} fail.
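The comment does not say why PeerSyncTest fails under `parallelStream()`, and this sketch does not diagnose it. It only illustrates a general requirement for merging per-segment results in parallel: the merge must be associative and side-effect free, with an identity element, in which case `Stream.reduce` gives the same answer sequentially and in parallel. The `Summary` type and the sample data are hypothetical, and the common-pool concern raised earlier in the thread still applies to any `parallelStream()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merging per-segment summaries with Stream.reduce. The merge is
// associative and side-effect free, and EMPTY is its identity, so a parallel reduction
// must produce the same result as a sequential one.
class ReduceSketch {

    static final class Summary {
        static final Summary EMPTY = new Summary(Long.MIN_VALUE, 0L, 0L);

        final long maxVersion;
        final long hash;
        final long count;

        Summary(long maxVersion, long hash, long count) {
            this.maxVersion = maxVersion;
            this.hash = hash;
            this.count = count;
        }

        // Builds a new Summary instead of mutating shared state from worker threads.
        Summary merge(Summary o) {
            return new Summary(Math.max(maxVersion, o.maxVersion), hash + o.hash, count + o.count);
        }

        boolean sameAs(Summary o) {
            return maxVersion == o.maxVersion && hash == o.hash && count == o.count;
        }
    }

    // Deterministic toy input standing in for per-segment results.
    static List<Summary> sampleSegments(int n) {
        List<Summary> parts = new ArrayList<>();
        for (long i = 1; i <= n; i++) {
            parts.add(new Summary((i * 7919) % 1000, i * 0x9E3779B97F4A7C15L, i));
        }
        return parts;
    }

    static Summary sequential(List<Summary> parts) {
        return parts.stream().reduce(Summary.EMPTY, Summary::merge);
    }

    static Summary parallel(List<Summary> parts) {
        return parts.parallelStream().reduce(Summary.EMPTY, Summary::merge);
    }

    public static void main(String[] args) {
        List<Summary> parts = sampleSegments(10_000);
        System.out.println(sequential(parts).sameAs(parallel(parts))); // prints true
    }
}
```

Replacing `reduce` with a `forEach` that adds into a shared, non-atomic accumulator would lose this guarantee, because concurrent `+=` updates can be lost.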
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585570#comment-15585570 ] Noble Paul commented on SOLR-9506: Which test? I could not find it.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585553#comment-15585553 ] ASF subversion and git services commented on SOLR-9506: Commit bb907a2983b4a7eba8cb4d527a859f1b312bdc79 in lucene-solr's branch refs/heads/master from [~noble.paul] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb907a2 ] * SOLR-9506: cache IndexFingerprint for each segment
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556139#comment-15556139 ] Pushkar Raste commented on SOLR-9506: I computed the hash without regard to deleted docs and cached it. All the tests pass even without doing steps #2 and #3. I also verified that the index fingerprint computed on the entire index matches the fingerprint computed from the individual segments (even after deletions).
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556132#comment-15556132 ] Pushkar Raste commented on SOLR-9506: I also found some weird behavior. If I use {{parallelStream}} to compute the segment fingerprints in parallel and then reduce them to the index fingerprint on the index searcher, the test fails. Why should the order of computation and reduction matter in this case?
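The thread does not say why the parallel reduction failed. For reference, {{Stream.reduce(identity, accumulator)}} only promises an order-independent result when the accumulator is associative and side-effect free and the identity really is an identity; a combiner that mutates shared state, or a non-thread-safe cache touched from several threads, can break that. A minimal sketch of a reduction that satisfies those requirements (max for the highest version, wrapping addition for the hash and count). {{SegFp}} and {{ParallelReduceSketch}} are hypothetical names, not Solr classes:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified per-segment fingerprint (not Solr's IndexFingerprint).
final class SegFp {
    // combine(IDENTITY, x) == x for every x.
    static final SegFp IDENTITY = new SegFp(Long.MIN_VALUE, 0L, 0L);

    final long maxVersionEncountered;
    final long versionsHash;
    final long numVersions;

    SegFp(long maxVersionEncountered, long versionsHash, long numVersions) {
        this.maxVersionEncountered = maxVersionEncountered;
        this.versionsHash = versionsHash;
        this.numVersions = numVersions;
    }

    // Pure function: max and wrapping long addition are associative and commutative.
    static SegFp combine(SegFp a, SegFp b) {
        return new SegFp(
                Math.max(a.maxVersionEncountered, b.maxVersionEncountered),
                a.versionsHash + b.versionsHash,
                a.numVersions + b.numVersions);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SegFp)) return false;
        SegFp f = (SegFp) o;
        return maxVersionEncountered == f.maxVersionEncountered
                && versionsHash == f.versionsHash
                && numVersions == f.numVersions;
    }

    @Override
    public int hashCode() {
        return Objects.hash(maxVersionEncountered, versionsHash, numVersions);
    }
}

final class ParallelReduceSketch {
    static SegFp sequential(List<SegFp> segments) {
        return segments.stream().reduce(SegFp.IDENTITY, SegFp::combine);
    }

    static SegFp parallel(List<SegFp> segments) {
        return segments.parallelStream().reduce(SegFp.IDENTITY, SegFp::combine);
    }
}
```

With a combiner like this, {{stream()}} and {{parallelStream()}} should agree; if they do not, the difference is more likely in how each segment's value is computed or cached than in the reduction itself.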
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552125#comment-15552125 ] Pushkar Raste commented on SOLR-9506: Updated the patch and added a scenario to {{PeerSyncTest}} where a replica misses an update. It looks like we don't need to remove the live docs check {{if (liveDocs != null && !liveDocs.get(doc)) continue;}}
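The quoted check skips deleted documents: {{liveDocs}} is null when a segment has no deletions and otherwise marks which doc IDs are still live. A minimal sketch of a per-segment hash loop that keeps the check. Assumptions for illustration: {{java.util.BitSet}} stands in for Lucene's {{Bits}}, a plain array stands in for the per-doc version values, and the MurmurHash3 {{fmix64}} finalizer is used as the per-version hash; {{LiveDocsSketch}} is a hypothetical name:

```java
import java.util.BitSet;

// Hypothetical sketch, not Solr's code: sums a mixed hash of each live doc's version.
final class LiveDocsSketch {
    // MurmurHash3 64-bit finalizer (fmix64).
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // versionPerDoc[doc] stands in for the doc's version; liveDocs == null means "no deletions".
    static long versionsHash(long[] versionPerDoc, BitSet liveDocs) {
        long hash = 0;
        for (int doc = 0; doc < versionPerDoc.length; doc++) {
            if (liveDocs != null && !liveDocs.get(doc)) continue; // deleted doc: skip it
            hash += mix(versionPerDoc[doc]);
        }
        return hash;
    }
}
```

Because the loop consults {{liveDocs}}, a hash cached for a segment reflects the deletions present when it was computed; later deletes in that segment are not reflected unless the value is recomputed.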
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548767#comment-15548767 ] Yonik Seeley commented on SOLR-9506: A few random points after browsing this issue... bq. We can not use current versionsHash (unless we cache all the individual version numbers), as it is not additive. The current versionsHash is additive (it must be, because as you say segments may not line up between leader and replica, and document order may differ). When caching per segment, keep this property by simply adding the segment fingerprints together. Am I missing something here? bq. private final Map
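The additivity argument can be checked with a small model: if a segment's hash is a sum of per-version hashes, per-segment fingerprints can be combined by addition (and max for the highest version), and the result equals the fingerprint computed over all versions in one pass. A sketch with a hypothetical {{Fingerprint}} class whose fields only echo names used in this thread (it is not Solr's API); {{fmix64}} as the per-version hash is an assumption:

```java
// Hypothetical, simplified model; not Solr's IndexFingerprint API.
final class Fingerprint {
    final long maxVersionEncountered;
    final long versionsHash;
    final long numVersions;

    Fingerprint(long maxVersionEncountered, long versionsHash, long numVersions) {
        this.maxVersionEncountered = maxVersionEncountered;
        this.versionsHash = versionsHash;
        this.numVersions = numVersions;
    }

    // MurmurHash3 64-bit finalizer, used here as the per-version hash.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Fingerprint computed directly from a set of versions (one segment, or the whole index).
    static Fingerprint ofVersions(long... versions) {
        long max = Long.MIN_VALUE;
        long hash = 0;
        for (long v : versions) {
            max = Math.max(max, v);
            hash += fmix64(v);
        }
        return new Fingerprint(max, hash, versions.length);
    }

    // "Adding the segment fingerprints together": max of maxima, sums of hashes and counts.
    static Fingerprint add(Fingerprint a, Fingerprint b) {
        return new Fingerprint(
                Math.max(a.maxVersionEncountered, b.maxVersionEncountered),
                a.versionsHash + b.versionsHash,
                a.numVersions + b.numVersions);
    }
}
```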
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548510#comment-15548510 ] Noble Paul commented on SOLR-9506: https://github.com/apache/lucene-solr/pull/84
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532555#comment-15532555 ] Pushkar Raste commented on SOLR-9506: Discussed with [~noble.paul]: we should cache the fingerprint for a segment only if *maxVersion specified* > *max version in the segment*.
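One way to read this rule: if every version in a segment is already below the requested bound, the "version <= maxVersion" filter drops nothing from that segment, so its hash equals the unbounded one and can be reused. A sketch with hypothetical names ({{CachePolicySketch}}, {{boundedHash}}, {{canCache}}), assuming {{fmix64}} as the per-version hash:

```java
// Hypothetical sketch of the caching rule; not Solr code.
final class CachePolicySketch {
    // MurmurHash3 64-bit finalizer, used here as the per-version hash.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hash of the segment's versions that pass the "v <= maxVersion" filter.
    static long boundedHash(long[] segmentVersions, long maxVersion) {
        long h = 0;
        for (long v : segmentVersions) {
            if (v <= maxVersion) h += fmix64(v);
        }
        return h;
    }

    // The rule from the comment: the bound excludes nothing in this segment.
    static boolean canCache(long maxVersionSpecified, long maxVersionInSegment) {
        return maxVersionSpecified > maxVersionInSegment;
    }
}
```

When {{canCache}} is false, the bound cuts into the segment and the bounded hash differs from the full one, so a cached full value would be wrong for that request.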
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531954#comment-15531954 ] Noble Paul commented on SOLR-9506: A few quick points {code:java} // Map is not concurrent, but since computeIfAbsent is idempotent, it should be alright for two threads to compute values for the same key. private final Map> perSegmentFingerprintCache = new WeakHashMap<>(); {code} The map has to be thread safe; the javadocs say that thread safety depends on the map implementation. We really don't need to keep a cache per version. The reason is that we only give one version number, and only the latest segment will have to compute anything other than the full fingerprint. As soon as a new segment is added, everything other than the full fingerprint becomes useless. So the solution is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute every time. So the cache should be {code:java} private final Map perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>()); {code}
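A minimal sketch of this proposal: only full fingerprints (maxVersion == Long.MAX_VALUE) go into a weakly keyed, synchronized map, and bounded requests are recomputed every time. The class name, the generic value type and the {{Object}} segment key are hypothetical; in Solr the key would be some per-segment identity object. Note that {{Collections.synchronizedMap}} runs {{computeIfAbsent}} while holding the map's lock, so concurrent computations for different segments are serialized.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Solr code.
final class SegmentFingerprintCache<V> {
    // Weak keys: an entry disappears once its segment key is no longer strongly reachable.
    private final Map<Object, V> cache = Collections.synchronizedMap(new WeakHashMap<>());

    V get(Object segmentKey, long maxVersion, Function<Object, V> compute) {
        if (maxVersion == Long.MAX_VALUE) {
            return cache.computeIfAbsent(segmentKey, compute); // full fingerprint: cache it
        }
        return compute.apply(segmentKey); // bounded fingerprint: recompute every time
    }

    int size() {
        return cache.size();
    }
}
```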
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523631#comment-15523631 ] Noble Paul commented on SOLR-9506: The cumulative numDocs will be the same anyway. I guess it can be reproduced as follows:
# take a shard with 2 replicas
# index and commit multiple times
# delete one doc and commit
# bring down the replica
# optimize the leader
# bring up the replica
I guess this will lead to a full replication.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523564#comment-15523564 ] Pushkar Raste commented on SOLR-9506: I think what [~ichattopadhyaya] is hinting at is that if {{numDocs}} accounts only for live (active) docs, then once documents are deleted in a segment, {{numDocs}} in the cached fingerprint might be wrong. Surprisingly, the following test cases passed with my POC:
1. {{PeerSyncTest}}
2. {{PeerSyncReplicationTest}}
3. {{SyncSliceTest}}
In the worst case, we can at least parallelize the fingerprint computation.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523566#comment-15523566 ] Pushkar Raste commented on SOLR-9506: Adding [~ysee...@gmail.com] to the loop.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523490#comment-15523490 ] Noble Paul commented on SOLR-9506: No, segments don't change.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523485#comment-15523485 ] Ishan Chattopadhyaya commented on SOLR-9506: We should keep in mind that previously written segments can change if there are deletes. Maybe we should recompute the per-segment fingerprints upon deletion in that segment.
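One way to act on this, sketched with hypothetical names: include the segment's current deletion count in the cache key, so a delete in an older segment produces a new key and forces a recompute, while untouched segments keep hitting the cache. A plain {{HashMap}} is used for brevity; a real cache would also need to drop entries for segments that have been merged away.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;

// Hypothetical cache key: segment identity plus its current number of deleted docs.
final class DeletionAwareKey {
    final Object segment;
    final int delCount;

    DeletionAwareKey(Object segment, int delCount) {
        this.segment = segment;
        this.delCount = delCount;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DeletionAwareKey)) return false;
        DeletionAwareKey k = (DeletionAwareKey) o;
        return segment == k.segment && delCount == k.delCount; // identity on the segment object
    }

    @Override
    public int hashCode() {
        return Objects.hash(System.identityHashCode(segment), delCount);
    }
}

final class DeletionAwareCacheSketch {
    private final Map<DeletionAwareKey, Long> cache = new HashMap<>();
    int computations = 0;

    // Computes the fingerprint only for a (segment, delCount) pair not seen before.
    long fingerprint(Object segment, int delCount, LongSupplier compute) {
        return cache.computeIfAbsent(new DeletionAwareKey(segment, delCount), k -> {
            computations++;
            return compute.getAsLong();
        });
    }
}
```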
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523273#comment-15523273 ] Pushkar Raste commented on SOLR-9506: In short, you are suggesting that when we cache the fingerprint for individual segments, we keep the list of version numbers in those segments around? That would mean caching billions of {{Long}} values, which might be counter-productive.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523257#comment-15523257 ] Noble Paul commented on SOLR-9506: [~praste] I've attached a sample program that computes versionsHash for the leader and replica using the example from your earlier comment.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130 ] Pushkar Raste commented on SOLR-9506: POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13
There are two issues we still need to solve.
* How to compute `versionsHash` from the `versionsHash` of individual segments. We can not use current `versionsHash` (unless we cache all the individual version numbers), as it is not additive. Consider the following scenario:
*Leader segments, versions and hash*
*seg1*: versions: 100, 101, 102; versionHash: hash(100) + hash(101) + hash(102)
*seg2*: versions: 103, 104, 105; versionHash: hash(103) + hash(104) + hash(105)
*Replica segments, versions and hash*
*seg1*: versions: 100, 101; versionHash: hash(100) + hash(101)
*seg2*: versions: 102, 103, 104, 105; versionHash: hash(102) + hash(103) + hash(104) + hash(105)
Leader and replica are essentially in sync; however, with the current method there is no way to compute the cumulative `versionHash` of the leader and the replica and ensure that they match.
* I still need to figure out how to keep the cache in `DefaultSolrCoreState`, so that we can reuse the `IndexFingerprint` of individual segments when a new searcher is opened.
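A quick check of this leader/replica scenario, assuming a segment's versionHash is the plain sum of a per-version hash (MurmurHash3's {{fmix64}} here, an assumption for illustration); {{LeaderReplicaSketch}} is a hypothetical name:

```java
// Hypothetical check of the leader/replica scenario; not Solr code.
final class LeaderReplicaSketch {
    // MurmurHash3 64-bit finalizer, used here as hash(v).
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Segment versionHash as a plain sum of per-version hashes.
    static long segmentHash(long... versions) {
        long h = 0;
        for (long v : versions) h += fmix64(v);
        return h;
    }

    static long leaderHash() {
        return segmentHash(100L, 101L, 102L) + segmentHash(103L, 104L, 105L);
    }

    static long replicaHash() {
        return segmentHash(100L, 101L) + segmentHash(102L, 103L, 104L, 105L);
    }
}
```

Since the per-segment values are plain sums, the totals agree no matter how the versions are split across segments; summing per-segment hashes in this model does not require caching the individual version numbers.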