[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1667#comment-1667 ]

ASF subversion and git services commented on LUCENE-8496:

Commit 804afbfd47cc8d86ceda6ea66f0afe304af1ad1b in lucene-solr's branch refs/heads/branch_7x from [~nknize]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=804afbf ]

LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users to select a fewer number of dimensions to be used for creating the index than the total number of dimensions used for field encoding. i.e., dimensions 0 to N may be used to determine how to split the inner nodes, and dimensions N+1 to D are ignored and stored as data dimensions at the leaves.

> Explore selective dimension indexing in BKDReader/Writer
>
> Key: LUCENE-8496
> URL: https://issues.apache.org/jira/browse/LUCENE-8496
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nicholas Knize
> Priority: Major
> Attachments: LUCENE-8496.patch, LUCENE-8496.patch, LUCENE-8496.patch, LUCENE-8496.patch, LUCENE-8496.patch, LatLonShape_SelectiveEncoding.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This issue explores adding a new feature to BKDReader/Writer that enables
> users to select fewer dimensions for creating the BKD index than the total
> number of dimensions specified for field encoding. This is useful for
> encoding dimensional data that is used for interpreting the encoded field
> data but is unnecessary (or not efficient) for creating the index structure.
> One such example is {{LatLonShape}} encoding. The first 4 dimensions may be
> used to efficiently search/index the triangle using its precomputed bounding
> box as a 4D point, and the remaining dimensions can be used to encode the
> vertices of the tessellated triangle. This causes BKD to act much like an
> R-Tree for shape data where search is distilled into a 4D point (instead of
> a more expensive 6D point) and the triangle is encoded using a portion of
> the remaining (non-indexed) dimensions. Fields that use the full data range
> for indexing are not impacted and behave as they normally would.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
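The {{LatLonShape}} example in the issue description can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name, the method names, and the 10-value layout (4 bounding-box values followed by the 6 vertex coordinates) are hypothetical and are not the encoding used by the patch. It shows the idea that a bounding-box test reads only the leading "index" values, while the trailing "data" values ride along untouched.

```java
import java.util.Arrays;

/** Hypothetical layout for illustration only; not the actual LatLonShape encoding. */
final class TriangleEncodingSketch {
    /** Number of leading values that would drive the tree splits (assumed to be 4). */
    static final int INDEX_DIMS = 4;

    /** Returns {minY, minX, maxY, maxX, ay, ax, by, bx, cy, cx}. */
    static int[] encode(int ay, int ax, int by, int bx, int cy, int cx) {
        int minY = Math.min(ay, Math.min(by, cy));
        int minX = Math.min(ax, Math.min(bx, cx));
        int maxY = Math.max(ay, Math.max(by, cy));
        int maxX = Math.max(ax, Math.max(bx, cx));
        return new int[] {minY, minX, maxY, maxX, ay, ax, by, bx, cy, cx};
    }

    /** Bounding-box intersection test that reads only the first INDEX_DIMS values. */
    static boolean boxIntersects(int[] encoded, int qMinY, int qMinX, int qMaxY, int qMaxX) {
        return encoded[0] <= qMaxY && encoded[2] >= qMinY
            && encoded[1] <= qMaxX && encoded[3] >= qMinX;
    }

    public static void main(String[] args) {
        int[] triangle = encode(0, 0, 0, 4, 3, 0);
        System.out.println(Arrays.toString(triangle));
        System.out.println(boxIntersects(triangle, 1, 1, 2, 2));
        System.out.println(boxIntersects(triangle, 5, 5, 6, 6));
    }
}
```

A real query would still need to test the decoded triangle itself once the bounding box matches; the sketch stops at the bounding-box step because that is the part the index dimensions cover.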
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645761#comment-16645761 ]

Steve Rowe commented on LUCENE-8496:

FYI two other failing tests on branch_7x from [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/2891/] (before the commit was reverted):

{noformat}
ant test -Dtestcase=TestLucene60PointsFormat -Dtests.seed=B5A28E6677965A99 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=fr-CA -Dtests.timezone=Asia/Irkutsk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{noformat}

{noformat}
ant test -Dtestcase=TestAssertingPointsFormat -Dtests.seed=F280908F18AE1657 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=dz -Dtests.timezone=Etc/GMT-10 -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
{noformat}
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645266#comment-16645266 ]

Nicholas Knize commented on LUCENE-8496:

I went ahead and reverted this feature from branch_7x until the backport can be cleaned up. Sorry for the noise.
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645246#comment-16645246 ]

Nicholas Knize commented on LUCENE-8496:

Failure on branch_7x:

{{ant test -Dtestcase=TestBKD -Dtests.seed=3A807E1398CE4499 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=sr-Latn-BA -Dtests.timezone=Africa/Malabo -Dtests.asserts=true -Dtests.file.encoding=US-ASCII}}

Muting test until fix is pushed.
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641236#comment-16641236 ]

Lucene/Solr QA commented on LUCENE-8496:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 10 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 7m 5s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 5m 39s | the patch passed |
| +1 | javac | 5m 39s | the patch passed |
| +1 | Release audit (RAT) | 0m 59s | the patch passed |
| +1 | Check forbidden APIs | 0m 30s | the patch passed |
| +1 | Validate source patterns | 0m 30s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 5s | codecs in the patch passed. |
| +1 | unit | 30m 31s | core in the patch passed. |
| +1 | unit | 1m 18s | highlighter in the patch passed. |
| +1 | unit | 1m 58s | join in the patch passed. |
| +1 | unit | 1m 11s | memory in the patch passed. |
| +1 | unit | 4m 35s | sandbox in the patch passed. |
| +1 | unit | 1m 18s | spatial-extras in the patch passed. |
| +1 | unit | 4m 44s | test-framework in the patch passed. |
| -1 | unit | 90m 55s | core in the patch failed. |
| | | 165m 2s | |

|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.sim.TestSimPolicyCloud |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942690/LUCENE-8496.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 367bdf7 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/testReport/ |
| modules | C: lucene lucene/codecs lucene/core lucene/highlighter lucene/join lucene/memory lucene/sandbox lucene/spatial-extras lucene/test-framework solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640579#comment-16640579 ]

Lucene/Solr QA commented on LUCENE-8496:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 6s | LUCENE-8496 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942614/LUCENE-8496.patch |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/102/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637680#comment-16637680 ]

Lucene/Solr QA commented on LUCENE-8496:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 10 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 5m 57s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 6m 31s | the patch passed |
| +1 | javac | 6m 31s | the patch passed |
| +1 | Release audit (RAT) | 1m 44s | the patch passed |
| +1 | Check forbidden APIs | 0m 17s | the patch passed |
| +1 | Validate source patterns | 0m 17s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 7m 11s | codecs in the patch failed. |
| +1 | unit | 31m 26s | core in the patch passed. |
| +1 | unit | 2m 16s | highlighter in the patch passed. |
| +1 | unit | 1m 15s | join in the patch passed. |
| +1 | unit | 0m 17s | memory in the patch passed. |
| +1 | unit | 4m 56s | sandbox in the patch passed. |
| +1 | unit | 2m 45s | spatial-extras in the patch passed. |
| +1 | unit | 5m 53s | test-framework in the patch passed. |
| -1 | unit | 87m 35s | core in the patch failed. |
| | | 163m 16s | |

|| Reason || Tests ||
| Failed junit tests | lucene.codecs.simpletext.TestSimpleTextPointsFormat |
| | solr.cloud.autoscaling.IndexSizeTriggerTest |
| | solr.cloud.autoscaling.sim.TestSimTriggerIntegration |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942299/LUCENE-8496.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 46f753d |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/artifact/out/patch-unit-lucene_codecs.txt |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/testReport/ |
| modules | C: lucene lucene/codecs lucene/core lucene/highlighter lucene/join lucene/memory lucene/sandbox lucene/spatial-extras lucene/test-framework solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615249#comment-16615249 ]

Nicholas Knize commented on LUCENE-8496:

{quote}It is a pity that the patch is so large{quote}

Yeah. Refactoring {{pointDimensionCount}} touched a lot of files so the patch is rather busy. I could change it to leave {{pointDimensionCount}} as is and just add a new {{indexDimensionCount}}?

{quote}Out of curiosity, did your working copy already have LUCENE-7862 when you ran the benchmark?{quote}

Yes. My benchmark numbers include the latest change to store min/max packed values. The only difference is using {{LatLonShape}} without and with the selective indexing approach.

{quote}...could you maybe set up a pull request or use Apache reviewboard{quote}

Sure thing! I went ahead and opened a PR [here|https://github.com/apache/lucene-solr/pull/451]

> Explore selective dimension indexing in BKDReader/Writer
>
> Key: LUCENE-8496
> URL: https://issues.apache.org/jira/browse/LUCENE-8496
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nicholas Knize
> Priority: Major
> Attachments: LUCENE-8496.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614910#comment-16614910 ]

Adrien Grand commented on LUCENE-8496:

It is a pity that the patch is so large given that the change is actually simple. I like the idea and the patch looks very clean overall; I see you added validation for corner cases like rejecting dataDimensionCount>0 but indexDimensionCount==0.

Out of curiosity, did your working copy already have LUCENE-7862 when you ran the benchmark?

I have some minor comments on the patch. Could you maybe set up a pull request or use Apache reviewboard to make it easier to comment on your changes and iterate?

> Explore selective dimension indexing in BKDReader/Writer
>
> Key: LUCENE-8496
> URL: https://issues.apache.org/jira/browse/LUCENE-8496
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nicholas Knize
> Priority: Major
> Attachments: LUCENE-8496.patch
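The corner case mentioned in this comment (dataDimensionCount>0 with indexDimensionCount==0), together with the constraint stated elsewhere in the thread that the index dimension count must not exceed the data dimension count, could be checked along the following lines. This is a minimal sketch, not the validation code from the patch; the class and method names are hypothetical.

```java
/** Hypothetical helper for illustration; not the validation code from the patch. */
final class PointDimensionCheck {
    /** Throws IllegalArgumentException when the two counts are inconsistent. */
    static void validate(int dataDimensionCount, int indexDimensionCount) {
        if (dataDimensionCount < 0 || indexDimensionCount < 0) {
            throw new IllegalArgumentException("dimension counts must be >= 0");
        }
        if (indexDimensionCount > dataDimensionCount) {
            throw new IllegalArgumentException(
                "indexDimensionCount must be <= dataDimensionCount");
        }
        if (dataDimensionCount > 0 && indexDimensionCount == 0) {
            throw new IllegalArgumentException(
                "indexDimensionCount must be > 0 when dataDimensionCount > 0");
        }
    }

    /** Convenience wrapper that reports validity as a boolean. */
    static boolean isValid(int dataDimensionCount, int indexDimensionCount) {
        try {
            validate(dataDimensionCount, indexDimensionCount);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(7, 4));
        System.out.println(isValid(7, 0));
        System.out.println(isValid(4, 7));
    }
}
```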
[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer
[ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612826#comment-16612826 ]

Nicholas Knize commented on LUCENE-8496:

Initial patch provided:

The lion's share of the changes are made to {{FieldType}}, {{BKDWriter}}, and {{BKDReader}}.

* {{FieldType}} - split {{pointDimensionCount}} into two new integers, {{pointDataDimensionCount}} and {{pointIndexDimensionCount}}. {{pointIndexDimensionCount}} must be <= {{pointDataDimensionCount}} and defines the first {{n}} dimensions that will be used to build the index. The remaining {{pointDataDimensionCount}} - {{pointIndexDimensionCount}} dimensions are ignored while building (e.g., split/merge) the index. Getter and setter utility methods are added.
* {{BKDWriter}} - change {{writeIndex}} to encode and write {{numIndexDims}} in the 2 most significant bytes of the integer that formerly stored {{numDims}}. This provides simple backwards compatibility without requiring a change to {{FieldInfoFormat}}. Indexing methods are updated to only use the first {{numIndexDims}} while building the tree. Leaf nodes still use {{numDataDims}} for efficiently packing and compressing the leaf level data (data nodes).
* {{BKDReader}} - add version checking in the constructor to decode {{numIndexDims}} and {{numDataDims}} from the packed dimension integer. Update index reading methods to only look at the first {{numIndexDims}} while traversing the tree. {{numDataDims}} are still used for decoding leaf level data.
* API Changes - all instances of {{pointDimensionCount}} have been updated to {{pointDataDimensionCount}} and {{pointIndexDimensionCount}} to reflect the total number of dimensions and the number of dimensions used for creating the index, respectively.
* All POINT tests and POINT based fields have been updated to use the API changes.
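The packing described for {{BKDWriter}} and {{BKDReader}} can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, the low 2 bytes are assumed to hold {{numDataDims}}, and the boolean parameter stands in for the version check mentioned above.

```java
/** Hypothetical sketch of the packed dimension integer; not the code from the patch. */
final class PackedDimsSketch {
    /** Packs numIndexDims into the 2 most significant bytes, numDataDims into the low 2 bytes (assumed). */
    static int pack(int numDataDims, int numIndexDims) {
        if (numDataDims < 0 || numDataDims > 0xFFFF || numIndexDims < 0 || numIndexDims > 0xFFFF) {
            throw new IllegalArgumentException("each count must fit in 16 bits");
        }
        return (numIndexDims << 16) | numDataDims;
    }

    /** The low 2 bytes hold the data dimension count in both the old and the new layout. */
    static int dataDims(int packed) {
        return packed & 0xFFFF;
    }

    /**
     * Decodes the index dimension count. hasSelectiveIndexing stands in for the
     * reader's version check: older files stored a single numDims value, so the
     * index count equals the data count there.
     */
    static int indexDims(int packed, boolean hasSelectiveIndexing) {
        return hasSelectiveIndexing ? (packed >>> 16) & 0xFFFF : packed & 0xFFFF;
    }

    public static void main(String[] args) {
        int packed = pack(7, 4);
        System.out.println(dataDims(packed) + " data dims, " + indexDims(packed, true) + " index dims");
        // A value written by an older writer: only numDims was stored.
        System.out.println(dataDims(6) + " data dims, " + indexDims(6, false) + " index dims");
    }
}
```

Because an older writer left the upper bytes at zero, the same integer slot can carry both counts without a format change elsewhere, which appears to be the backwards-compatibility point made in the comment.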
Benchmarking
---

To benchmark the changes I updated {{LatLonShape}} (not included in this patch) and ran benchmark tests both with and without selective indexing. The results are below:

6 dimension encoded {{LatLonShape}} w/o selective indexing
--
INDEX SIZE: 1.2795778876170516 GB
READER MB: 1.7928361892700195
BEST M hits/sec: 11.67378231920028
BEST QPS: 6.8635445274291715 for 225 queries, totHits=382688713

7 dimension {{LatLonShape}} encoding w/ 4 dimension selective indexing
---
INDEX SIZE: 2.1509012933820486 GB
READER MB: 1.8154268264770508
BEST M hits/sec: 17.018094815004627
BEST QPS: 10.005707519719927 for 225 queries, totHits=382688713

The gains are a little better than the difference between searching a 4d range vs a 6d range. The index size increased because the encoding uses 7 dimensions instead of 6, and also because I switched over to a slightly bigger encoding size.

> Explore selective dimension indexing in BKDReader/Writer
>
> Key: LUCENE-8496
> URL: https://issues.apache.org/jira/browse/LUCENE-8496
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nicholas Knize
> Priority: Major
> Attachments: LUCENE-8496.patch
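For readers who want the relative change rather than the raw figures, the benchmark numbers in this comment work out to roughly 1.46x the QPS and roughly 1.68x the index size. The snippet below is just that arithmetic, using the values quoted in the comment; the class and method names are hypothetical.

```java
/** Arithmetic on the benchmark figures quoted in the comment; names are hypothetical. */
final class BenchmarkRatios {
    /** Ratio of the "with selective indexing" value to the "without" value. */
    static double ratio(double with, double without) {
        return with / without;
    }

    public static void main(String[] args) {
        double qps = ratio(10.005707519719927, 6.8635445274291715);
        double hitsPerSec = ratio(17.018094815004627, 11.67378231920028);
        double indexSize = ratio(2.1509012933820486, 1.2795778876170516);
        System.out.printf("QPS ratio: %.3f%n", qps);
        System.out.printf("M hits/sec ratio: %.3f%n", hitsPerSec);
        System.out.printf("Index size ratio: %.3f%n", indexSize);
    }
}
```

The QPS and hits/sec ratios match because both runs report the same totHits for the same 225 queries.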