[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887001#comment-16887001 ] ASF subversion and git services commented on LUCENE-7862: - Commit f026053d4d8269c7f7135d8a76ffa21235a05d4b in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f026053 ] LUCENE-8914: Move compare logic to IntersectVisitor in FloatPointNearestNeighbor (#783) Move the logic for discarding inner modes to the IntersectVisitor so we take advantage of the change introduced in LUCENE-7862 > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, 8.0 > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625692#comment-16625692 ] Ignacio Vera commented on LUCENE-7862: -- For reference: I was doing a bit research about how to further improve performance of the BKD tree and I come across this paper: http://infolab.stanford.edu/~nsample/pubs/samplehaines.pdf In point 6 they speak about a performance improvement by doing exactly what was implemented ihere. They put a name to the approach, BOUNDS-OVERLAPS-BALL (BOB) test. > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611645#comment-16611645 ] Ignacio Vera commented on LUCENE-7862: -- Thanks [~janhoy]! > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611644#comment-16611644 ] ASF subversion and git services commented on LUCENE-7862: - Commit 7c9b8b4b6167dce9ff6967d88a3a596e041671d6 in lucene-solr's branch refs/heads/branch_7_5 from iverase [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c9b8b4 ] LUCENE-7862:Change entry in NOTES.txt to the right lucene version > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611642#comment-16611642 ] ASF subversion and git services commented on LUCENE-7862: - Commit 0789a77c2590f716fc3cedb247309068c3fc5d85 in lucene-solr's branch refs/heads/branch_7x from iverase [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0789a77 ] LUCENE-7862:Change entry in NOTES.txt to the right lucene version > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611261#comment-16611261 ] Jan Høydahl commented on LUCENE-7862: - [~ivera], it appears that your CHANGES entry for the 7x and 7_5 branches is listed under 7.5.4 instead of 7.5.0. > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609483#comment-16609483 ] Michael McCandless commented on LUCENE-7862: Wow, these are surprisingly massive speedups in some cases > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 7.5, master (8.0) > > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608056#comment-16608056 ] ASF subversion and git services commented on LUCENE-7862: - Commit f406ff91a8912f13a7652a2802084db1c0da5830 in lucene-solr's branch refs/heads/master from iverase [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f406ff9 ] LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the number of dimensions is bigger than 1 > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608057#comment-16608057 ] ASF subversion and git services commented on LUCENE-7862: - Commit 46a3f1e6f551fd3c4b506f44e8632a310656f828 in lucene-solr's branch refs/heads/branch_7x from iverase [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=46a3f1e ] LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the number of dimensions is bigger than 1 > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607281#comment-16607281 ] Lucene/Solr QA commented on LUCENE-7862: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 31s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 14s{color} | {color:green} sandbox in the patch passed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 15m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | LUCENE-7862 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938777/LUCENE-7862.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene1-us-west 4.4.0-130-generic #156~14.04.1-Ubuntu SMP Thu Jun 14 13:51:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / 6fbcda6 | | ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 | | Default Java | 1.8.0_172 | | Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/87/testReport/ | | modules | C: lucene/core lucene/sandbox U: lucene | | Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/87/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607141#comment-16607141 ] Nicholas Knize commented on LUCENE-7862: +1 Not only is it great to have some benchmarks for BKD higher dimensions but its a great boost in performance for very little cost to indexing. > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605853#comment-16605853 ] Adrien Grand commented on LUCENE-7862: -- The improvement in QPS look indeed very significant in some cases! For very little overhead. The patch looks good to me, maybe the {{System.arraycopy(minPackedValue, 0, maxPackedValue, 0, packedBytesLength)}} call would benefit from a comment explaining that we are copying common prefixes. bq. Maybe we should only ad this extra information to the index when number of dimensions > 1 +1 > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604279#comment-16604279 ] Ignacio Vera commented on LUCENE-7862: -- I have been playing with this approach to see if we can make the BKD tree more efficient. I have done some benchmarks that confirm a big boost in performance whenever there is correlation between the dimensions so it is particularly good for ranges. In addition this boost quite a bit the newly introduced approach for indexing shapes in LUCENE-8396. Maybe we should only ad this extra information to the index when number of dimensions > 1. Here are a few benchmarks: 1) Ranges: Using the data points from GeoBench, the ranges have been created as following: {code:java} double lat = Double.parseDouble(parts[1]); double lon = Double.parseDouble(parts[2]); double latEnd = lat + 0.1 * Math.abs(lon); double lonEnd = lon + 0.1 * Math.abs(lat);{code} Where latitude is used in even dimension and longitude in uneven ones. IT = Index time (sec) FM = Force merge time (sec) IS = Index size (GB) RH = Reader heap (MB || Approach|| Dimensions||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM Diff||IS Dev||IS Base||IS Diff||RH Dev||RH Base||IRH Diff|| |ranges|1|102.7s|101.5s|1%|0.0s|0.0s|0%|0.81|0.81|0%|0.80|0.80|-0%| |ranges|2|96.0s|94.4s|2%|0.0s|0.0s|0%|1.59|1.58|0%|1.02|1.03|-1%| |ranges|3|93.1s|84.6s|10%|0.0s|0.0s|0%|2.29|2.29|0%|1.18|1.01|17%| |ranges|4|101.0s|91.0s|11%|0.0s|0.0s|0%|3.07|3.05|0%|1.04|1.12|-7%| HS = Hits per second QPS= Queries per second Hits = Total hits ||Approach|| Dimensions||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS Diff||Hits Dev||Hits Base||Hits Diff|| |ranges|1|131.17|116.20|13%|10.81|9.58|13%|2730282630|2730282630|0%| |ranges|2|34.07|10.83|215%|16.93|5.38|215%|452860046|452860046|0%| |ranges|3|34.41|3.83|799%|17.10|1.90|799%|452860046|452860046|0%| |ranges|4|26.01|3.27|696%|12.92|1.62|696%|452860046|452860046|0%|. | 3) GeoBench: comparison of Geo benchmark using points (LatLonPoint), geo3d (Geo3DPoint) and shapes (LaLonShape) IT = Index time (sec) FM = Force merge time (sec) IS = Index size (GB) RH = Reader heap (MB || Approach||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM Diff||IS Dev||IS Base||IS Diff||RH Dev||RH Base||IRH Diff|| |points|103.3s|100.0s|3%|0.0s|0.0s|0%|0.51|0.51|0%|0.61|0.61|0%| |geo3d|105.6s|102.9s|3%|0.0s|0.0s|0%|0.72|0.72|0%|0.62|0.62|-0%| |shapes|108.3s|100.6s|8%|0.0s|0.0s|0%|1.31|1.30|0%|0.87|0.87|-0%| HS = Hits per second QPS= Queries per second Hits = Total hits ||Approach|| Shape||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS Diff||Hits Dev||Hits Base||Hits Diff|| |points|polyRussia|17.19|17.19|0%|4.90|4.90|0%|3508846|3508846|0%| |points|polyMedium|9.51|9.26|3%|116.51|113.40|3%|2693559|2693559|0%| |points|distance|77.56|77.33|0%|45.57|45.43|0%|382961957|382961957|0%| |points|nearest 10|0.05|0.05|-0%|4651.95|4664.57|-0%|60844404|60844404|0%| |points|box|81.95|83.07|-1%|83.39|84.53|-1%|221118844|221118844|0%| |points|poly 10|80.37|79.78|1%|50.83|50.45|1%|355809475|355809475|0%| |points|sort|33.25|31.26|6%|33.83|31.81|6%|221118844|221118844|0%| |geo3d|polyRussia|0.51|0.50|2%|0.15|0.14|2%|3508671|3508671|0%| |geo3d|polyMedium|0.42|0.43|-0%|5.20|5.23|-0%|2693545|2693545|0%| |geo3d|distance|64.36|63.47|1%|37.77|37.25|1%|383371884|383371884|0%| |geo3d|box|49.29|51.60|-4%|50.15|52.50|-4%|221118844|221118844|0%| |geo3d|poly 10|39.40|39.43|-0%|24.91|24.93|-0%|355855227|355855227|0%| |shapes|polyRussia|2.81|0.31|819%|0.80|0.09|819%|3508846|3508846|0%| |shapes|polyMedium|0.52|0.08|539%|6.38|1.00|539%|2693559|2693559|0%| |shapes|box|11.07|1.46|661%|11.27|1.48|661%|221118844|221118844|0%| |shapes|poly 10|16.83|1.46|1052%|10.64|0.92|1052%|355809475|355809475|0%| > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch, LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check
[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
[ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033186#comment-16033186 ] Michael McCandless commented on LUCENE-7862: +1, this is a nice idea. You effectively "shrink wrap" each cell to its true min/max instead of the "approximate" value passed down recursively by walking the index. It looks like we pay an O(N) price when writing the leaf block with this change, to compute the actual min/max for each dimension in this leaf block, but we could maybe save that cost by having the caller compute these since it's already scanning items to partition the cells as it recurses? But we can explore that separately ... just an optimization. > Should BKD cells store their min/max packed values? > --- > > Key: LUCENE-7862 > URL: https://issues.apache.org/jira/browse/LUCENE-7862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-7862.patch > > > The index of the BKD tree already allows to know lower and upper bounds of > values in a given dimension. However the actual range of values might be more > narrow than what the index tells us, especially if splitting on one dimension > reduces the range of values in at least one other dimension. For instance > this tends to be the case with range fields: since we enforce that lower > bounds are less than upper bounds, splitting on one dimension will also > affect the range of values in the other dimension. > So I'm wondering whether we should store the actual range of values for each > dimension in leaf blocks, this will hopefully allow to figure out that either > none or all values match in a block without having to check them all. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org