[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2019-07-17 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887001#comment-16887001
 ] 

ASF subversion and git services commented on LUCENE-7862:
-

Commit f026053d4d8269c7f7135d8a76ffa21235a05d4b in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f026053 ]

LUCENE-8914: Move compare logic to IntersectVisitor in 
FloatPointNearestNeighbor (#783)

Move the logic for discarding inner modes to the IntersectVisitor so we take 
advantage of the change introduced in LUCENE-7862

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, 8.0
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-24 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625692#comment-16625692
 ] 

Ignacio Vera commented on LUCENE-7862:
--

For reference:

I was doing a bit research about how to further improve performance of the BKD 
tree and I come across this paper:

http://infolab.stanford.edu/~nsample/pubs/samplehaines.pdf

In point 6 they speak about a performance improvement by doing exactly what was 
implemented ihere. They put a name to the approach, BOUNDS-OVERLAPS-BALL (BOB) 
test. 

 

 
 

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-12 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611645#comment-16611645
 ] 

Ignacio Vera commented on LUCENE-7862:
--

Thanks [~janhoy]!

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611644#comment-16611644
 ] 

ASF subversion and git services commented on LUCENE-7862:
-

Commit 7c9b8b4b6167dce9ff6967d88a3a596e041671d6 in lucene-solr's branch 
refs/heads/branch_7_5 from iverase
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c9b8b4 ]

LUCENE-7862:Change entry in NOTES.txt to the right lucene version


> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611642#comment-16611642
 ] 

ASF subversion and git services commented on LUCENE-7862:
-

Commit 0789a77c2590f716fc3cedb247309068c3fc5d85 in lucene-solr's branch 
refs/heads/branch_7x from iverase
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0789a77 ]

LUCENE-7862:Change entry in NOTES.txt to the right lucene version


> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611261#comment-16611261
 ] 

Jan Høydahl commented on LUCENE-7862:
-

[~ivera], it appears that your CHANGES entry for the 7x and 7_5 branches is 
listed under 7.5.4 instead of 7.5.0.

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-10 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609483#comment-16609483
 ] 

Michael McCandless commented on LUCENE-7862:


Wow, these are surprisingly massive speedups in some cases

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 7.5, master (8.0)
>
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608056#comment-16608056
 ] 

ASF subversion and git services commented on LUCENE-7862:
-

Commit f406ff91a8912f13a7652a2802084db1c0da5830 in lucene-solr's branch 
refs/heads/master from iverase
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f406ff9 ]

LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the 
number of dimensions is bigger than 1


> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608057#comment-16608057
 ] 

ASF subversion and git services commented on LUCENE-7862:
-

Commit 46a3f1e6f551fd3c4b506f44e8632a310656f828 in lucene-solr's branch 
refs/heads/branch_7x from iverase
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=46a3f1e ]

LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the 
number of dimensions is bigger than 1


> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-07 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607281#comment-16607281
 ] 

Lucene/Solr QA commented on LUCENE-7862:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
31s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
14s{color} | {color:green} sandbox in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-7862 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938777/LUCENE-7862.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-130-generic #156~14.04.1-Ubuntu SMP Thu 
Jun 14 13:51:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 6fbcda6 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_172 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/87/testReport/ |
| modules | C: lucene/core lucene/sandbox U: lucene |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/87/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-07 Thread Nicholas Knize (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607141#comment-16607141
 ] 

Nicholas Knize commented on LUCENE-7862:


+1  Not only is it great to have some benchmarks for BKD higher dimensions but 
its a great boost in performance for very little cost to indexing. 

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-06 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605853#comment-16605853
 ] 

Adrien Grand commented on LUCENE-7862:
--

The improvement in QPS look indeed very significant in some cases! For very 
little overhead. The patch looks good to me, maybe the 
{{System.arraycopy(minPackedValue, 0, maxPackedValue, 0, packedBytesLength)}} 
call would benefit from a comment explaining that we are copying common 
prefixes.

bq. Maybe we should only ad this extra information to the index when number of 
dimensions > 1

+1

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2018-09-05 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604279#comment-16604279
 ] 

Ignacio Vera commented on LUCENE-7862:
--

I have been playing with this approach to see if we can make the BKD tree more 
efficient. I have done some benchmarks that confirm a big boost in performance 
whenever there is correlation between the dimensions so it is particularly good 
for ranges. In addition this boost quite a bit the newly introduced approach 
for indexing shapes in LUCENE-8396.

Maybe we should only ad this extra information to the index when number of 
dimensions > 1. Here are a few benchmarks:

1) Ranges: Using the data points from GeoBench, the ranges have been created as 
following:
{code:java}
double lat = Double.parseDouble(parts[1]);
double lon = Double.parseDouble(parts[2]);
double latEnd = lat + 0.1 * Math.abs(lon);
double lonEnd = lon + 0.1 * Math.abs(lat);{code}
Where latitude is used in even dimension and longitude in uneven ones.

IT = Index time (sec)
 FM = Force merge time (sec)
 IS = Index size (GB)
 RH = Reader heap (MB
|| Approach|| Dimensions||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM 
Diff||IS Dev||IS Base||IS Diff||RH Dev||RH Base||IRH Diff||
|ranges|1|102.7s|101.5s|1%|0.0s|0.0s|0%|0.81|0.81|0%|0.80|0.80|-0%|
|ranges|2|96.0s|94.4s|2%|0.0s|0.0s|0%|1.59|1.58|0%|1.02|1.03|-1%|
|ranges|3|93.1s|84.6s|10%|0.0s|0.0s|0%|2.29|2.29|0%|1.18|1.01|17%|
|ranges|4|101.0s|91.0s|11%|0.0s|0.0s|0%|3.07|3.05|0%|1.04|1.12|-7%|

HS = Hits per second
 QPS= Queries per second
 Hits = Total hits
||Approach|| Dimensions||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS 
Diff||Hits Dev||Hits Base||Hits Diff||
|ranges|1|131.17|116.20|13%|10.81|9.58|13%|2730282630|2730282630|0%|
|ranges|2|34.07|10.83|215%|16.93|5.38|215%|452860046|452860046|0%|
|ranges|3|34.41|3.83|799%|17.10|1.90|799%|452860046|452860046|0%|
|ranges|4|26.01|3.27|696%|12.92|1.62|696%|452860046|452860046|0%|.  |

 

3) GeoBench: comparison of Geo benchmark using points (LatLonPoint), geo3d 
(Geo3DPoint) and shapes (LaLonShape)

IT = Index time (sec)
 FM = Force merge time (sec)
 IS = Index size (GB)
 RH = Reader heap (MB
|| Approach||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM Diff||IS Dev||IS 
Base||IS Diff||RH Dev||RH Base||IRH Diff||
|points|103.3s|100.0s|3%|0.0s|0.0s|0%|0.51|0.51|0%|0.61|0.61|0%|
|geo3d|105.6s|102.9s|3%|0.0s|0.0s|0%|0.72|0.72|0%|0.62|0.62|-0%|
|shapes|108.3s|100.6s|8%|0.0s|0.0s|0%|1.31|1.30|0%|0.87|0.87|-0%|

HS = Hits per second
 QPS= Queries per second
 Hits = Total hits
||Approach|| Shape||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS Diff||Hits 
Dev||Hits Base||Hits Diff||
|points|polyRussia|17.19|17.19|0%|4.90|4.90|0%|3508846|3508846|0%|
|points|polyMedium|9.51|9.26|3%|116.51|113.40|3%|2693559|2693559|0%|
|points|distance|77.56|77.33|0%|45.57|45.43|0%|382961957|382961957|0%|
|points|nearest 10|0.05|0.05|-0%|4651.95|4664.57|-0%|60844404|60844404|0%|
|points|box|81.95|83.07|-1%|83.39|84.53|-1%|221118844|221118844|0%|
|points|poly 10|80.37|79.78|1%|50.83|50.45|1%|355809475|355809475|0%|
|points|sort|33.25|31.26|6%|33.83|31.81|6%|221118844|221118844|0%|
|geo3d|polyRussia|0.51|0.50|2%|0.15|0.14|2%|3508671|3508671|0%|
|geo3d|polyMedium|0.42|0.43|-0%|5.20|5.23|-0%|2693545|2693545|0%|
|geo3d|distance|64.36|63.47|1%|37.77|37.25|1%|383371884|383371884|0%|
|geo3d|box|49.29|51.60|-4%|50.15|52.50|-4%|221118844|221118844|0%|
|geo3d|poly 10|39.40|39.43|-0%|24.91|24.93|-0%|355855227|355855227|0%|
|shapes|polyRussia|2.81|0.31|819%|0.80|0.09|819%|3508846|3508846|0%|
|shapes|polyMedium|0.52|0.08|539%|6.38|1.00|539%|2693559|2693559|0%|
|shapes|box|11.07|1.46|661%|11.27|1.48|661%|221118844|221118844|0%|
|shapes|poly 10|16.83|1.46|1052%|10.64|0.92|1052%|355809475|355809475|0%|

 

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check 

[jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?

2017-06-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033186#comment-16033186
 ] 

Michael McCandless commented on LUCENE-7862:


+1, this is a nice idea.  You effectively "shrink wrap" each cell to its true 
min/max instead of the "approximate" value passed down recursively by walking 
the index.

It looks like we pay an O(N) price when writing the leaf block with this 
change, to compute the actual min/max for each dimension in this leaf block, 
but we could maybe save that cost by having the caller compute these since it's 
already scanning items to partition the cells as it recurses?  But we can 
explore that separately ... just an optimization.

> Should BKD cells store their min/max packed values?
> ---
>
> Key: LUCENE-7862
> URL: https://issues.apache.org/jira/browse/LUCENE-7862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org