[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1667#comment-1667
 ] 

ASF subversion and git services commented on LUCENE-8496:
-

Commit 804afbfd47cc8d86ceda6ea66f0afe304af1ad1b in lucene-solr's branch 
refs/heads/branch_7x from [~nknize]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=804afbf ]

LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users to 
select fewer dimensions for creating the index than the total number of 
dimensions used for field encoding. That is, dimensions 0 to N may be used to 
determine how to split the inner nodes, and dimensions N+1 to D are ignored 
for splitting and stored as data dimensions at the leaves.
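
As a rough illustration of the splitting behaviour described in the commit 
message, the sketch below picks a split dimension by inspecting only the 
leading index dimensions. Picking the dimension with the widest value range is 
an assumption made for this example (the commit message does not state the 
selection rule), and {{SplitDimSketch}} and {{chooseSplitDim}} are hypothetical 
names, not part of Lucene.

```java
// Hypothetical sketch, not Lucene code: choose a split dimension using only
// the leading index dimensions; trailing data dimensions are never inspected.
final class SplitDimSketch {

  /**
   * Returns the index dimension with the widest (max - min) range.
   * Each row of points holds one value per data dimension; only the first
   * numIndexDims entries of each row are read.
   */
  static int chooseSplitDim(long[][] points, int numIndexDims) {
    if (points.length == 0 || numIndexDims < 1) {
      throw new IllegalArgumentException("need at least one point and one index dimension");
    }
    int best = 0;
    long bestRange = -1;
    for (int dim = 0; dim < numIndexDims; dim++) {
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      for (long[] p : points) {
        min = Math.min(min, p[dim]);
        max = Math.max(max, p[dim]);
      }
      long range = max - min; // assumes values small enough not to overflow
      if (range > bestRange) {
        bestRange = range;
        best = dim;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Dimension 2 has by far the widest range, but it is a data-only dimension
    // when numIndexDims is 2, so it is not a split candidate.
    long[][] points = {
      {0, 5, 1000},
      {3, 9, -1000},
    };
    System.out.println(chooseSplitDim(points, 2)); // prints 1
  }
}
```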


> Explore selective dimension indexing in BKDReader/Writer
> 
>
> Key: LUCENE-8496
> URL: https://issues.apache.org/jira/browse/LUCENE-8496
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nicholas Knize
> Priority: Major
> Attachments: LUCENE-8496.patch, LUCENE-8496.patch, LUCENE-8496.patch, 
> LUCENE-8496.patch, LUCENE-8496.patch, LatLonShape_SelectiveEncoding.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This issue explores adding a new feature to BKDReader/Writer that enables 
> users to select fewer dimensions for creating the BKD index than the total 
> number of dimensions specified for field encoding. This is useful for 
> encoding dimensional data that is needed to interpret the encoded field data 
> but is unnecessary (or inefficient) for creating the index structure. One 
> such example is {{LatLonShape}} encoding. The first 4 dimensions may be used 
> to efficiently search/index the triangle using its precomputed bounding box 
> as a 4D point, and the remaining dimensions can be used to encode the 
> vertices of the tessellated triangle. This causes BKD to act much like an 
> R-Tree for shape data, where search is distilled into a 4D point (instead of 
> a more expensive 6D point) and the triangle is encoded using a portion of the 
> remaining (non-indexed) dimensions. Fields that use the full data range for 
> indexing are not impacted and behave as they normally would.
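
To make the "bounding box as a 4D point" idea concrete, here is a minimal 
sketch that derives the four indexed values for one triangle. The ordering 
(minY, minX, maxY, maxX), the use of plain ints, and the names 
{{TriangleBoxSketch}} and {{boundingBox}} are assumptions for illustration 
only; the issue text does not specify how the vertices are packed into the 
remaining dimensions, so that part is not shown.

```java
// Hypothetical sketch, not the LatLonShape implementation: compute the
// bounding box of a triangle as a 4D point suitable for the index dimensions.
final class TriangleBoxSketch {

  /** Returns {minY, minX, maxY, maxX}; this ordering is an assumption. */
  static int[] boundingBox(int ay, int ax, int by, int bx, int cy, int cx) {
    int minY = Math.min(ay, Math.min(by, cy));
    int minX = Math.min(ax, Math.min(bx, cx));
    int maxY = Math.max(ay, Math.max(by, cy));
    int maxX = Math.max(ax, Math.max(bx, cx));
    return new int[] {minY, minX, maxY, maxX};
  }

  public static void main(String[] args) {
    // Triangle with (y, x) vertices (0, 0), (4, 1), (2, 7).
    int[] box = boundingBox(0, 0, 4, 1, 2, 7);
    System.out.println(java.util.Arrays.toString(box)); // prints [0, 0, 4, 7]
  }
}
```

A query would then be tested against these four values while traversing the 
tree, and the vertex data would only be decoded at the leaves.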



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-10 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645761#comment-16645761
 ] 

Steve Rowe commented on LUCENE-8496:


FYI two other failing tests on branch_7x from 
[https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/2891/] (before the 
commit was reverted):

{noformat}
ant test -Dtestcase=TestLucene60PointsFormat -Dtests.seed=B5A28E6677965A99 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=fr-CA 
-Dtests.timezone=Asia/Irkutsk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{noformat}

{noformat}
ant test -Dtestcase=TestAssertingPointsFormat -Dtests.seed=F280908F18AE1657 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=dz 
-Dtests.timezone=Etc/GMT-10 -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
{noformat}




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-10 Thread Nicholas Knize (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645266#comment-16645266
 ] 

Nicholas Knize commented on LUCENE-8496:


I went ahead and reverted this feature from branch_7x until the backport can be 
cleaned up. Sorry for the noise.




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-10 Thread Nicholas Knize (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645246#comment-16645246
 ] 

Nicholas Knize commented on LUCENE-8496:


Failure on branch_7x: 
{{ant test  -Dtestcase=TestBKD -Dtests.seed=3A807E1398CE4499 -Dtests.slow=true 
-Dtests.badapples=true -Dtests.locale=sr-Latn-BA -Dtests.timezone=Africa/Malabo 
-Dtests.asserts=true -Dtests.file.encoding=US-ASCII}}

Muting test until fix is pushed.




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-07 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641236#comment-16641236
 ] 

Lucene/Solr QA commented on LUCENE-8496:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  5s{color} | {color:green} codecs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 30m 31s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 18s{color} | {color:green} highlighter in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 58s{color} | {color:green} join in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 11s{color} | {color:green} memory in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 35s{color} | {color:green} sandbox in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 18s{color} | {color:green} spatial-extras in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 44s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 55s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m  2s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.sim.TestSimPolicyCloud |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942690/LUCENE-8496.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 367bdf7 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/artifact/out/patch-unit-solr_core.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/testReport/ |
| modules | C: lucene lucene/codecs lucene/core lucene/highlighter lucene/join lucene/memory lucene/sandbox lucene/spatial-extras lucene/test-framework solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/103/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-06 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640579#comment-16640579
 ] 

Lucene/Solr QA commented on LUCENE-8496:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} LUCENE-8496 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942614/LUCENE-8496.patch |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/102/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |

This message was automatically generated.






[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-10-03 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637680#comment-16637680
 ] 

Lucene/Solr QA commented on LUCENE-8496:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 11s{color} | {color:red} codecs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 31m 26s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 16s{color} | {color:green} highlighter in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 15s{color} | {color:green} join in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 17s{color} | {color:green} memory in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 56s{color} | {color:green} sandbox in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 45s{color} | {color:green} spatial-extras in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 53s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 35s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 16s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | lucene.codecs.simpletext.TestSimpleTextPointsFormat |
|   | solr.cloud.autoscaling.IndexSizeTriggerTest |
|   | solr.cloud.autoscaling.sim.TestSimTriggerIntegration |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8496 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942299/LUCENE-8496.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 46f753d |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/artifact/out/patch-unit-lucene_codecs.txt |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/artifact/out/patch-unit-solr_core.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/testReport/ |
| modules | C: lucene lucene/codecs lucene/core lucene/highlighter lucene/join lucene/memory lucene/sandbox lucene/spatial-extras lucene/test-framework solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/99/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-09-14 Thread Nicholas Knize (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615249#comment-16615249
 ] 

Nicholas Knize commented on LUCENE-8496:


{quote}It is a pity that the patch is so large{quote}

Yeah. Refactoring {{pointDimensionCount}} touched a lot of files so the patch 
is rather busy. I could change it to leave {{pointDimensionCount}} as is and 
just add a new {{indexDimensionCount}}?

{quote}Out of curiosity, did your working copy already have LUCENE-7862 when 
you ran the benchmark?{quote}

Yes. My benchmark numbers include the latest change to store min/max packed 
values. The only difference is using {{LatLonShape}} without and with the 
selective indexing approach.

{quote}...could you maybe set up a pull request or use Apache reviewboard{quote}

 Sure thing! I went ahead and opened a PR 
[here|https://github.com/apache/lucene-solr/pull/451]




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-09-14 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614910#comment-16614910
 ] 

Adrien Grand commented on LUCENE-8496:
--

It is a pity that the patch is so large given that the change is actually 
simple. I like the idea and the patch looks very clean overall; I see you added 
validation for corner cases such as rejecting dataDimensionCount>0 with 
indexDimensionCount==0. Out of curiosity, did your working copy already have 
LUCENE-7862 when you ran the benchmark? I have some minor comments on the 
patch. Could you maybe set up a pull request or use Apache reviewboard to make 
it easier to comment on your changes and iterate?
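
The corner-case validation mentioned above could look roughly like the 
following. This is a sketch, not the code from the patch: 
{{PointDimsValidationSketch}} and {{validate}} are hypothetical names, and the 
negative-value check is an added assumption. The other two rules come from 
this thread (index dimensions must not exceed data dimensions, and 
dataDimensionCount>0 with indexDimensionCount==0 is rejected).

```java
// Hypothetical sketch of dimension-count validation; not taken from the patch.
final class PointDimsValidationSketch {

  static void validate(int dataDimensionCount, int indexDimensionCount) {
    // Assumption: negative counts are never meaningful.
    if (dataDimensionCount < 0 || indexDimensionCount < 0) {
      throw new IllegalArgumentException("dimension counts must be >= 0");
    }
    // The index may use at most as many dimensions as the field encodes.
    if (indexDimensionCount > dataDimensionCount) {
      throw new IllegalArgumentException(
          "indexDimensionCount must be <= dataDimensionCount; got "
              + indexDimensionCount + " > " + dataDimensionCount);
    }
    // A field that encodes point data must index at least one dimension.
    if (dataDimensionCount > 0 && indexDimensionCount == 0) {
      throw new IllegalArgumentException(
          "indexDimensionCount must be > 0 when dataDimensionCount > 0");
    }
  }

  public static void main(String[] args) {
    validate(7, 4); // accepted: 4 of 7 dimensions are indexed
    validate(0, 0); // accepted: not a point field
    try {
      validate(2, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```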




[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer

2018-09-12 Thread Nicholas Knize (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612826#comment-16612826
 ] 

Nicholas Knize commented on LUCENE-8496:


Initial patch provided:

The lion's share of the changes are made to {{FieldType}}, {{BKDWriter}}, and 
{{BKDReader}}.

* {{FieldType}} - split {{pointDimensionCount}} into two new integers, 
{{pointDataDimensionCount}} and {{pointIndexDimensionCount}}. 
{{pointIndexDimensionCount}} must be <= {{pointDataDimensionCount}} and defines 
the first {{n}} dimensions that will be used to build the index. The remaining 
{{pointDataDimensionCount}} - {{pointIndexDimensionCount}} dimensions are 
ignored while building (e.g., split/merge) the index. Getter and setter utility 
methods are added.

* {{BKDWriter}} - change {{writeIndex}} to encode and write {{numIndexDims}} in 
the 2 most significant bytes of the integer that formerly stored {{numDims}}. 
This provides simple backward compatibility without requiring a change to 
{{FieldInfoFormat}}. Indexing methods are updated to only use the first 
{{numIndexDims}} dimensions while building the tree. Leaf nodes still use 
{{numDataDims}} for efficiently packing and compressing the leaf level data 
(data nodes).

* {{BKDReader}} - add version checking in the constructor to decode 
{{numIndexDims}} and {{numDataDims}} from the packed dimension integer. Update 
index reading methods to only look at the first {{numIndexDims}} while 
traversing the tree. {{numDataDims}} are still used for decoding leaf level 
data.

* API Changes - all instances of {{pointDimensionCount}} have been updated to 
{{pointDataDimensionCount}} and {{pointIndexDimensionCount}} to reflect total 
number of dimensions, and number of dimensions used for creating the index, 
respectively.

* All POINT Tests and POINT based Fields have been updated to use the API 
changes.
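
The packed dimension integer described for {{BKDWriter}} and {{BKDReader}} can 
be sketched as follows. This is an illustration under assumptions, not the 
patch itself: placing {{numDataDims}} in the 2 least significant bytes is 
inferred from the description, the boolean flag stands in for the reader's 
version check, and {{PackedDimsSketch}} and its methods are hypothetical names.

```java
// Hypothetical sketch of the packed dimension integer; not the actual
// BKDWriter/BKDReader code.
final class PackedDimsSketch {

  /** Packs numIndexDims into the upper 2 bytes and numDataDims into the lower 2 bytes. */
  static int pack(int numDataDims, int numIndexDims) {
    if (numDataDims < 0 || numDataDims > 0xFFFF
        || numIndexDims < 0 || numIndexDims > 0xFFFF) {
      throw new IllegalArgumentException("dimension counts must fit in 2 bytes each");
    }
    return (numIndexDims << 16) | numDataDims;
  }

  /** Total number of encoded dimensions (lower 2 bytes). */
  static int numDataDims(int packed) {
    return packed & 0xFFFF;
  }

  /**
   * Number of dimensions used to build the tree. The flag is an assumed
   * stand-in for the version check: an older index stored only numDims,
   * so every dimension is treated as an index dimension.
   */
  static int numIndexDims(int packed, boolean selectiveIndexingSupported) {
    return selectiveIndexingSupported ? (packed >>> 16) : numDataDims(packed);
  }

  public static void main(String[] args) {
    int packed = pack(7, 4); // 7 data dimensions, the first 4 indexed
    System.out.println(numDataDims(packed));         // prints 7
    System.out.println(numIndexDims(packed, true));  // prints 4
    System.out.println(numIndexDims(6, false));      // prints 6 (old format)
  }
}
```

Because an old-format integer has zeros in its upper bytes, the same 4-byte 
slot can carry both counts without changing the on-disk layout.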

Benchmarking
---

To benchmark the changes I updated {{LatLonShape}} (not included in this patch) 
and ran benchmark tests both with and without selective indexing. The results 
are below: 

6 dimension encoded {{LatLonShape}} w/o selective indexing
--
INDEX SIZE: 1.2795778876170516 GB
READER MB: 1.7928361892700195
BEST M hits/sec: 11.67378231920028
BEST QPS: 6.8635445274291715 for 225 queries, totHits=382688713

7 dimension LatLonShape encoding w/ 4 dimension selective indexing
---
INDEX SIZE: 2.1509012933820486 GB
READER MB: 1.8154268264770508
BEST M hits/sec: 17.018094815004627
BEST QPS: 10.005707519719927 for 225 queries, totHits=382688713

The gains are a little better than the difference between searching a 4d range 
and a 6d range. The index size increased due to using 7 dimensions instead of 
6, though I also switched over to a slightly larger encoding size.
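
For reference, the relative changes implied by the figures above can be 
computed directly. The snippet below only divides the reported numbers (the 
class and method names are hypothetical); it works out to roughly a 1.46x gain 
in QPS and a roughly 1.68x larger index.

```java
// Computes the ratios between the two benchmark runs reported above.
final class BenchmarkRatioSketch {

  static double ratio(double after, double before) {
    return after / before;
  }

  public static void main(String[] args) {
    double qps = ratio(10.005707519719927, 6.8635445274291715);
    double indexSize = ratio(2.1509012933820486, 1.2795778876170516);
    System.out.println(String.format(java.util.Locale.ROOT,
        "QPS ratio: %.2f, index size ratio: %.2f", qps, indexSize));
  }
}
```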
