[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728813#comment-15728813
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit fd1f608b49a7a8b5f7e6cc805378da2217ec657a in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fd1f608 ]

LUCENE-7563: remove redundant array copy in PackedIndexTree.clone


> BKD index should compress unused leading bytes
> --
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563-prefixlen-unary.patch, 
> LUCENE-7563-prefixlen-unary.patch, LUCENE-7563.patch, LUCENE-7563.patch, 
> LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per 
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom 
> two bytes in a given segment, we shouldn't store all those leading 0s in the 
> index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728814#comment-15728814
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit 0c8e8e396a4ccc41e6af78ac7d0342716c36902a in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0c8e8e3 ]

LUCENE-7563: fix 6.x backport compilation errors





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728811#comment-15728811
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit f51766c00fc374a6fc6f407b723bd8458556de7d in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f51766c ]

LUCENE-7563: use a compressed format for the in-heap BKD index





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727037#comment-15727037
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit bd8b191505d92c89a483a6189497374238476a00 in lucene-solr's branch 
refs/heads/apiv2 from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bd8b191 ]

LUCENE-7563: remove redundant array copy in PackedIndexTree.clone





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727036#comment-15727036
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit 5e8db2e068f2549b9619d5ac48a50c8032fc292b in lucene-solr's branch 
refs/heads/apiv2 from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5e8db2e ]

LUCENE-7563: use a compressed format for the in-heap BKD index





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722537#comment-15722537
 ] 

Michael McCandless commented on LUCENE-7563:


Ahh, OK; I think we should restrict {{TestBKD}} to the same dimension count / 
bytes per dimension limits that Lucene enforces?  As we tighten up how we 
compress it on disk and the in-heap index we should only test for what we 
actually offer to the end user.




[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-05 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722349#comment-15722349
 ] 

Adrien Grand commented on LUCENE-7563:
--

I dug into it: the test failure may happen with large numbers of bytes per 
dimension. It could be fixed if we limited the number of bytes per value in 
BKDWriter to 16 (like we do in FieldInfos) and made {{code}} a long.




[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722090#comment-15722090
 ] 

Michael McCandless commented on LUCENE-7563:


bq. I think there is just a redundant arraycopy in clone()?

Thanks, I pushed a fix!

bq. For the record, I played with another idea leveraging the fact that the 
prefix lengths on two consecutive levels are likely close to each other,

I like this idea!  But I hit this test failure ... doesn't reproduce on trunk:

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBKD -Dtests.method=testWastedLeadingBytes -Dtests.seed=2E5F0E183BBA1098 -Dtests.locale=es-PR -Dtests.timezone=CST -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.90s J1 | TestBKD.testWastedLeadingBytes <<<
   [junit4]    > Throwable #1: java.lang.ArrayIndexOutOfBoundsException: -32
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([2E5F0E183BBA1098:ABD9D50B47794EFC]:0)
   [junit4]    >    at org.apache.lucene.util.bkd.BKDReader$PackedIndexTree.readNodeData(BKDReader.java:442)
   [junit4]    >    at org.apache.lucene.util.bkd.BKDReader$PackedIndexTree.<init>(BKDReader.java:343)
   [junit4]    >    at org.apache.lucene.util.bkd.BKDReader.getIntersectState(BKDReader.java:526)
   [junit4]    >    at org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:498)
   [junit4]    >    at org.apache.lucene.util.bkd.TestBKD.testWastedLeadingBytes(TestBKD.java:1042)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
{noformat}




[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722054#comment-15722054
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit bd8b191505d92c89a483a6189497374238476a00 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bd8b191 ]

LUCENE-7563: remove redundant array copy in PackedIndexTree.clone





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-12-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15719702#comment-15719702
 ] 

ASF subversion and git services commented on LUCENE-7563:
-

Commit 5e8db2e068f2549b9619d5ac48a50c8032fc292b in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5e8db2e ]

LUCENE-7563: use a compressed format for the in-heap BKD index





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702891#comment-15702891
 ] 

Adrien Grand commented on LUCENE-7563:
--

bq. Hmm I think I am already doing that?

You are right, I had not read the code correctly.

bq. Oooh that's a great idea! Saves 1 byte per inner node. We need 5 bits for 
the prefix I think since it can range 0 .. 16 inclusive, and 3 bits for the 
splitDim since it's 0 .. 7 inclusive.

I have been thinking about it more and I think we can make it more general. The 
first two bytes that differ are likely close to each other, so if we call their 
difference {{firstByteDelta}}, we could pack {{firstByteDelta}}, {{splitDim}} 
and {{prefix}} into a single vint (e.g. {{(firstByteDelta * (1 + bytesPerDim) + 
prefix) * numDims + splitDim}}) that would sometimes only take one byte (quite 
often when {{numDims}} and {{bytesPerDim}} are small and rarely in the opposite 
case).
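
The packing formula above can be sketched in isolation. This is only an illustration under assumptions, not the code from the patch: the class and method names are hypothetical, {{firstByteDelta}} is assumed to be non-negative, and the vint length is computed with the usual 7-payload-bits-per-byte scheme.

```java
// Hypothetical helper, not Lucene's actual code: packs the three per-node
// values into one int using the mixed-radix formula from the comment.
final class NodeCodeSketch {

    static int encode(int firstByteDelta, int prefix, int splitDim,
                      int bytesPerDim, int numDims) {
        // prefix ranges over 0..bytesPerDim inclusive, hence the (1 + bytesPerDim) radix.
        return (firstByteDelta * (1 + bytesPerDim) + prefix) * numDims + splitDim;
    }

    /** Returns {firstByteDelta, prefix, splitDim}. */
    static int[] decode(int code, int bytesPerDim, int numDims) {
        int splitDim = code % numDims;
        code /= numDims;
        int prefix = code % (1 + bytesPerDim);
        int firstByteDelta = code / (1 + bytesPerDim);
        return new int[] {firstByteDelta, prefix, splitDim};
    }

    /** Bytes needed to write a non-negative int as a variable-length int. */
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        // Small numDims and bytesPerDim: the code fits in a single vint byte.
        int small = encode(3, 2, 1, 4, 2);
        System.out.println(small + " -> " + vIntLength(small) + " byte(s)");
        // Large numDims and bytesPerDim: the code needs more bytes.
        int large = encode(200, 16, 7, 16, 8);
        System.out.println(large + " -> " + vIntLength(large) + " byte(s)");
    }
}
```

With {{numDims=2}} and {{bytesPerDim=4}} the example code fits in one byte, while the {{numDims=8}}, {{bytesPerDim=16}} case with a large delta takes three, which matches the "sometimes only one byte" expectation.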

bq. but it felt wrong to just pass these packed bytes to the simple text format 
...

Agreed. Maybe we should duplicate the current BKDReader/BKDWriter into a new 
impl that would be specific to SimpleText and would not need all those 
optimizations so that both impls can evolve separately.




[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702280#comment-15702280
 ] 

Michael McCandless commented on LUCENE-7563:


bq. It seems we are always delta coding with the split value of the parent 
level, but for the multi-dimensional case, I think it would be better to 
delta-code with the last split value that was on the same dimension?

Hmm I think I am already doing that?  Note that the
{{splitValuesStack}} in {{BKDReader.PackedIndexTree}} holds all
dimensions' last split values, and then when I read the suffix bytes
in, I copy them into the packed values for the current split
dimension:

{noformat}
in.readBytes(splitValuesStack[level], splitDim*bytesPerDim+prefix, suffix);
{noformat}

I think?
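
A rough sketch of that copy, outside of Lucene: the child level starts from the parent's packed values for all dimensions, and only the suffix bytes of the split dimension are overwritten. The class and method names are hypothetical, and a plain {{byte[]}} with {{System.arraycopy}} stands in for reading from the index input.

```java
import java.util.Arrays;

// Hypothetical illustration, not BKDReader itself.
final class SplitValueStackSketch {

    /**
     * Builds the packed split values for a child level. The first `prefix`
     * bytes of the split dimension stay shared with the last split value
     * seen on that same dimension; the other dimensions are inherited as is.
     */
    static byte[] childSplitValues(byte[] parentPacked, int splitDim,
                                   int bytesPerDim, int prefix, byte[] suffix) {
        byte[] child = Arrays.copyOf(parentPacked, parentPacked.length);
        // Same offset as in the readBytes call: splitDim*bytesPerDim+prefix.
        System.arraycopy(suffix, 0, child, splitDim * bytesPerDim + prefix, suffix.length);
        return child;
    }

    public static void main(String[] args) {
        // Two dimensions, 4 bytes each.
        byte[] parent = {0, 0, 1, 2, 9, 9, 9, 9};
        byte[] child = childSplitValues(parent, 0, 4, 2, new byte[] {5, 6});
        System.out.println(Arrays.toString(child));
    }
}
```

Because every level carries all dimensions' values, the prefix for a split on dimension 0 is always measured against the previous dimension 0 split, even if the parent split on dimension 1.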

I'll test on the OpenStreetMaps geo benchmark to measure the impact
... I'll also run the 2B tests to make sure nothing broke.

bq. For instance we use whole bytes to store the split dimension or the prefix 
length while they only need 3 and 4 bits? In the multi-dimensional case we 
could store both on a single byte.

Oooh that's a great idea!  Saves 1 byte per inner node.  We need 5
bits for the prefix I think since it can range 0 .. 16 inclusive, and
3 bits for the {{splitDim}} since it's 0 .. 7 inclusive.
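
As a minimal sketch of that single-byte layout (5 high bits for the prefix, 3 low bits for {{splitDim}}); the class and method names are hypothetical and this is not necessarily the encoding that was committed:

```java
// Hypothetical illustration of packing prefix (0..16) and splitDim (0..7)
// into one byte.
final class PrefixSplitDimByteSketch {

    static byte pack(int prefix, int splitDim) {
        if (prefix < 0 || prefix > 16) {
            throw new IllegalArgumentException("prefix must be in 0..16, got " + prefix);
        }
        if (splitDim < 0 || splitDim > 7) {
            throw new IllegalArgumentException("splitDim must be in 0..7, got " + splitDim);
        }
        return (byte) ((prefix << 3) | splitDim);
    }

    static int prefix(byte packed) {
        // Mask to an unsigned value first so the sign bit does not leak in.
        return (packed & 0xFF) >>> 3;
    }

    static int splitDim(byte packed) {
        return packed & 0x07;
    }

    public static void main(String[] args) {
        byte packed = pack(16, 7);
        System.out.println(prefix(packed) + " " + splitDim(packed));
    }
}
```

The largest case, prefix 16 with {{splitDim}} 7, sets the top bit of the byte, so the decoder has to treat the byte as unsigned.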

bq. It doesn't need to be done in the same patch, but it would also be nice for 
SimpleText to not use the legacy format of the index. I'm not sure how to 
proceed however.

Yeah I'm not sure what to do here either ... but it felt wrong to just
pass these packed bytes to the simple text format ... that packed form
is even further from "simple" than the two arrays we have now.





[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701587#comment-15701587
 ] 

Adrien Grand commented on LUCENE-7563:
--

It seems we are always delta coding with the split value of the parent level, 
but for the multi-dimensional case, I think it would be better to delta-code 
with the last split value that was on the same dimension? Otherwise compression 
would be very poor if both dimensions store a very different range of values?
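
To make the concern concrete, here is a small sketch with made-up values (the class name, method name and sample bytes are all hypothetical): when two dimensions hold very different ranges, the shared prefix against the parent's split value can be empty, while the shared prefix against the last split value on the same dimension is long.

```java
// Hypothetical illustration of why the reference value for delta coding matters.
final class PrefixReferenceSketch {

    /** Number of leading bytes that a and b have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Dimension 0 holds small values, dimension 1 holds large ones.
        byte[] newSplitOnDim0 = {0x00, 0x00, 0x01, 0x20};
        byte[] parentSplitOnDim1 = {0x7F, 0x00, 0x00, 0x10};
        byte[] lastSplitOnDim0 = {0x00, 0x00, 0x01, 0x05};

        System.out.println("vs parent: "
            + commonPrefixLength(newSplitOnDim0, parentSplitOnDim1));
        System.out.println("vs same dimension: "
            + commonPrefixLength(newSplitOnDim0, lastSplitOnDim0));
    }
}
```

In this example only one suffix byte would need to be stored when coding against the same dimension, versus all four when coding against the parent.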

Something else I was wondering is whether we can make bigger gains. For 
instance we use whole bytes to store the split dimension or the prefix length 
while they only need 3 and 4 bits? In the multi-dimensional case we could store 
both on a single byte. Maybe we can do even better, I haven't thought much about 
it.

It doesn't need to be done in the same patch, but it would also be nice for 
SimpleText to not use the legacy format of the index. I'm not sure how to 
proceed however.

> BKD index should compress unused leading bytes
> --
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per 
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom 
> two bytes in a given segment, we shouldn't store all those leading 0s in the 
> index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667700#comment-15667700
 ] 

Adrien Grand commented on LUCENE-7563:
--

+1
