[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798
 ] 

Ignacio Vera edited comment on LUCENE-8867 at 6/18/19 4:16 PM:
---------------------------------------------------------------

{quote}
This is only an issue in the case that not all dimensions are indexed, right? 
Otherwise you could figure out that all values are equal in 
IntersectVisitor#compare?
{quote}

I think this is a generic issue. The problem here is not when all values are 
equal but when you have a very low cardinality on the leaf nodes. In this case 
we can save lots of space by storing the values in the proposed way.
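
To illustrate the space argument, here is a minimal sketch of collapsing a leaf into (value, count) pairs. It is not the patch itself: LeafRuns, Run and encode are hypothetical names, and it assumes equal values sit next to each other in the leaf (for example because the leaf is sorted).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene code: collapse a leaf's values into (value, count) runs.
final class LeafRuns {
  // One distinct value and the number of consecutive docs that share it.
  static final class Run {
    final byte[] value;
    final int count;

    Run(byte[] value, int count) {
      this.value = value;
      this.count = count;
    }
  }

  // Assumption: equal values are adjacent, e.g. because the leaf is sorted.
  static List<Run> encode(byte[][] leafValues) {
    List<Run> runs = new ArrayList<>();
    int i = 0;
    while (i < leafValues.length) {
      int j = i + 1;
      // Extend the run while the next value is byte-for-byte identical.
      while (j < leafValues.length && Arrays.equals(leafValues[i], leafValues[j])) {
        j++;
      }
      runs.add(new Run(leafValues[i], j - i));
      i = j;
    }
    return runs;
  }
}
```

With only a handful of distinct values in a leaf, the number of runs is far smaller than the number of docs, which is where the saving would come from.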


{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs 
available in an int[] slice as opposed to streaming them from disk directly to 
the IntersectVisitor for instance.
{quote}

I see your concern. Another option would be to change the interface more 
radically: add a matches(byte[]) method that returns a boolean, and then use 
the visit(docID) method for the docs whose value matched.
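
A rough sketch of what that alternative could look like, so that doc IDs can be streamed instead of being handed over as an int[]. MatchingVisitor and RunReader are hypothetical names for illustration, not the existing IntersectVisitor API, and the doc IDs are modelled as a plain Java iterator.

```java
import java.util.PrimitiveIterator;

// Hypothetical interface for illustration; not the actual IntersectVisitor API.
interface MatchingVisitor {
  // Test the value shared by a group of docs, once.
  boolean matches(byte[] packedValue);

  // Accept one doc whose value has already matched.
  void visit(int docID);
}

final class RunReader {
  // Streams the doc IDs of one (value, docs) group to the visitor without needing an int[].
  static void visitRun(byte[] packedValue, PrimitiveIterator.OfInt docIDs, MatchingVisitor visitor) {
    if (!visitor.matches(packedValue)) {
      // The value does not match: consume the doc IDs so the stream stays positioned
      // at the next group, but report none of them.
      while (docIDs.hasNext()) {
        docIDs.nextInt();
      }
      return;
    }
    while (docIDs.hasNext()) {
      visitor.visit(docIDs.nextInt());
    }
  }
}
```

The value is compared once per group rather than once per doc, and the visitor never sees how the doc IDs are stored.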





> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
>                 Key: LUCENE-8867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8867
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes adding a new method to the interface that accepts an array of docs, 
> so that implementors can override it and gain search performance.
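
For reference, the proposal in the description could be sketched roughly as follows. The names and signatures are assumptions made for illustration (BulkVisitor and ByteRangeCounter are hypothetical, and values are single bytes to keep it short); this is not the signature from the patch.

```java
// Hypothetical sketch of the proposed addition; names and signatures are assumptions.
interface BulkVisitor {
  void visit(int docID, byte[] packedValue);

  // Default keeps existing implementations working: one call per doc, as today.
  default void visit(int[] docIDs, int count, byte[] packedValue) {
    for (int i = 0; i < count; i++) {
      visit(docIDs[i], packedValue);
    }
  }
}

// Example implementor: counts docs whose single-byte value lies in [min, max].
final class ByteRangeCounter implements BulkVisitor {
  private final byte min;
  private final byte max;
  int hits;        // number of matching docs seen so far
  int comparisons; // number of value comparisons performed

  ByteRangeCounter(byte min, byte max) {
    this.min = min;
    this.max = max;
  }

  private boolean inRange(byte[] packedValue) {
    comparisons++;
    return packedValue[0] >= min && packedValue[0] <= max;
  }

  @Override
  public void visit(int docID, byte[] packedValue) {
    if (inRange(packedValue)) {
      hits++;
    }
  }

  // Override: compare the shared value once, then accept every doc in the array.
  @Override
  public void visit(int[] docIDs, int count, byte[] packedValue) {
    if (inRange(packedValue)) {
      hits += count;
    }
  }
}
```

Implementors that do not override the array method keep the per-doc behaviour through the default, so the change would stay backwards compatible.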


