[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798 ]
Ignacio Vera edited comment on LUCENE-8867 at 6/18/19 4:16 PM:
---------------------------------------------------------------

{quote}
This is only an issue in the case that not all dimensions are indexed, right? Otherwise you could figure out that all values are equal in IntersectVisitor#compare?
{quote}

I think this is a generic issue. The problem here is not when all values are equal but when the leaf nodes have very low cardinality. In that case we can save a lot of space by storing the values in the proposed way.

{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs available in an int[] slice as opposed to streaming them from disk directly to the IntersectVisitor for instance.
{quote}

I see your concern. Another option would be to change the interface more radically: add a matches(byte[]) method that returns a boolean, and then use the visit(docID) method.

was (Author: ivera):

{quote}
This is only an issue in the case that not all dimensions are indexed, right? Otherwise you could figure out that all values are equal in IntersectVisitor#compare?
{quote}

I think this is a generic issue. The problem here is not when all values are equal but when the leaf nodes have very low cardinality. In that case we can save a lot of space by storing the values in the proposed way.

{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs available in an int[] slice as opposed to streaming them from disk directly to the IntersectVisitor for instance.
{quote}

I see your concern. Another option would be to change the interface more radically: add a matches(byte[]) method, and then use the visit(docID) method.
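
To make the matches(byte[]) alternative from the comment concrete, here is a minimal Java sketch. It is not Lucene code: MatchingVisitor, LeafRunReader and visitRun are hypothetical names, and the doc IDs are modelled as a PrimitiveIterator.OfInt to stand in for IDs streamed from disk. The only point it illustrates is that the shared value is tested once and the doc IDs are then handed over one at a time, so no int[] slice is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Hypothetical visitor shape for the alternative in the comment; not Lucene's IntersectVisitor. */
interface MatchingVisitor {
    /** Decides once whether a packed value is accepted by the query. */
    boolean matches(byte[] packedValue);

    /** Called for every document whose value has already been accepted. */
    void visit(int docID);
}

/** Hypothetical reader for one run of documents that all share the same packed value. */
final class LeafRunReader {
    static void visitRun(byte[] sharedValue, PrimitiveIterator.OfInt docIDs, MatchingVisitor visitor) {
        if (!visitor.matches(sharedValue)) {
            return; // value rejected: none of the documents in this run are visited
        }
        while (docIDs.hasNext()) {
            visitor.visit(docIDs.nextInt()); // IDs are consumed as they arrive, never buffered
        }
    }
}

final class MatchesSketchDemo {
    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        MatchingVisitor visitor = new MatchingVisitor() {
            @Override public boolean matches(byte[] packedValue) { return packedValue[0] == 5; }
            @Override public void visit(int docID) { seen.add(docID); }
        };
        LeafRunReader.visitRun(new byte[] {5}, IntStream.of(1, 2, 3).iterator(), visitor);
        LeafRunReader.visitRun(new byte[] {6}, IntStream.of(9).iterator(), visitor);
        System.out.println(seen);
    }
}
```

A real reader would still have to skip over the stored doc IDs of a rejected run; the sketch leaves that out.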
> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
>                 Key: LUCENE-8867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8867
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf
> is treated the same way as if all values were different. In many cases it can
> be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes to add a new method to the interface that accepts an array of docs,
> so that implementors can override it and gain search performance.
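
The bulk method proposed in the description could look roughly like the Java sketch below. This is an illustration under stated assumptions, not the patch attached to the issue: SketchVisitor is a simplified stand-in for IntersectVisitor, the visit(int[] docIDs, byte[] packedValue) signature is a guess at "a method that accepts an array of docs", and RangeCollector is a hypothetical implementor. The default method keeps existing implementors working unchanged, while an override can compare the shared value once per run instead of once per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical stand-in for IntersectVisitor. */
interface SketchVisitor {
    /** Per-document callback, as in visit(docId, byte[]) from the issue text. */
    void visit(int docID, byte[] packedValue);

    /** Assumed bulk variant: by default it just falls back to the per-document callback. */
    default void visit(int[] docIDs, byte[] packedValue) {
        for (int docID : docIDs) {
            visit(docID, packedValue);
        }
    }
}

/** Hypothetical implementor collecting documents whose value lies in [min, max]. */
final class RangeCollector implements SketchVisitor {
    private final byte[] min;
    private final byte[] max;
    final List<Integer> hits = new ArrayList<>();
    int comparisons = 0; // counts how often a value is tested against the range

    RangeCollector(byte[] min, byte[] max) {
        this.min = min;
        this.max = max;
    }

    private boolean inRange(byte[] value) {
        comparisons++;
        // Unsigned lexicographic comparison of equal-length, single-dimension values.
        return Arrays.compareUnsigned(value, min) >= 0 && Arrays.compareUnsigned(value, max) <= 0;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
        if (inRange(packedValue)) {
            hits.add(docID);
        }
    }

    @Override
    public void visit(int[] docIDs, byte[] packedValue) {
        if (inRange(packedValue)) { // one comparison covers the whole run
            for (int docID : docIDs) {
                hits.add(docID);
            }
        }
    }
}

final class BulkVisitDemo {
    public static void main(String[] args) {
        RangeCollector collector = new RangeCollector(new byte[] {10}, new byte[] {20});
        collector.visit(new int[] {3, 7, 9}, new byte[] {15}); // three docs sharing one value
        collector.visit(new int[] {4}, new byte[] {30});       // value outside the range
        System.out.println(collector.hits + " after " + collector.comparisons + " comparisons");
    }
}
```

With the per-document method the same three documents would cost three range checks; the override needs one.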