[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041216#comment-16041216 ]

ASF subversion and git services commented on LUCENE-7828:
---------------------------------------------------------

Commit 792a8799168a58477b3165c11cbf3ab241c1d9f8 in lucene-solr's branch refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=792a879 ]

LUCENE-7828: Speed up range queries on range fields by improving how we compute the relation between the query and inner nodes of the BKD tree.

> Improve PointValues visitor calls when all docs in a leaf share a value
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7828
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7828
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Nicholas Knize
>         Attachments: LUCENE-7828.patch
>
>
> When all the docs in a leaf node have the same value, range queries can waste
> a lot of processing if the node itself returns CELL_CROSSES_QUERY when
> compare() is called, in effect performing the same calculation in visit(int,
> byte[]) over and over again. In the case I'm looking at (very low
> cardinality indexed LongRange fields), this causes something of a perfect
> storm for performance. PointValues can detect up front if a given node has a
> single value (because its min value and max value will be equal), so this
> case should be fairly simple to identify and shortcut.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041217#comment-16041217 ]

ASF subversion and git services commented on LUCENE-7828:
---------------------------------------------------------

Commit 528899d845cc9ac73622cc0775667bd0c52cc694 in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=528899d ]

LUCENE-7828: Speed up range queries on range fields by improving how we compute the relation between the query and inner nodes of the BKD tree.
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040473#comment-16040473 ]

Adrien Grand commented on LUCENE-7828:
--------------------------------------

I'll merge it soon if there are no objections.
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012634#comment-16012634 ]

Adrien Grand commented on LUCENE-7828:
--------------------------------------

Eh, not a bad speedup. :)
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012600#comment-16012600 ]

Alan Woodward commented on LUCENE-7828:
---------------------------------------

Yes, this looks like it only affects range fields, so the various shortcuts can be implemented in RangeFieldQuery's IntersectVisitor rather than changing the core API. This change gives me a 35% speedup on this particular dataset:

{code}
@@ -136,6 +136,11 @@ public void visit(int docID, byte[] leaf) throws IOException {
       }
       @Override
       public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+        if (Arrays.equals(minPackedValue, maxPackedValue)) {
+          if (queryType == QueryType.CONTAINS && comparator.isWithin(minPackedValue)) {
+            return Relation.CELL_INSIDE_QUERY;
+          }
+        }
         byte[] node = getInternalRange(minPackedValue, maxPackedValue);
         // compute range relation for BKD traversal
         if (comparator.isDisjoint(node)) {
{code}
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012118#comment-16012118 ]

Adrien Grand commented on LUCENE-7828:
--------------------------------------

I think this is more a limitation of the way things are implemented today than a general limitation. Today we only look at the bounding box of all ranges, ie. the minimum min value and the maximum max value. However if we looked for instance at the maximum min value and the minimum max value, we could also shortcut CONTAINS queries?
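The suggestion above can be illustrated with a minimal one-dimensional sketch. It assumes closed long ranges and that CONTAINS means "the indexed range contains the query range". The class and method names (RangeBoundsSketch, LongRange, allContain) are hypothetical and are not part of Lucene; this is not the code from the attached patch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical 1D illustration; none of these names come from Lucene. */
final class RangeBoundsSketch {

  /** A closed range [min, max] as indexed for one document. */
  static final class LongRange {
    final long min;
    final long max;

    LongRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /**
   * Returns true if every range in the node contains the query [qMin, qMax].
   * Only two aggregates are needed: the largest lower bound (maximum min)
   * and the smallest upper bound (minimum max) of the node.
   * An empty node yields true (vacuously).
   */
  static boolean allContain(List<LongRange> node, long qMin, long qMax) {
    long maxMin = Long.MIN_VALUE;
    long minMax = Long.MAX_VALUE;
    for (LongRange r : node) {
      maxMin = Math.max(maxMin, r.min);
      minMax = Math.min(minMax, r.max);
    }
    // If even the latest-starting range starts at or before qMin, all do;
    // if even the earliest-ending range ends at or after qMax, all do.
    return maxMin <= qMin && minMax >= qMax;
  }

  public static void main(String[] args) {
    List<LongRange> node = Arrays.asList(
        new LongRange(10, 50), new LongRange(5, 40), new LongRange(12, 60));
    System.out.println(allContain(node, 15, 30));
    System.out.println(allContain(node, 11, 30));
  }
}
```

When the test succeeds, a node could be reported as fully matching without checking each document; when it fails, nothing is proven either way and the per-document check is still required.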
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012095#comment-16012095 ]

Alan Woodward commented on LUCENE-7828:
---------------------------------------

The case here is for INTERSECTS or CONTAINS queries on LongRangeFields. If the values being stored are ranges, then you can only shortcut CONTAINED-BY queries by looking at the bounding box, for anything else you need to check all values.
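A small counterexample may make this point concrete. The sketch below assumes one-dimensional closed ranges stored as {min, max} pairs; BoundingBoxSketch and its methods are hypothetical names for illustration, not Lucene APIs. Two nodes share the same bounding box, yet they give opposite answers to a CONTAINS query, so the box alone cannot decide it.

```java
/** Hypothetical illustration; none of these names come from Lucene. */
final class BoundingBoxSketch {

  /** Returns {minimum min, maximum max} over all {min, max} pairs. */
  static long[] boundingBox(long[][] ranges) {
    long minMin = Long.MAX_VALUE;
    long maxMax = Long.MIN_VALUE;
    for (long[] r : ranges) {
      minMin = Math.min(minMin, r[0]);
      maxMax = Math.max(maxMax, r[1]);
    }
    return new long[] {minMin, maxMax};
  }

  /** CONTAINED-BY shortcut: if the box lies within the query, so does every range. */
  static boolean boxWithinQuery(long[] box, long qMin, long qMax) {
    return box[0] >= qMin && box[1] <= qMax;
  }

  /** Per-document check: how many ranges contain the query [qMin, qMax]. */
  static int countContaining(long[][] ranges, long qMin, long qMax) {
    int count = 0;
    for (long[] r : ranges) {
      if (r[0] <= qMin && r[1] >= qMax) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[][] nodeA = {{0, 100}, {0, 100}};
    long[][] nodeB = {{0, 10}, {90, 100}};
    // Both nodes have the bounding box [0, 100] ...
    System.out.println(java.util.Arrays.toString(boundingBox(nodeA)));
    System.out.println(java.util.Arrays.toString(boundingBox(nodeB)));
    // ... but they disagree on how many ranges contain [40, 60].
    System.out.println(countContaining(nodeA, 40, 60));
    System.out.println(countContaining(nodeB, 40, 60));
  }
}
```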
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012012#comment-16012012 ]

Adrien Grand commented on LUCENE-7828:
--------------------------------------

Why does the node return {{CELL_CROSSES_QUERY}} if all values from the block match the range? It should return {{CELL_INSIDE_QUERY}} and then call {{visit(int docID)}} rather than {{void visit(int docID, byte[] packedValue)}}?
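The dispatch described in this comment can be sketched roughly as follows. The types below are simplified stand-ins written for this illustration: they use plain long point values instead of packed byte[] ranges, and VisitorDispatchSketch, RangeVisitor and visitLeaf are hypothetical names, not Lucene's implementation. The valueChecks counter shows how many per-value comparisons each path costs.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the visitor contract; not Lucene's actual classes. */
final class VisitorDispatchSketch {

  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  interface Visitor {
    void visit(int docID);               // the doc is already known to match
    void visit(int docID, long value);   // the value must still be checked
    Relation compare(long minValue, long maxValue);
  }

  /** Matches point values in [qMin, qMax] and counts per-value checks. */
  static final class RangeVisitor implements Visitor {
    final long qMin;
    final long qMax;
    final List<Integer> hits = new ArrayList<>();
    int valueChecks = 0;

    RangeVisitor(long qMin, long qMax) {
      this.qMin = qMin;
      this.qMax = qMax;
    }

    @Override
    public void visit(int docID) {
      hits.add(docID);
    }

    @Override
    public void visit(int docID, long value) {
      valueChecks++;
      if (value >= qMin && value <= qMax) {
        hits.add(docID);
      }
    }

    @Override
    public Relation compare(long minValue, long maxValue) {
      if (maxValue < qMin || minValue > qMax) {
        return Relation.CELL_OUTSIDE_QUERY;
      }
      if (minValue >= qMin && maxValue <= qMax) {
        return Relation.CELL_INSIDE_QUERY;
      }
      return Relation.CELL_CROSSES_QUERY;
    }
  }

  /** Visits one leaf, choosing the cheap or the expensive path from compare(). */
  static void visitLeaf(int[] docIDs, long[] values, Visitor visitor) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    switch (visitor.compare(min, max)) {
      case CELL_OUTSIDE_QUERY:
        return; // nothing in this leaf can match
      case CELL_INSIDE_QUERY:
        for (int docID : docIDs) {
          visitor.visit(docID); // no per-value work
        }
        return;
      default:
        for (int i = 0; i < docIDs.length; i++) {
          visitor.visit(docIDs[i], values[i]);
        }
    }
  }

  public static void main(String[] args) {
    RangeVisitor visitor = new RangeVisitor(0, 10);
    visitLeaf(new int[] {1, 2, 3}, new long[] {5, 5, 5}, visitor);
    System.out.println(visitor.hits + " after " + visitor.valueChecks + " value checks");
  }
}
```

For a leaf whose documents all share one matching value, compare() sees min equal to max, reports CELL_INSIDE_QUERY, and the per-value path is never taken.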
[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
[ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010649#comment-16010649 ]

Alan Woodward commented on LUCENE-7828:
---------------------------------------

I'm trying out a few ideas here; the one I think shows the most promise is to change IntersectVisitor.visit(int, byte[]) to take an array of docids. This also opens up the possibility of speeding things up when a leaf only contains a few different values.
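A rough sketch of what such a batched visit could look like is given below. It is purely hypothetical: BatchVisitSketch, BatchVisitor and visitRuns are invented for this illustration, values are plain longs rather than packed byte[] values, and the sketch assumes that equal values are stored next to each other within a leaf. The visitor is called once per run of equal values, so the value is evaluated once per distinct value instead of once per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical batched-visit API; not part of Lucene. */
final class BatchVisitSketch {

  /** Receives all doc IDs that share one value in a single call. */
  interface BatchVisitor {
    void visit(int[] docIDs, long value);
  }

  /**
   * Groups adjacent equal values into runs and calls the visitor once per run.
   * Assumes equal values are adjacent in {@code values}. Returns the number of calls made.
   */
  static int visitRuns(int[] docIDs, long[] values, BatchVisitor visitor) {
    int calls = 0;
    int start = 0;
    while (start < values.length) {
      int end = start + 1;
      while (end < values.length && values[end] == values[start]) {
        end++;
      }
      visitor.visit(Arrays.copyOfRange(docIDs, start, end), values[start]);
      calls++;
      start = end;
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Integer> hits = new ArrayList<>();
    int calls = visitRuns(
        new int[] {1, 2, 3, 4, 5},
        new long[] {7, 7, 7, 9, 9},
        (docIDs, value) -> {
          if (value >= 7 && value <= 8) { // evaluated once per distinct value
            for (int docID : docIDs) {
              hits.add(docID);
            }
          }
        });
    System.out.println(calls + " visitor calls, hits=" + hits);
  }
}
```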