[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-06-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041216#comment-16041216
 ] 

ASF subversion and git services commented on LUCENE-7828:
-

Commit 792a8799168a58477b3165c11cbf3ab241c1d9f8 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=792a879 ]

LUCENE-7828: Speed up range queries on range fields by improving how we compute 
the relation between the query and inner nodes of the BKD tree.


> Improve PointValues visitor calls when all docs in a leaf share a value
> ---
>
> Key: LUCENE-7828
> URL: https://issues.apache.org/jira/browse/LUCENE-7828
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Nicholas Knize
> Attachments: LUCENE-7828.patch
>
>
> When all the docs in a leaf node have the same value, range queries can waste 
> a lot of processing if the node itself returns CELL_CROSSES_QUERY when 
> compare() is called, in effect performing the same calculation in visit(int, 
> byte[]) over and over again.  In the case I'm looking at (very low 
> cardinality indexed LongRange fields), this causes something of a perfect 
> storm for performance.  PointValues can detect up front if a given node has a 
> single value (because its min value and max value will be equal), so this 
> case should be fairly simple to identify and shortcut.
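
The shortcut described above might look roughly like the following. This is a minimal, self-contained sketch and not Lucene code: `SingleValueLeafSketch`, its nested `Relation` enum and `compareIntersects` are names invented here, values are plain `long`s instead of packed `byte[]`s, and it assumes a one-dimensional range field stored as two point dimensions, `{rangeMin, rangeMax}`.

```java
import java.util.Arrays;

class SingleValueLeafSketch {
  // Hypothetical stand-in for Lucene's PointValues.Relation; not the real class.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // A node is summarised by the per-dimension minima (minPacked) and maxima
  // (maxPacked) of {rangeMin, rangeMax}. The query is INTERSECTS [qMin, qMax].
  static Relation compareIntersects(long[] minPacked, long[] maxPacked, long qMin, long qMax) {
    if (Arrays.equals(minPacked, maxPacked)) {
      // Every doc in the node holds the same range, so one check decides the node.
      boolean hit = minPacked[0] <= qMax && minPacked[1] >= qMin;
      return hit ? Relation.CELL_INSIDE_QUERY : Relation.CELL_OUTSIDE_QUERY;
    }
    // Bounding range of the node: smallest rangeMin up to largest rangeMax.
    if (maxPacked[1] < qMin || minPacked[0] > qMax) {
      return Relation.CELL_OUTSIDE_QUERY;
    }
    // Otherwise each document has to be checked individually.
    return Relation.CELL_CROSSES_QUERY;
  }

  public static void main(String[] args) {
    long[] shared = {3, 7}; // all docs store the range [3, 7]
    System.out.println(compareIntersects(shared, shared, 5, 10));
    System.out.println(compareIntersects(shared, shared, 8, 10));
    System.out.println(compareIntersects(new long[] {0, 1}, new long[] {9, 10}, 4, 5));
  }
}
```

With the equality check in place, a single-value node never answers "crosses", so the per-document callback is not needed for it.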



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-06-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041217#comment-16041217
 ] 

ASF subversion and git services commented on LUCENE-7828:
-

Commit 528899d845cc9ac73622cc0775667bd0c52cc694 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=528899d ]

LUCENE-7828: Speed up range queries on range fields by improving how we compute 
the relation between the query and inner nodes of the BKD tree.





[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-06-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040473#comment-16040473
 ] 

Adrien Grand commented on LUCENE-7828:
--

I'll merge it soon if there are no objections.




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012634#comment-16012634
 ] 

Adrien Grand commented on LUCENE-7828:
--

Eh, not a bad speedup. :)




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012600#comment-16012600
 ] 

Alan Woodward commented on LUCENE-7828:
---

Yes, this looks like it only affects range fields, so the various shortcuts can 
be implemented in RangeFieldQuery's IntersectVisitor rather than changing the 
core API.  This change gives me a 35% speedup on this particular dataset:

{code}
@@ -136,6 +136,11 @@ public void visit(int docID, byte[] leaf) throws IOException {
         }
         @Override
         public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+          if (Arrays.equals(minPackedValue, maxPackedValue)) {
+            if (queryType == QueryType.CONTAINS && comparator.isWithin(minPackedValue)) {
+              return Relation.CELL_INSIDE_QUERY;
+            }
+          }
           byte[] node = getInternalRange(minPackedValue, maxPackedValue);
           // compute range relation for BKD traversal
           if (comparator.isDisjoint(node)) {
{code}




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012118#comment-16012118
 ] 

Adrien Grand commented on LUCENE-7828:
--

I think this is more a limitation of the way things are implemented today than 
a general limitation. Today we only look at the bounding box of all ranges, i.e. 
the minimum min value and the maximum max value. However, if we also looked at 
the maximum min value and the minimum max value, we could shortcut CONTAINS 
queries as well?
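
To make the idea concrete, here is a minimal self-contained sketch. It is not the committed change: `InnerBoxSketch`, its `Relation` enum and `compareContains` are invented for illustration, values are plain `long`s, and it assumes a one-dimensional range field whose node summaries carry per-dimension minima and maxima of `{rangeMin, rangeMax}`. CONTAINS is taken to mean that the indexed range contains the query range.

```java
class InnerBoxSketch {
  // Hypothetical stand-in for Lucene's PointValues.Relation; not the real class.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // minPacked = {min of rangeMins, min of rangeMaxes}
  // maxPacked = {max of rangeMins, max of rangeMaxes}
  static Relation compareContains(long[] minPacked, long[] maxPacked, long qMin, long qMax) {
    long maxOfMins = maxPacked[0];
    long minOfMaxes = minPacked[1];
    if (maxOfMins <= qMin && minOfMaxes >= qMax) {
      // The "inner box" shared by all ranges covers the query, so every range does.
      return Relation.CELL_INSIDE_QUERY;
    }
    long minOfMins = minPacked[0];
    long maxOfMaxes = maxPacked[1];
    if (minOfMins > qMin || maxOfMaxes < qMax) {
      // Not even the bounding box covers the query, so no single range can.
      return Relation.CELL_OUTSIDE_QUERY;
    }
    return Relation.CELL_CROSSES_QUERY;
  }

  public static void main(String[] args) {
    // Node holding the ranges [0, 10] and [2, 8].
    long[] minPacked = {0, 8};
    long[] maxPacked = {2, 10};
    System.out.println(compareContains(minPacked, maxPacked, 3, 5));
    System.out.println(compareContains(minPacked, maxPacked, 1, 5));
    System.out.println(compareContains(minPacked, maxPacked, -1, 5));
  }
}
```

The bounding box alone only supports the "outside" answer here; it is the maximum min and minimum max that allow the "inside" answer.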




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012095#comment-16012095
 ] 

Alan Woodward commented on LUCENE-7828:
---

The case here is for INTERSECTS or CONTAINS queries on LongRange fields.  If the 
values being stored are ranges, then you can only shortcut CONTAINED-BY queries 
by looking at the bounding box; for anything else you need to check all values.
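
A small worked example of why the bounding box is not enough to declare every document a match. This is an illustrative sketch only: the class and helper names are invented, and ranges are modelled as `{min, max}` pairs of `long`s rather than Lucene's packed values.

```java
class BoundingBoxLimitSketch {
  // r is within q: q fully covers r (the CONTAINED-BY case).
  static boolean within(long[] r, long[] q) { return q[0] <= r[0] && r[1] <= q[1]; }
  // r and q share at least one point.
  static boolean intersects(long[] r, long[] q) { return r[0] <= q[1] && r[1] >= q[0]; }
  // r fully covers q.
  static boolean contains(long[] r, long[] q) { return r[0] <= q[0] && r[1] >= q[1]; }

  // Smallest single range covering every range in the node.
  static long[] boundingBox(long[][] ranges) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long[] r : ranges) {
      min = Math.min(min, r[0]);
      max = Math.max(max, r[1]);
    }
    return new long[] {min, max};
  }

  public static void main(String[] args) {
    long[][] node = {{0, 1}, {9, 10}};
    long[] box = boundingBox(node); // {0, 10}
    long[] query = {4, 5};
    // The box intersects and contains the query, yet neither stored range does.
    System.out.println(intersects(box, query) + " " + contains(box, query));
    System.out.println(intersects(node[0], query) + " " + intersects(node[1], query));
    // If the box is within a query, every stored range is within it too.
    long[] wide = {-5, 20};
    System.out.println(within(box, wide) + " " + within(node[0], wide) + " " + within(node[1], wide));
  }
}
```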




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012012#comment-16012012
 ] 

Adrien Grand commented on LUCENE-7828:
--

Why does the node return {{CELL_CROSSES_QUERY}} if all values from the block 
match the range? It should return {{CELL_INSIDE_QUERY}} and then call 
{{visit(int docID)}} rather than {{visit(int docID, byte[] packedValue)}}?
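
For readers less familiar with the visitor contract, the difference between the two callbacks can be sketched as follows. This is a simplified model, not the Lucene traversal: `LeafDispatchSketch`, `CountingVisitor` and `visitLeaf` are invented names, and values are plain `long`s instead of packed `byte[]`s.

```java
import java.util.ArrayList;
import java.util.List;

class LeafDispatchSketch {
  // Hypothetical stand-in for Lucene's PointValues.Relation; not the real class.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // Simplified stand-in for an IntersectVisitor that counts value checks.
  static class CountingVisitor {
    final long qMin;
    final long qMax;
    final List<Integer> hits = new ArrayList<>();
    int valueChecks = 0;

    CountingVisitor(long qMin, long qMax) {
      this.qMin = qMin;
      this.qMax = qMax;
    }

    // Called when the doc is already known to match: no value check needed.
    void visit(int docID) {
      hits.add(docID);
    }

    // Called when the doc's value still has to be tested against the query.
    void visit(int docID, long value) {
      valueChecks++;
      if (qMin <= value && value <= qMax) {
        hits.add(docID);
      }
    }
  }

  // Dispatches to one of the two callbacks depending on the node's relation.
  static void visitLeaf(Relation relation, int[] docIDs, long[] values, CountingVisitor v) {
    if (relation == Relation.CELL_OUTSIDE_QUERY) {
      return; // nothing in this leaf can match
    }
    for (int i = 0; i < docIDs.length; i++) {
      if (relation == Relation.CELL_INSIDE_QUERY) {
        v.visit(docIDs[i]);
      } else {
        v.visit(docIDs[i], values[i]);
      }
    }
  }

  public static void main(String[] args) {
    int[] docIDs = {1, 2, 3};
    long[] values = {7, 7, 7}; // every doc shares one value
    CountingVisitor crossing = new CountingVisitor(0, 10);
    visitLeaf(Relation.CELL_CROSSES_QUERY, docIDs, values, crossing);
    CountingVisitor inside = new CountingVisitor(0, 10);
    visitLeaf(Relation.CELL_INSIDE_QUERY, docIDs, values, inside);
    System.out.println(crossing.valueChecks + " checks vs " + inside.valueChecks + " checks");
  }
}
```

Both paths collect the same documents; only the "crosses" path repeats the same comparison once per document.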




[jira] [Commented] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value

2017-05-15 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010649#comment-16010649
 ] 

Alan Woodward commented on LUCENE-7828:
---

I'm trying out a few ideas here; the one I think shows the most promise is to 
change IntersectVisitor.visit(int, byte[]) to take an array of docids.  This 
also opens up the possibility of speeding things up when a leaf only contains a 
few different values.
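
One possible shape for such a batch callback is sketched below. Everything here is hypothetical: the `visit(int[] docIDs, long value)` signature, the `BatchVisitSketch` class and the assumption that equal values sit next to each other within a leaf are all made up for illustration, and values are plain `long`s rather than packed `byte[]`s.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchVisitSketch {
  // Hypothetical visitor whose callback receives all docs sharing one value.
  static class BatchVisitor {
    final long qMin;
    final long qMax;
    final List<Integer> hits = new ArrayList<>();
    int valueChecks = 0;

    BatchVisitor(long qMin, long qMax) {
      this.qMin = qMin;
      this.qMax = qMax;
    }

    // One value check covers every doc in the batch.
    void visit(int[] docIDs, long value) {
      valueChecks++;
      if (qMin <= value && value <= qMax) {
        for (int docID : docIDs) {
          hits.add(docID);
        }
      }
    }
  }

  // Walks a leaf and emits one callback per run of equal adjacent values.
  // Assumes equal values are adjacent in the leaf (an assumption of this sketch).
  static void visitLeaf(int[] docIDs, long[] values, BatchVisitor visitor) {
    int start = 0;
    while (start < docIDs.length) {
      int end = start + 1;
      while (end < docIDs.length && values[end] == values[start]) {
        end++;
      }
      visitor.visit(Arrays.copyOfRange(docIDs, start, end), values[start]);
      start = end;
    }
  }

  public static void main(String[] args) {
    int[] docIDs = {4, 8, 15, 16, 23};
    long[] values = {1, 1, 1, 9, 9}; // two distinct values in the leaf
    BatchVisitor visitor = new BatchVisitor(0, 5);
    visitLeaf(docIDs, values, visitor);
    System.out.println(visitor.hits + " after " + visitor.valueChecks + " value checks");
  }
}
```

The number of value checks then scales with the number of distinct values in the leaf rather than with the number of documents.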
