[GitHub] [lucene-solr] jpountz commented on a change in pull request #857: LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries

2019-09-09 Thread GitBox
jpountz commented on a change in pull request #857: LUCENE-8968: Improve 
performance of WITHIN and DISJOINT queries for Shape queries
URL: https://github.com/apache/lucene-solr/pull/857#discussion_r322441239
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeQuery.java
 ##
 @@ -373,49 +235,265 @@ protected Scorer getIntersectsScorer(ShapeQuery query, LeafReader reader, Weight
 // by computing the set of documents that do NOT match the query
 final FixedBitSet result = new FixedBitSet(reader.maxDoc());
 result.set(0, reader.maxDoc());
-int[] cost = new int[]{reader.maxDoc()};
-values.intersect(getInverseIntersectVisitor(query, result, cost));
+final long[] cost = new long[]{reader.maxDoc()};
+values.intersect(getInverseDenseVisitor(query, result, cost));
 final DocIdSetIterator iterator = new BitSetIterator(result, cost[0]);
 return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
   }
-
-  values.intersect(visitor);
-  DocIdSetIterator iterator = docIdSetBuilder.build().iterator();
+  final DocIdSetBuilder docIdSetBuilder = new DocIdSetBuilder(reader.maxDoc(), values, query.getField());
+  values.intersect(getSparseVisitor(query, docIdSetBuilder));
+  final DocIdSetIterator iterator = docIdSetBuilder.build().iterator();
   return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
 }
 
-/** returns a Scorer for all other (non INTERSECT) queries */
-protected Scorer getScorer(ShapeQuery query, Weight weight,
-   FixedBitSet intersect, FixedBitSet disjoint, final float boost, ScoreMode scoreMode) throws IOException {
-  values.intersect(visitor);
-  if (disjointVisitor != null) {
-values.intersect(disjointVisitor);
-  }
-  DocIdSetIterator iterator;
-  if (query.queryRelation == ShapeField.QueryRelation.DISJOINT) {
-disjoint.andNot(intersect);
-iterator = new BitSetIterator(disjoint, cost());
-  } else if (query.queryRelation == ShapeField.QueryRelation.WITHIN) {
-intersect.andNot(disjoint);
-iterator = new BitSetIterator(intersect, cost());
+/** Scorer used for WITHIN and DISJOINT **/
+private Scorer getDenseScorer(LeafReader reader, Weight weight, final float boost, ScoreMode scoreMode) throws IOException {
+  final FixedBitSet result = new FixedBitSet(reader.maxDoc());
+  final long[] cost;
+  if (values.getDocCount() == reader.maxDoc()) {
+// First we check if we have any hits so we are fast in the adversarial case where
+// the shape does not match any documents
+if (hasAnyHits(query, values) == false) {
+  // no hits so we can return
+  return new ConstantScoreScorer(weight, boost, scoreMode, DocIdSetIterator.empty());
+}
 
 Review comment:
   It'd be slightly better to handle this case in the scorer supplier to return a null scorer.
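
A minimal sketch of the ordering this comment suggests, using plain JDK types rather than Lucene classes. The names `DenseScorerSketch`, `supplyMatches`, and the `IntPredicate`-based `hasAnyHits` are hypothetical stand-ins for illustration, not the PR's code; the real `hasAnyHits(query, values)` in the diff has a different signature. The point is only that the cheap "any hits?" check runs at the supplier stage, so the no-match case yields null before any bit set is allocated.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class DenseScorerSketch {

  /** Hypothetical stand-in for hasAnyHits(query, values): stops at the first match. */
  public static boolean hasAnyHits(int maxDoc, IntPredicate matches) {
    for (int doc = 0; doc < maxDoc; doc++) {
      if (matches.test(doc)) {
        return true;
      }
    }
    return false;
  }

  /**
   * Stand-in for the supplier stage. Returns null when nothing matches, so the
   * bit set below is never allocated for the adversarial case.
   */
  public static BitSet supplyMatches(int maxDoc, IntPredicate matches) {
    if (!hasAnyHits(maxDoc, matches)) {
      return null; // assumption: callers read null as "no documents match"
    }
    BitSet result = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (matches.test(doc)) {
        result.set(doc);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(supplyMatches(8, doc -> false));        // null
    System.out.println(supplyMatches(8, doc -> doc % 4 == 0)); // {0, 4}
  }
}
```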


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #857: LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries

2019-09-05 Thread GitBox
jpountz commented on a change in pull request #857: LUCENE-8968: Improve 
performance of WITHIN and DISJOINT queries for Shape queries
URL: https://github.com/apache/lucene-solr/pull/857#discussion_r321278855
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeQuery.java
 ##
 @@ -378,44 +249,328 @@ protected Scorer getIntersectsScorer(ShapeQuery query, LeafReader reader, Weight
 final DocIdSetIterator iterator = new BitSetIterator(result, cost[0]);
 return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
   }
-
+  DocIdSetBuilder docIdSetBuilder = new DocIdSetBuilder(reader.maxDoc(), values, query.getField());
+  IntersectVisitor visitor = getIntersectVisitor(query, docIdSetBuilder);
   values.intersect(visitor);
   DocIdSetIterator iterator = docIdSetBuilder.build().iterator();
   return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
 }
 
-/** returns a Scorer for all other (non INTERSECT) queries */
-protected Scorer getScorer(ShapeQuery query, Weight weight,
-   FixedBitSet intersect, FixedBitSet disjoint, final float boost, ScoreMode scoreMode) throws IOException {
-  values.intersect(visitor);
-  if (disjointVisitor != null) {
-values.intersect(disjointVisitor);
-  }
-  DocIdSetIterator iterator;
-  if (query.queryRelation == ShapeField.QueryRelation.DISJOINT) {
-disjoint.andNot(intersect);
-iterator = new BitSetIterator(disjoint, cost());
-  } else if (query.queryRelation == ShapeField.QueryRelation.WITHIN) {
-intersect.andNot(disjoint);
-iterator = new BitSetIterator(intersect, cost());
+private Scorer getDisjointScorer(LeafReader reader, Weight weight, final float boost, ScoreMode scoreMode) throws IOException {
+  if (values.getDocCount() == reader.maxDoc()) {
+// We need to visit all docs in the normal visitor so if
+// we have all documents in this segment then use the
+// inverse visitor
+final FixedBitSet result = new FixedBitSet(reader.maxDoc());
+result.set(0, reader.maxDoc());
+values.intersect(getInverseDisjointVisitor(query, result));
+final DocIdSetIterator iterator = new BitSetIterator(result, cost());
+return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
   } else {
-iterator = new BitSetIterator(intersect, cost());
+FixedBitSet intersects = new FixedBitSet(reader.maxDoc());
+FixedBitSet disjoint = new FixedBitSet(reader.maxDoc());
+IntersectVisitor visitor = getDisjointVisitor(query, intersects, disjoint);
+values.intersect(visitor);
+disjoint.andNot(intersects);
+DocIdSetIterator iterator = new BitSetIterator(disjoint, cost());
+return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
+  }
+}
+
+private Scorer getWithinScorer(LeafReader reader, Weight weight, final float boost, ScoreMode scoreMode) throws IOException {
+  if (values.getDocCount() == reader.maxDoc()) {
+// We need to visit all docs in the normal visitor so if
+// we have all documents in this segment then use the
+// inverse visitor
+final FixedBitSet result = new FixedBitSet(reader.maxDoc());
+result.set(0, reader.maxDoc());
+values.intersect(getInverseWithinVisitor(query, result));
+final DocIdSetIterator iterator = new BitSetIterator(result, cost());
+return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
+  } else {
+FixedBitSet within = new FixedBitSet(reader.maxDoc());
+FixedBitSet notWithin = new FixedBitSet(reader.maxDoc());
+IntersectVisitor visitor = getWithinVisitor(query, within, notWithin);
+values.intersect(visitor);
+within.andNot(notWithin);
+DocIdSetIterator iterator = new BitSetIterator(within, cost());
+return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
   }
-  return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
 }
 
 @Override
 public long cost() {
   if (cost == -1) {
 // Computing the cost may be expensive, so only do it if necessary
-if (queryRelation == ShapeField.QueryRelation.DISJOINT) {
-  cost = values.estimatePointCount(disjointVisitor);
-} else {
-  cost = values.estimatePointCount(visitor);
-}
+cost = values.estimatePointCount(getEstimateVisitor(query, query.getQueryRelation()));
 assert cost >= 0;
   }
   return cost;
 }
   }
+
+  /** create a visitor for calculating point count estimates for the provided relation */
+  private static IntersectVisitor getEstimateVisitor(final ShapeQuery query, final QueryRelation relation) {
+return new IntersectVisitor() {
+  @Override
+  public void visit(int docID) {
+

[GitHub] [lucene-solr] jpountz commented on a change in pull request #857: LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries

2019-09-05 Thread GitBox
jpountz commented on a change in pull request #857: LUCENE-8968: Improve 
performance of WITHIN and DISJOINT queries for Shape queries
URL: https://github.com/apache/lucene-solr/pull/857#discussion_r321276057
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeQuery.java
 ##
 @@ -235,17 +130,27 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
   return null;
 }
 
-boolean allDocsMatch = true;
-if (values.getDocCount() != reader.maxDoc() ||
-relateRangeToQuery(values.getMinPackedValue(), values.getMaxPackedValue(), queryRelation) != Relation.CELL_INSIDE_QUERY) {
-  allDocsMatch = false;
-}
-
 final Weight weight = this;
-if (allDocsMatch) {
+Relation rel = relateRangeToQuery(values.getMinPackedValue(), values.getMaxPackedValue(), queryRelation);
+if (rel == Relation.CELL_OUTSIDE_QUERY) {
+  // no documents match the query
   return new ScorerSupplier() {
 @Override
-public Scorer get(long leadCost) throws IOException {
+public Scorer get(long leadCost) {
+  return new ConstantScoreScorer(weight, score(), scoreMode, DocIdSetIterator.empty());
+}
+
+@Override
+public long cost() {
+  return 0;
+}
+  };
 
 Review comment:
   You can return a null ScorerSupplier in this case.
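
For illustration, a rough sketch of what this could look like, mirroring the `return null;` that already appears at the top of this hunk. `SimpleSupplier`, `countMatches`, and the local `Relation` enum are hypothetical stand-ins defined here so the example compiles on its own; they are not Lucene's `ScorerSupplier` or `PointValues.Relation`. Under that assumption, a null supplier and a supplier over an empty iterator give the caller the same answer, but the null path allocates nothing.

```java
public class NullSupplierSketch {

  /** Local copy of the relation names that appear in the diff. */
  public enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  /** Hypothetical, simplified stand-in for a scorer supplier. */
  public interface SimpleSupplier {
    int[] get(long leadCost); // matching doc IDs instead of a real Scorer
    long cost();
  }

  /** Returns null when the whole segment lies outside the query. */
  public static SimpleSupplier scorerSupplier(Relation rel, int[] candidates) {
    if (rel == Relation.CELL_OUTSIDE_QUERY) {
      return null; // no documents match: no supplier, no empty iterator
    }
    return new SimpleSupplier() {
      @Override
      public int[] get(long leadCost) {
        return candidates.clone();
      }

      @Override
      public long cost() {
        return candidates.length;
      }
    };
  }

  /** Caller side: a null supplier is treated as "nothing to score". */
  public static int countMatches(Relation rel, int[] candidates) {
    SimpleSupplier supplier = scorerSupplier(rel, candidates);
    return supplier == null ? 0 : supplier.get(Long.MAX_VALUE).length;
  }

  public static void main(String[] args) {
    int[] docs = {3, 5};
    System.out.println(countMatches(Relation.CELL_OUTSIDE_QUERY, docs)); // 0
    System.out.println(countMatches(Relation.CELL_CROSSES_QUERY, docs)); // 2
  }
}
```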

