[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r319066704 ## File path: lucene/core/src/java/org/apache/lucene/search/HitCountThresholdCheckerFactory.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Factory which generates instances of GlobalHitsThresholdChecker and LocalHitsThresholdChecker + * with the threshold as the passed in limit + */ +public final class HitCountThresholdCheckerFactory { Review comment: I agree for the interface but the static functions to help build simple checker could be directly implemented in `HitsThresholdChecker` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r319063786 ## File path: lucene/core/src/java/org/apache/lucene/search/HitCountThresholdCheckerFactory.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Factory which generates instances of GlobalHitsThresholdChecker and LocalHitsThresholdChecker + * with the threshold as the passed in limit + */ +public final class HitCountThresholdCheckerFactory { Review comment: Since we return a `HitsThresholdChecker` we could also make it public and add the private impl and the static functions creation there. We don't really need this factory class ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r319030989 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -469,7 +469,7 @@ public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOEx @Override public TopScoreDocCollector newCollector() throws IOException { -return TopScoreDocCollector.create(cappedNumHits, after, TOTAL_HITS_THRESHOLD); +return TopScoreDocCollector.create(cappedNumHits, after, new GlobalHitsThresholdChecker(TOTAL_HITS_THRESHOLD)); Review comment: The `HitsThresholdChecker` should be created once and shared within the collectors ? We also don't need to use the `GlobalHitsThresholdChecker` if the executor is null or if there is a single slice. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r319031039 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -598,7 +598,7 @@ private TopFieldDocs searchAfter(FieldDoc after, Query query, int numHits, Sort @Override public TopFieldCollector newCollector() throws IOException { // TODO: don't pay the price for accurate hit counts by default -return TopFieldCollector.create(rewrittenSort, cappedNumHits, after, TOTAL_HITS_THRESHOLD); +return TopFieldCollector.create(rewrittenSort, cappedNumHits, after, new GlobalHitsThresholdChecker(TOTAL_HITS_THRESHOLD)); Review comment: same here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r319030989 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -469,7 +469,7 @@ public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOEx @Override public TopScoreDocCollector newCollector() throws IOException { -return TopScoreDocCollector.create(cappedNumHits, after, TOTAL_HITS_THRESHOLD); +return TopScoreDocCollector.create(cappedNumHits, after, new GlobalHitsThresholdChecker(TOTAL_HITS_THRESHOLD)); Review comment: It should be created once and shared within the collectors ? We also don't need to use the `GlobalHitsThresholdChecker` if the executor is null or if there is a single slice. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318559088 ## File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java ## @@ -282,6 +293,7 @@ public void collect(int doc) throws IOException { private static final ScoreDoc[] EMPTY_SCOREDOCS = new ScoreDoc[0]; final int numHits; + final HitsThresholdChecker hitsThresholdChecker; final int totalHitsThreshold; Review comment: This is unused so we can remove it ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318562876 ## File path: lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java ## @@ -207,16 +210,23 @@ public static TopScoreDocCollector create(int numHits, ScoreDoc after, int total } } - final int totalHitsThreshold; ScoreDoc pqTop; + final HitsThresholdChecker hitsThresholdChecker; // prevents instantiation - TopScoreDocCollector(int numHits, int totalHitsThreshold) { + TopScoreDocCollector(int numHits, int totalHitsThreshold, HitsThresholdChecker hitsThresholdChecker) { Review comment: `totalHitsThreshold` is not needed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318563308 ## File path: lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java ## @@ -207,16 +210,23 @@ public static TopScoreDocCollector create(int numHits, ScoreDoc after, int total } } - final int totalHitsThreshold; ScoreDoc pqTop; + final HitsThresholdChecker hitsThresholdChecker; // prevents instantiation - TopScoreDocCollector(int numHits, int totalHitsThreshold) { + TopScoreDocCollector(int numHits, int totalHitsThreshold, HitsThresholdChecker hitsThresholdChecker) { super(new HitQueue(numHits, true)); -this.totalHitsThreshold = totalHitsThreshold; +assert hitsThresholdChecker != null; + // HitQueue implements getSentinelObject to return a ScoreDoc, so we know // that at this point top() is already initialized. pqTop = pq.top(); +this.hitsThresholdChecker = hitsThresholdChecker; + } + + // Same as above but uses a local hits checker for hits threshold checks + TopScoreDocCollector(int numHits, int totalHitsThreshold) { Review comment: Same as `TopFieldCollector`, we should have a single constructor that takes a `HitsThresholdChecker` and build the `totalHitsThreshold` logic in the static function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318562576 ## File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java ## @@ -414,6 +428,45 @@ public static TopFieldCollector create(Sort sort, int numHits, FieldDoc after, } } + /** + * Same as above with an additional parameter to allow passing in the threshold checker + */ + public static TopFieldCollector create(Sort sort, int numHits, FieldDoc after, + int totalHitsThreshold, HitsThresholdChecker hitsThresholdChecker) { Review comment: We don't need the `totalHitsThreshold` here, is it possible to move the argument validation (totalHitsThreshold < 0) in the HitsThresholdChecker ctr ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318559370 ## File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java ## @@ -297,10 +309,12 @@ public void collect(int doc) throws IOException { // internal versions. If someone will define a constructor with any other // visibility, then anyone will be able to extend the class, which is not what // we want. - private TopFieldCollector(FieldValueHitQueue pq, int numHits, int totalHitsThreshold, boolean needsScores) { + private TopFieldCollector(FieldValueHitQueue pq, int numHits, int totalHitsThreshold, Review comment: We don't need the `totalHitsThreshold` if we have the `hitsThresholdChecker` below This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318562103 ## File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java ## @@ -200,6 +204,12 @@ public PagingFieldCollector(Sort sort, FieldValueHitQueue queue, FieldDoc } } +// Default case -- single threaded execution +public PagingFieldCollector(Sort sort, FieldValueHitQueue queue, FieldDoc after, int numHits, +int totalHitsThreshold) { Review comment: We should have a single ctr This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
jimczi commented on a change in pull request #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search URL: https://github.com/apache/lucene-solr/pull/823#discussion_r318561896 ## File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java ## @@ -96,16 +96,21 @@ private static boolean canEarlyTerminateOnPrefix(Sort searchSort, Sort indexSort * document scores and maxScore. */ private static class SimpleFieldCollector extends TopFieldCollector { - final Sort sort; final FieldValueHitQueue queue; -public SimpleFieldCollector(Sort sort, FieldValueHitQueue queue, int numHits, int totalHitsThreshold) { - super(queue, numHits, totalHitsThreshold, sort.needsScores()); +public SimpleFieldCollector(Sort sort, FieldValueHitQueue queue, int numHits, int totalHitsThreshold, +HitsThresholdChecker hitsThresholdChecker) { + super(queue, numHits, totalHitsThreshold, hitsThresholdChecker, sort.needsScores()); this.sort = sort; this.queue = queue; } +/* Default case -- Single threaded execution */ +public SimpleFieldCollector(Sort sort, FieldValueHitQueue queue, int numHits, int totalHitsThreshold) { Review comment: Could we replace this ctr with a static function that calls `TopFieldCollector#create` with the appropriate `LocalHitsThresholdChecker`? This way we don't need to reimplement the logic to create `SimpleFieldCollector` or `PagingCollector` twice, e.g.: public static TopFieldCollector create(Sort sort, int numHits, FieldDoc after, int totalHitsThreshold) { if (totalHitsThreshold < 0) { throw new IllegalArgumentException("totalHitsThreshold must be >= 0, got " + totalHitsThreshold); } return create(sort, numHits, new LocalHitsThresholdChecker(totalHitsThreshold)); } ` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org