[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-617422262

@mikemccand Thanks for looking at this.

> Do you know why you are seeing these warnings?
> WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541658 vs 541658+
> WARNING: cat=TermDTSort: hit counts differ: 68644 vs 68644+
> ... the optimization did not wind up skipping any hits (though, it thought it may have, hence the added +)

Indeed, we set `totalHitsRelation` to `GREATER_THAN_OR_EQUAL_TO` when we try to run the optimization, but the optimization may end up not skipping any documents if it is not selective enough. It looks like some Scorers behave the same way when updating competitive scores (e.g. `LongDistanceFeatureQuery#DistanceScorer`).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
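To make the `+` suffix concrete, here is a minimal sketch of why the count can read `541658+` even when nothing was skipped. The `Relation` enum and both methods are hypothetical stand-ins written for this illustration; they only mirror the behaviour described above and are not Lucene's classes.

```java
// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class HitCountDemo {
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /** Renders a hit count as in the warnings above: a trailing "+" marks a lower bound. */
    static String render(long value, Relation relation) {
        return relation == Relation.EQUAL_TO ? Long.toString(value) : value + "+";
    }

    /**
     * As described in the comment, the relation changes as soon as the optimization
     * is attempted, whether or not any document was actually skipped.
     */
    static Relation relationAfterCollection(boolean optimizationAttempted) {
        return optimizationAttempted ? Relation.GREATER_THAN_OR_EQUAL_TO : Relation.EQUAL_TO;
    }
}
```

With these assumptions, an attempted but ineffective optimization still reports a lower bound, so the same number appears with and without the `+`.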
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615300443

I have caught up with @jimczi offline, and it could be that how selective the query iterator is matters for performance. If the query iterator is already selective enough, there may be no point in materializing a collector's iterator based on points. I am going to run benchmarks on a MatchAll query to investigate that.
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615280419

@romseygeek Are you suggesting to do

```java
if (updateCounter > 1024 && (updateCounter & 0x1f) != 0x1f) {
```

But this will run the optimization even more often, which we want to avoid, no? In the wikimedium1m TermDTSort case, `updateCounter` doesn't even reach 20 (so the optimization isn't called that many times), but that is enough to make it slower than the traditional sort.
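As a quick check of how the quoted condition behaves, the sketch below evaluates it over a range of counter values. The class and method names are made up for this illustration, and the sketch says nothing about what the guarded branch does in the PR; it only evaluates the boolean expression.

```java
// Illustrative helper; names are hypothetical and not part of the PR.
final class UpdateCounterDemo {
    /** The boolean expression quoted in the comment above. */
    static boolean condition(int updateCounter) {
        return updateCounter > 1024 && (updateCounter & 0x1f) != 0x1f;
    }

    /** Counts how many counter values in [1, max] satisfy the condition. */
    static int countTrue(int max) {
        int n = 0;
        for (int c = 1; c <= max; c++) {
            if (condition(c)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTrue(20));   // prints 0: the condition never holds below 1025
        System.out.println(countTrue(2048)); // prints 992: 31 of every 32 values above 1024
    }
}
```

So for a counter that never reaches 20, as in the wikimedium1m TermDTSort case, the expression is always false.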
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615263686

I thought I would also report benchmarking results when we apply the optimization only on segments with more than 1 million docs. Here we don't see any significant slowdowns, and we are still able to achieve speedups.

wikimedium1m
```
Task                    QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff
TermDTSort                    395.99   (8.0%)                    360.72  (11.5%)    -8.9% ( -26% -   11%)
HighTermDayOfYearSort          49.51  (19.8%)                     51.95  (14.0%)     4.9% ( -24% -   48%)
WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541658 vs 541658+
WARNING: cat=TermDTSort: hit counts differ: 68644 vs 68644+
```

wikimedium10m
```
Task                    QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff
TermDTSort                     83.37   (5.1%)                    111.73  (30.4%)    34.0% (  -1% -   73%)
HighTermDayOfYearSort          52.46   (6.9%)                     46.76  (12.4%)   -10.9% ( -28% -    9%)
WARNING: cat=HighTermDayOfYearSort: hit counts differ: 496079 vs 496079+
WARNING: cat=TermDTSort: hit counts differ: 506054 vs 44560+
```

wikimediumall
```
Task                    QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff
TermDTSort                     32.23   (3.2%)                     85.28  (26.2%)   164.6% ( 131% -  200%)
HighTermDayOfYearSort          14.46   (5.0%)                     13.93   (6.6%)    -3.7% ( -14% -    8%)
WARNING: cat=HighTermDayOfYearSort: hit counts differ: 2485178 vs 2485178+
WARNING: cat=TermDTSort: hit counts differ: 1474717 vs 106400+
```
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615261001

Sorry for bringing this up without finishing, but I thought it was also worth reporting the test results on a smaller collection, `wikimedium1m`:

```
Task                    QPS baseline   StdDev   QPS patch   StdDev   Pct diff
TermDTSort                    292.71  (15.1%)       59.60   (4.9%)   -79.6% ( -86% -  -70%)
HighTermDayOfYearSort          60.01  (44.0%)       33.75  (13.6%)   -43.8% ( -70% -   24%)
WARNING: cat=HighTermDayOfYearSort: hit counts differ: 65216 vs 65093+
WARNING: cat=TermDTSort: hit counts differ: 68644 vs 507+
```

Here the proposed sort optimization causes a substantial reduction in performance. As the data in these indexes is not monotonically increasing, `setBottom` is called many times. It looks like for smaller indexes (especially with data that is not monotonically increasing) it is faster to just do the conventional sort than to apply the proposed optimization. I am not sure how significant this reduction is.

- **Should we apply the optimization only for segments over 1 million docs?**
- **Should we apply the optimization only when the data is diverse enough?**

Or can we follow up on these proposals in subsequent PRs?
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-614988778

@msokolov @jimczi @jpountz I was wondering if you have any other additional comments for this change?
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-614931785

I have run another round of benchmarks, this time comparing the performance of this PR vs. master, as we don't need any special sort field. [Here](https://github.com/mayya-sharipova/luceneutil/commit/c3166e4fc44e7fcddcd1672112c96364d9f464e5) are the changes made to luceneutil.

**wikimedium10m**
```
Task                    QPS baseline   StdDev   QPS patch   StdDev   Pct diff
HighTermDayOfYearSort          50.93   (5.6%)       49.31  (10.9%)    -3.2% ( -18% -   14%)
TermDTSort                     83.37   (5.9%)      129.95  (41.2%)    55.9% (   8% -  109%)
WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541957 vs 541957+
WARNING: cat=TermDTSort: hit counts differ: 506054 vs 1861+
```

Here we have two sorts:
- Int sort on a day of year. Slight decrease in performance: -3.2%. There was an attempt to run the optimization, but it was eventually not run because every time [estimatedNumberOfMatches](https://github.com/apache/lucene-solr/pull/1351/files#diff-aff67e212aa0edd675ec31c068cb642bR268) was not selective enough. The reason is that the data here is a day of the year in the range [1, 366], and every segment contains a mix of these values throughout.
- Long sort on a date field (msecSinceEpoch). Speedup: 55.9%.
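The "not selective enough" check for the day-of-year sort can be sketched roughly as follows. This is a hedged illustration only: the method name, the `iteratorCost` parameter, and in particular the 1/8 ratio are assumptions made for the example, not values taken from the PR.

```java
// Illustrative sketch; names and the 1/8 ratio are assumptions, not the PR's code.
final class SelectivityDemo {
    /**
     * Returns true when the estimated number of competitive docs is small enough,
     * relative to the docs the query iterator would visit anyway, to justify
     * building a points-based iterator over them.
     */
    static boolean worthSkipping(long estimatedNumberOfMatches, long iteratorCost) {
        long threshold = iteratorCost >>> 3; // assumed: require at least an 8x reduction
        return estimatedNumberOfMatches < threshold;
    }
}
```

With only 366 distinct values spread through every segment, almost any bottom value still matches a large share of the documents, so a check of this shape would keep rejecting the optimization, which is consistent with the `541957 vs 541957+` warning above.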
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-611260306

@romseygeek I have tried to address your outstanding feedback in 4448499f0f. Can you please continue the review when you have time?

> Move the logic that checks whether or not to update the iterator into setBottom on the leaf comparator.

In the new `FilteringFieldComparator` class, the iterator is updated:
- in `setBottom`;
- when we change a segment in `getLeafComparator`, so that we can also update iterators of subsequent segments;
- and also when, for the first time, the queue becomes full and hitsThreshold is reached, in `setCanUpdateIterator`; this method is called from a collector.
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-610078508

@romseygeek Thanks for the feedback. I have addressed your comments 1 and 2 in 89d241e. Indeed, the APIs look simpler, and I like them more now. I just renamed `wrapDocIdSetIterator` to `filterIterator`.

Comment 3 is challenging to address. I have already tried to do this in d732d7eb9 as a response to @jimczi's feedback, but I reverted that commit because of those challenges. `TopFieldCollector` has a lot of subtle logic that makes it difficult to reason about and imitate in other classes. The challenges are the following:

1. `HitsThresholdChecker`. First, we are passing a not strictly related class, `hitsThresholdChecker`, to `LeafComparator`. Second, it turned out that we can't use the `hitsThresholdChecker.isThresholdReached` method in `setBottom`, as it starts to return `true` only after we have already collected more hits than `numHits`, but in `setBottom` we need to update an iterator as soon as we have collected `numHits`, because if there are no competitive docs later, `setBottom` will never be called again.
2. `TotalHitsRelation`. If we end up updating the iterator, we need to set it to `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO`, and it is not clear to me when this should be set.
3. If we have a parallel collector and would like to update a global bottom, it is not clear to me how to do this with this model either.

I guess I need to think more about it.
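The timing gap in challenge 1 can be shown with a toy model. Both predicates below are hypothetical simplifications written for this example; they only encode the behaviour described in the comment (threshold reported after more than `numHits` hits, queue full at exactly `numHits` hits) and are not Lucene's `HitsThresholdChecker`.

```java
// Toy model of the off-by-one described above; not Lucene code.
final class ThresholdTimingDemo {
    /** As described: reports true only once more than the threshold has been collected. */
    static boolean thresholdReached(int collected, int totalHitsThreshold) {
        return collected > totalHitsThreshold;
    }

    /** The priority queue is full, and setBottom is first called, at exactly numHits hits. */
    static boolean queueFull(int collected, int numHits) {
        return collected >= numHits;
    }

    public static void main(String[] args) {
        int numHits = 10; // assumed: totalHitsThreshold == numHits, as in the benchmark setup
        for (int collected = 9; collected <= 11; collected++) {
            System.out.println(collected
                + " queueFull=" + queueFull(collected, numHits)
                + " thresholdReached=" + thresholdReached(collected, numHits));
        }
    }
}
```

At exactly `numHits` collected hits the queue is full but the threshold check is still false, so an iterator update gated on the threshold check would be missed if `setBottom` is never called again.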
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-608442689

@romseygeek Thank you for the review and suggestions, I will work on them.
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-608059291

@jpountz What do you think of this design in eeb23c11?

1. `IterableFieldComparator` wraps a `FieldComparator` to provide skipping functionality. All numeric comparators are wrapped in corresponding iterable comparators.
2. `SortField` has a new method `allowSkipNonCompetitveDocs` that, if set, will use a comparator that provides skipping functionality.

In this case, we would not need the other classes that I previously introduced, `LongDocValuesPointComparator` and `LongDocValuesPointSortField`.
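The wrapping idea can be sketched as a plain decorator. Everything below is a simplified, hypothetical model: `LongComparator`, `PlainLongComparator` and `SkippingLongComparator` are names invented for this example, they work on raw `long` values rather than doc IDs, and they assume an ascending sort. They are not the `IterableFieldComparator` from the commit.

```java
// Simplified, hypothetical model of "wrap an existing comparator to add skipping".
interface LongComparator {
    void setBottom(long bottom);

    /** Positive when the bottom sorts after the value, i.e. the value would beat the bottom. */
    int compareBottom(long value);
}

/** Stands in for an existing numeric comparator with no skipping logic. */
final class PlainLongComparator implements LongComparator {
    private long bottom;

    @Override
    public void setBottom(long bottom) {
        this.bottom = bottom;
    }

    @Override
    public int compareBottom(long value) {
        return Long.compare(bottom, value);
    }
}

/** Decorator: delegates comparisons and additionally tracks a competitive bound. */
final class SkippingLongComparator implements LongComparator {
    private final LongComparator delegate;
    private long bound;
    private boolean boundSet;

    SkippingLongComparator(LongComparator delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setBottom(long bottom) {
        delegate.setBottom(bottom);
        bound = bottom;   // remember the bound so non-competitive values can be skipped
        boundSet = true;
    }

    @Override
    public int compareBottom(long value) {
        return delegate.compareBottom(value);
    }

    /** Ascending sort assumed: only strictly smaller values can still enter the top hits. */
    boolean isCompetitive(long value) {
        return !boundSet || value < bound;
    }
}
```

The point of the design is that the wrapped comparator stays untouched, so skipping is opt-in per sort rather than enabled by default.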
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606889736

@jpountz Thank you for the review.

> I wonder whether we could make it easier to write implementations. I haven't spent much time thinking about it, but for instance would it be possible to wrap existing comparators to add the skipping functionality? Alternatively we could add the skipping logic to the existing comparators, but the fact that Lucene doesn't require that the same data be stored in indexes and doc values makes me a bit nervous about enabling it by default, and I'd like to avoid adding a new constructor argument.

Would it make sense to add, for each numeric FieldComparator, an extra class that wraps the numeric comparator and provides additional methods for the skipping logic (getting an iterator and updating an iterator)?
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606259561

@msokolov Sorry again for reporting incorrect benchmarking results. Below are my latest results, and I feel quite confident in their correctness.

First, about the benchmarking setup:

1. [Here](https://github.com/mayya-sharipova/luceneutil/commit/e0d86b24053cc8a68796abd9f0fd08dbac899779) are the changes made to `luceneutil`.
2. The `patch` folder is a checkout of this PR.
3. The `trunk` folder is also a checkout of this PR, with a modification. As there is no `LongDocValuesPointSortField` in master, I can't benchmark [sorting using this field](https://github.com/mayya-sharipova/luceneutil/commit/e0d86b24053cc8a68796abd9f0fd08dbac899779#diff-58e50bb4a8f0be480df656bcd84d5b77R76) on master. What I did on the `trunk` folder was simply to delegate sorting to the traditional sort on a long field, like this:

```java
import org.apache.lucene.search.SortField;

// Baseline stand-in: same class name as in the patch, but it falls back
// to the traditional long sort.
public class LongDocValuesPointSortField extends SortField {

    public LongDocValuesPointSortField(String field) {
        super(field, SortField.Type.LONG);
    }

    public LongDocValuesPointSortField(String field, boolean reverse) {
        super(field, SortField.Type.LONG, reverse);
    }
}
```

So basically I was benchmarking a traditional long sort vs. a long sort using the new field `LongDocValuesPointSortField`.
wikimedium10m: 10 million docs, up to 2x speedups
```
Task                    QPS baseline   StdDev   QPS patch   StdDev   Pct diff
TermDTSort                     64.53   (6.4%)      155.29  (42.3%)   140.7% (  86% -  202%)
HighTermDayOfYearSort          47.63   (5.4%)       50.47   (6.8%)     6.0% (  -5% -   19%)
HighTermMonthSort             110.07   (7.3%)      121.13   (6.8%)    10.0% (  -3% -   26%)
WARNING: cat=TermDTSort: hit counts differ: 754451 vs 1669+
```

wikimediumall: about 33 million docs, up to 3.5x speedups
```
Task                    QPS baseline   StdDev   QPS patch   StdDev   Pct diff
TermDTSort                     28.96   (4.3%)      108.45  (56.9%)   274.5% ( 204% -  350%)
HighTermDayOfYearSort           9.69   (5.1%)        9.56   (6.1%)    -1.3% ( -11% -   10%)
HighTermMonthSort              39.41   (4.7%)       47.99  (10.0%)    21.8% (   6% -   38%)
WARNING: cat=TermDTSort: hit counts differ: 1474717 vs 1070+
```

Please let me know if these results and methodology make sense.
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606042901

@msokolov Thank you for the additional review. I realized I ran the benchmarks incorrectly, without indexing documents with docValues. Sorry, I am still learning the Lucene benchmarking tool. Please disregard the previous benchmarking results; I will rerun them.
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672

@msokolov Thanks for suggesting additional benchmarks that we can use. Below are the results on the dataset `wikimedium10m`.

First I will repeat the results from the previous round of benchmarking:

topN=10, taskRepeatCount = 20, concurrentSearchers = False

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
| HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
| HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |

---

topN=10, **taskRepeatCount = 500**, concurrentSearchers = False

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 184.60 | (8.2%) | 3046.19 | (4.4%) |
| HighTermMonthSort | 209.43 | (6.5%) | 253.90 | (10.5%) |
| HighTermDayOfYearSort | 130.97 | (5.8%) | 73.25 | (11.8%) |

This seemed to speed up all operations, and here the speedup for `TermDTSort` is even bigger: 16.5x. There also seems to be more of a regression for `HighTermDayOfYearSort`.

---

**topN=500**, taskRepeatCount = 20, concurrentSearchers = False

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 210.24 | (9.7%) | 537.65 | (6.7%) |
| HighTermMonthSort | 116.02 | (8.9%) | 189.96 | (13.5%) |
| HighTermDayOfYearSort | 42.33 | (7.6%) | 67.93 | (9.3%) |

With increased `topN`, the sort optimization yields smaller speedups, up to 2x. This is expected, as it can only run after `topN` docs have been collected.
---

topN=10, taskRepeatCount = 20, **concurrentSearchers = True**

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 132.09 | (14.3%) | 287.93 | (11.8%) |
| HighTermMonthSort | 211.01 | (12.2%) | 116.46 | (7.1%) |
| HighTermDayOfYearSort | 72.28 | (6.1%) | 68.21 | (11.4%) |

With concurrent searchers the speedups are also smaller, up to 2x. This is expected, as segments are now spread between several TopFieldCollectors/Comparators, and they don't exchange bottom values. As a follow-up to this PR, we can think about how to maintain a global bottom value, similar to how `MaxScoreAccumulator` is used to set up a global competitive min score.
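One possible shape for such a shared bottom value is sketched below. This is only an illustration of the follow-up idea, not a proposal from the PR: the class `GlobalBottomDemo` and its methods are hypothetical, an ascending sort on a long field is assumed, and a collector is assumed to publish its local bottom only once its own queue is full (otherwise the value would not be a valid bound).

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical sketch of a bottom value shared between concurrent collectors.
final class GlobalBottomDemo {
    // Ascending sort assumed: the tightest valid bound is the smallest bottom
    // published by any collector whose queue is already full.
    private final LongAccumulator bottom = new LongAccumulator(Long::min, Long.MAX_VALUE);

    /** Called by a collector after its queue is full and its local bottom changed. */
    void offer(long localBottom) {
        bottom.accumulate(localBottom);
    }

    long get() {
        return bottom.get();
    }

    /** A value can still enter the global top hits only if it beats the shared bound. */
    boolean isCompetitive(long value) {
        return value < bottom.get();
    }
}
```

`LongAccumulator` is thread-safe, so several collectors can publish concurrently; each can then use the shared bound instead of only its own when deciding what to skip.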
[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-604173071

I have run some benchmarking using `luceneutil`. As the new sort optimization uses a new `LongDocValuesPointSortField` that is not present in `luceneutil`, I had to hack `luceneutil` as follows:

1. I added a sort task on a long field, `TermDateTimeSort`, to `wikimedium.1M.nostopwords.tasks`. This task was present in `wikinightly.tasks`, but was not available for the wikimedium 1M and 10M tasks.
2. I indexed the corresponding field `lastModNDV` as `LongPoint` as well. It was only indexed as `NumericDocValuesField` before, but for the sort optimization we need long values to be indexed both as docValues and as points.
3. I modified `SearchTask.java` to have `TopFieldCollector` with `totalHitsThreshold` set to `topK`: `final TopFieldCollector c = TopFieldCollector.create(s, topN, null, topN);` The sort optimization only works when we set the total hits threshold.
4. For the patch version, I modified the sort in `TaskParser.java`. Instead of `lastModNDVSort = new Sort(new SortField("lastModNDV", SortField.Type.LONG));` I used the optimized sort: `lastModNDVSort = new Sort(new LongDocValuesPointSortField("lastModNDV"));`

Here the main point of comparison is `TermDTSort`, as it is the only sort on a long field. The other sorts are presented to demonstrate a possible regression, or the absence of one, on them.
---

wikimedium1m

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 507.20 | (11.2%) | 550.02 | (16.1%) |
| HighTermMonthSort | 550.06 | (10.4%) | 443.69 | (16.1%) |
| HighTermDayOfYearSort | 105.62 | (24.9%) | 91.93 | (22.1%) |

---

wikimedium10m

| Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
| --- | ---: | ---: | ---: | ---: |
| **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
| HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
| HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |

For wikimedium1m, using `LongDocValuesPointSortField` doesn't seem to have much effect, probably because segments in this index are smaller and the optimization was completely skipped on those segments.

For wikimedium10m, using `LongDocValuesPointSortField` instead of the usual `SortField.Type.LONG` **brings about 3x speedups**.

There are some regressions/speedups for the sort tasks HighTermMonthSort and HighTermDayOfYearSort. I don't know the reason for these, as they should not be affected.