[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-21 Thread GitBox


mayya-sharipova commented on issue #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-617422262


   @mikemccand Thanks for looking at this
   
   > Do you know why you are seeing these warnings?
   > WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541658 vs 541658+
   > WARNING: cat=TermDTSort: hit counts differ: 68644 vs 68644+
   > ... the optimization did not wind up skipping any hits (though, it thought it may have, hence the added +)
   
   Indeed, we set `totalHitsRelation` to `GREATER_THAN_OR_EQUAL_TO` when we
try to run the optimization, but the optimization may end up not skipping any
documents if it is not selective enough.  It looks like some Scorers behave the
same way when updating competitive scores (e.g.
`LongDistanceFeatureQuery#DistanceScorer`).
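
   A minimal sketch of why a count can carry a `+` even when nothing was
skipped. It is a simplified model, not Lucene's collector: the class, the
`count` method and the `skipped` array are hypothetical, and it assumes the
relation is downgraded at the moment the optimization is attempted.

```java
final class HitCountRelationSketch {
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /** Formats a count like the benchmark warning: a trailing '+' marks a lower bound. */
    static String describe(long hits, Relation relation) {
        return relation == Relation.EQUAL_TO ? Long.toString(hits) : hits + "+";
    }

    /**
     * Counts hits; skipped[i] == true means the (hypothetical) competitive
     * iterator would skip document i once the optimization is attempted.
     */
    static String count(boolean[] skipped, boolean optimizationAttempted) {
        // Assumption: the relation is downgraded when the optimization is
        // attempted, before it is known whether anything gets skipped.
        Relation relation = optimizationAttempted
            ? Relation.GREATER_THAN_OR_EQUAL_TO
            : Relation.EQUAL_TO;
        long hits = 0;
        for (boolean skip : skipped) {
            if (!(optimizationAttempted && skip)) {
                hits++;
            }
        }
        return describe(hits, relation);
    }

    public static void main(String[] args) {
        boolean[] nothingSkipped = new boolean[5];
        System.out.println(count(nothingSkipped, false)); // 5
        System.out.println(count(nothingSkipped, true));  // 5+
    }
}
```

   With nothing skipped, both runs count the same five hits, but the second one
can only report the count as a lower bound.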
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-17 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615300443
 
 
   I have caught up with @jimczi offline, and it could be that how selective a
query iterator is matters for performance. It is possible that if a query
iterator is already selective enough, there is no point in materializing a
collector's iterator based on points.
   I am going to run benchmarks on a MatchAll query to investigate that.





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-17 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615280419
 
 
   @romseygeek  Are you suggesting we do
   ```java
   if (updateCounter > 1024 && (updateCounter & 0x1f) != 0x1f) {
   ```
   but this would run the optimization even more often, which we want to avoid, no?
   
   In the wikimedium1m TermDTSort case, `updateCounter` doesn't even reach 20
(so the optimization isn't called that many times), but that is enough to make it
slower than the traditional sort.
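
   To see how the quoted condition behaves, it can be evaluated on its own. The
sketch below only counts for which counter values the condition is true; it
makes no assumption about what the `if` body does, and the class and method
names are hypothetical.

```java
final class UpdateCounterSketch {
    /** The condition quoted above, evaluated for a single counter value. */
    static boolean condition(int updateCounter) {
        return updateCounter > 1024 && (updateCounter & 0x1f) != 0x1f;
    }

    /** Number of counter values in [1, maxCounter] for which the condition is true. */
    static int countTrue(int maxCounter) {
        int count = 0;
        for (int counter = 1; counter <= maxCounter; counter++) {
            if (condition(counter)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTrue(20));   // counter never exceeds 1024
        System.out.println(countTrue(2048)); // 31 out of every 32 values above 1024
    }
}
```

   For a counter that never reaches 20, the condition is never true, so it has
no effect either way in the wikimedium1m TermDTSort case.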





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-17 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615263686
 
 
   I thought I would also report benchmarking results when we apply the
optimization only on segments over 1 million docs.  Here we don't see any
significant regressions, but are still able to achieve speedups.
   
   wikimedium1m
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                 Pct diff
               TermDTSort        395.99   (8.0%)                   360.72  (11.5%)    -8.9% ( -26% -   11%)
    HighTermDayOfYearSort         49.51  (19.8%)                    51.95  (14.0%)     4.9% ( -24% -   48%)
   WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541658 vs 541658+
   WARNING: cat=TermDTSort: hit counts differ: 68644 vs 68644+
   ```
   
   wikimedium10m
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                 Pct diff
               TermDTSort         83.37   (5.1%)                   111.73  (30.4%)    34.0% (  -1% -   73%)
    HighTermDayOfYearSort         52.46   (6.9%)                    46.76  (12.4%)   -10.9% ( -28% -    9%)
   WARNING: cat=HighTermDayOfYearSort: hit counts differ: 496079 vs 496079+
   WARNING: cat=TermDTSort: hit counts differ: 506054 vs 44560+
   ```
   
   wikimediumall
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                 Pct diff
               TermDTSort         32.23   (3.2%)                    85.28  (26.2%)   164.6% ( 131% -  200%)
    HighTermDayOfYearSort         14.46   (5.0%)                    13.93   (6.6%)    -3.7% ( -14% -    8%)
   WARNING: cat=HighTermDayOfYearSort: hit counts differ: 2485178 vs 2485178+
   WARNING: cat=TermDTSort: hit counts differ: 1474717 vs 106400+
   ```
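
   A minimal sketch of the gating used for these runs, assuming the decision is
made per segment from its document count. The class, method and constant names
are hypothetical; the 1 million threshold is the one mentioned above.

```java
final class SegmentSizeGateSketch {
    // Threshold taken from "segments over 1 million docs" above.
    static final int MIN_DOCS_FOR_OPTIMIZATION = 1_000_000;

    /** True if a segment with this many documents would get the sort optimization. */
    static boolean shouldOptimize(int segmentMaxDoc) {
        return segmentMaxDoc > MIN_DOCS_FOR_OPTIMIZATION;
    }

    /** Counts how many of the given segments would be optimized. */
    static int countOptimized(int[] segmentSizes) {
        int optimized = 0;
        for (int size : segmentSizes) {
            if (shouldOptimize(size)) {
                optimized++;
            }
        }
        return optimized;
    }

    public static void main(String[] args) {
        // Hypothetical segment sizes, not taken from the benchmark indexes.
        int[] segments = {250_000, 900_000, 4_000_000};
        System.out.println(countOptimized(segments)); // 1
    }
}
```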
   





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-17 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-615261001
 
 
   Sorry for bringing this up and not finishing, but I thought it is also
worth reporting the test results on a smaller collection, `wikimedium1m`:
   
   ```
                     Task  QPS baseline   StdDev  QPS patch   StdDev                 Pct diff
               TermDTSort        292.71  (15.1%)      59.60   (4.9%)   -79.6% ( -86% -  -70%)
    HighTermDayOfYearSort         60.01  (44.0%)      33.75  (13.6%)   -43.8% ( -70% -   24%)
   WARNING: cat=HighTermDayOfYearSort: hit counts differ: 65216 vs 65093+
   WARNING: cat=TermDTSort: hit counts differ: 68644 vs 507+
   ```
   
   Here there is a substantial reduction in performance from using the proposed
sort optimization.
   
   As the data in these indexes is not monotonically increasing, `setBottom` is
called many times.
   It looks like for smaller indexes (especially with data that is not
monotonically increasing) it is faster to just do the conventional sort than
to run the proposed optimization.
   
   I am not sure how significant this reduction is.
   - **Should we apply the optimization only for segments over 1 million docs?**
   - **Should we apply the optimization only when the data is diverse enough?**
   
   Or we can follow up on these proposals in subsequent PRs?
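
   A small model of why the order of the data matters for the number of
`setBottom` calls. It assumes an ascending sort and a bounded top-N queue; the
class and method names are hypothetical, and it counts bottom changes only, not
the cost of each iterator update.

```java
import java.util.Collections;
import java.util.PriorityQueue;

final class SetBottomCountSketch {
    /**
     * Collects the topN smallest values in arrival (doc ID) order and returns
     * how many times the bottom of the queue changes.
     */
    static int countBottomUpdates(long[] valuesInDocOrder, int topN) {
        // Max-heap: the head is the current bottom (worst) of the top-N.
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        int bottomUpdates = 0;
        for (long value : valuesInDocOrder) {
            if (queue.size() < topN) {
                queue.add(value);
                if (queue.size() == topN) {
                    bottomUpdates++; // the queue just became full
                }
            } else if (value < queue.peek()) {
                queue.poll();        // a competitive doc evicts the bottom
                queue.add(value);
                bottomUpdates++;
            }
        }
        return bottomUpdates;
    }

    public static void main(String[] args) {
        long[] increasing = new long[1000];
        long[] decreasing = new long[1000];
        for (int i = 0; i < 1000; i++) {
            increasing[i] = i;
            decreasing[i] = 1000 - i;
        }
        System.out.println(countBottomUpdates(increasing, 10)); // 1
        System.out.println(countBottomUpdates(decreasing, 10)); // 991
    }
}
```

   With monotonically increasing values the bottom is set once, when the queue
fills. In the worst case (decreasing values) every later document changes it.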
   
   





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-16 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-614988778
 
 
   @msokolov  @jimczi @jpountz  I was wondering if you have any additional
comments on this change?





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-16 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-614931785
 
 
   I have run another round of benchmarks, this time comparing the performance
of this PR vs. master, as we don't need any special sort field.
[Here](https://github.com/mayya-sharipova/luceneutil/commit/c3166e4fc44e7fcddcd1672112c96364d9f464e5)
 are the changes made to luceneutil.
   
   
   **wikimedium10m**
   ```
                     Task  QPS baseline   StdDev  QPS patch   StdDev                 Pct diff
    HighTermDayOfYearSort         50.93   (5.6%)      49.31  (10.9%)    -3.2% ( -18% -   14%)
               TermDTSort         83.37   (5.9%)     129.95  (41.2%)    55.9% (   8% -  109%)
   WARNING: cat=HighTermDayOfYearSort: hit counts differ: 541957 vs 541957+
   WARNING: cat=TermDTSort: hit counts differ: 506054 vs 1861+
   ```
   Here we have two sorts:
   -  Int sort on a day of year. Slight decrease in performance: -3.2%. There
was an attempt to do the optimization, but the optimization was eventually not
run, as every time
[estimatedNumberOfMatches](https://github.com/apache/lucene-solr/pull/1351/files#diff-aff67e212aa0edd675ec31c068cb642bR268)
 was not selective enough. The reason is that the data here is a day of the year
in the range [1, 366], and each segment contains values spread across that
range.
   
   - Long sort on date field (msecSinceEpoch).  Speedups: 55.9%.   
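
   A minimal sketch of a selectivity gate of this kind. It assumes the decision
compares the estimated number of competitive documents against a fraction of
the current iterator's cost; the one-eighth fraction and all names below are
assumptions for illustration, not taken from the PR.

```java
final class SelectivityCheckSketch {
    /**
     * Returns true if switching to a new competitive iterator looks worthwhile.
     * Assumption: "selective enough" means fewer estimated matches than one
     * eighth of the current iterator's cost.
     */
    static boolean isSelectiveEnough(long estimatedNumberOfMatches, long currentIteratorCost) {
        long threshold = currentIteratorCost >>> 3; // assumed fraction: 1/8
        return estimatedNumberOfMatches < threshold;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a narrow range is selective, a broad one is not.
        System.out.println(isSelectiveEnough(100, 1_000));     // true
        System.out.println(isSelectiveEnough(500_000, 541_957)); // false
    }
}
```

   When the estimate never drops below the threshold, as described for the day
of year sort, the iterator is never replaced and no documents are skipped.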
   
   





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-08 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-611260306
 
 
   @romseygeek  I have tried to address your outstanding feedback in 
4448499f0f.  Can you please continue the review when you have time?
   
   > Move the logic that checks whether or not to update the iterator into 
setBottom on the leaf comparator.
   
   In the new `FilteringFieldComparator` class, the iterator is updated:
   - in `setBottom`;
   - in `getLeafComparator`, when we change a segment, so that we can also
update the iterators of subsequent segments;
   - in `setCanUpdateIterator`, when the queue becomes full and `hitsThreshold`
is reached for the first time. This method is called from a collector.
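
   The three update points above can be sketched as a small state machine. This
is a simplified model rather than the class from the PR: `onNewSegment` stands
in for `getLeafComparator`, and the counter replaces the real iterator update.

```java
final class IteratorUpdateHooksSketch {
    private boolean canUpdateIterator = false;
    private boolean hasBottom = false;
    private long bottom;
    private int iteratorUpdates = 0;

    /** Hook 1: the bottom of the queue changed. */
    void setBottom(long newBottom) {
        bottom = newBottom;
        hasBottom = true;
        maybeUpdateIterator();
    }

    /** Hook 2: collection moved to a new segment (stand-in for getLeafComparator). */
    void onNewSegment() {
        maybeUpdateIterator();
    }

    /** Hook 3: the collector signals that the queue is full and hitsThreshold is reached. */
    void setCanUpdateIterator() {
        canUpdateIterator = true;
        maybeUpdateIterator();
    }

    private void maybeUpdateIterator() {
        if (!canUpdateIterator || !hasBottom) {
            return; // nothing to filter on yet
        }
        iteratorUpdates++; // a real implementation would rebuild the iterator from `bottom`
    }

    int iteratorUpdates() {
        return iteratorUpdates;
    }

    long bottom() {
        return bottom;
    }
}
```

   Calling `setBottom` before `setCanUpdateIterator` leaves the counter at zero.
After the collector enables updates, each of the three hooks triggers one.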





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-06 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-610078508
 
 
   @romseygeek Thanks for the feedback. I have addressed your comments 1 and 2 
in 89d241e.   Indeed, the APIs look simpler, I like them more now.  I just 
renamed `wrapDocIdSetIterator` to `filterIterator`. 
   
   Comment 3 is challenging to address.  I have already tried to do this in
d732d7eb9 as a response to @jimczi's feedback, but I reverted that commit
because of those challenges.  `TopFieldCollector` has a lot of subtle logic
that makes it difficult to reason about and imitate in other classes.  The
challenges are the following:
   
   1. `HitsThresholdChecker`. First, we are passing a not strictly related class,
`hitsThresholdChecker`, to `LeafComparator`.  Secondly, it turned out that we
can't use the `hitsThresholdChecker.isThresholdReached` method in `setBottom`, as
it starts to return `true` only after we have already collected more than
`numHits` hits, but in `setBottom` we need to update an iterator as soon as we
have collected `numHits`, because if there are no competitive docs later,
`setBottom` will never be called again.
   
   2. `TotalHitsRelation`.  If we end up updating the iterator, we need to set 
it to `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO` and it is not clear to me 
when this should be set.
   
   3. If we have a parallel collector and would like to update a global bottom,
it is also not clear to me how to do this with this model.
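
   The timing problem in item 1 can be shown with two predicates. This is a
simplified model with hypothetical names: the strict check mirrors the
behaviour described above (true only after more than the threshold has been
collected), and the other one is what `setBottom` would need.

```java
final class ThresholdTimingSketch {
    /** Mirrors the described behaviour: true only after MORE than threshold hits. */
    static boolean isThresholdReached(int hitsCollected, int threshold) {
        return hitsCollected > threshold;
    }

    /** What setBottom needs according to the comment: true once numHits were collected. */
    static boolean queueIsFull(int hitsCollected, int numHits) {
        return hitsCollected >= numHits;
    }

    /** Would the iterator be updated by a setBottom call made after hitsAtSetBottom hits? */
    static boolean wouldUpdateIterator(int hitsAtSetBottom, int numHits, boolean useStrictCheck) {
        return useStrictCheck
            ? isThresholdReached(hitsAtSetBottom, numHits)
            : queueIsFull(hitsAtSetBottom, numHits);
    }

    public static void main(String[] args) {
        int numHits = 10;
        // setBottom is called when the 10th hit fills the queue. If no later
        // document is competitive, this is the only call.
        System.out.println(wouldUpdateIterator(10, numHits, true));  // false
        System.out.println(wouldUpdateIterator(10, numHits, false)); // true
    }
}
```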
   
   I guess I need to think more about it. 
   





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-03 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-608442689
 
 
   @romseygeek Thank you for the review and suggestions, I will work on them.





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-02 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-608059291
 
 
   @jpountz  What do you think of this design in eeb23c11?
   
   1. `IterableFieldComparator` wraps a `FieldComparator` to provide skipping
functionality. All numeric comparators are wrapped in corresponding iterable
comparators.
   2.  `SortField` has a new method, `allowSkipNonCompetitveDocs`, that, if set,
will use a comparator that provides skipping functionality.
   
   In this case, we would not need the other classes that I previously
introduced: `LongDocValuesPointComparator` and `LongDocValuesPointSortField`.
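
   A rough sketch of the wrapping idea, using stand-in types instead of Lucene's
`FieldComparator`. Every name below is hypothetical, the sort is assumed to be
ascending, and a value equal to the bottom is treated as non-competitive.

```java
final class SkippingWrapperSketch {
    /** Stand-in for a numeric comparator: only the parts needed here. */
    interface NumericComparator {
        void setBottom(long bottom);
        int compareBottom(long docValue);
    }

    /** Plain ascending long comparator without any skipping logic. */
    static final class LongAscComparator implements NumericComparator {
        private long bottom;

        @Override public void setBottom(long bottom) { this.bottom = bottom; }
        @Override public int compareBottom(long docValue) { return Long.compare(bottom, docValue); }
    }

    /** Wrapper: delegates comparisons and adds the skipping decision on top. */
    static final class SkippingComparator implements NumericComparator {
        private final NumericComparator delegate;
        private long bottom;
        private boolean hasBottom = false;

        SkippingComparator(NumericComparator delegate) { this.delegate = delegate; }

        @Override public void setBottom(long bottom) {
            this.bottom = bottom;
            this.hasBottom = true;
            delegate.setBottom(bottom);
        }

        @Override public int compareBottom(long docValue) { return delegate.compareBottom(docValue); }

        /** Extra method: may a document with this value still enter the top hits? */
        boolean isCompetitive(long docValue) { return !hasBottom || docValue < bottom; }
    }

    /** Mirrors the idea of a flag on the sort field selecting the wrapped comparator. */
    static NumericComparator create(boolean allowSkipNonCompetitiveDocs) {
        NumericComparator plain = new LongAscComparator();
        return allowSkipNonCompetitiveDocs ? new SkippingComparator(plain) : plain;
    }
}
```

   The comparison logic stays in the wrapped comparator, so the wrapper only
has to track the bottom value it needs for skipping.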





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-31 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606889736
 
 
   @jpountz  Thank you for the review.
   
   > I wonder whether we could make it easier to write implementations. I 
haven't spent much time thinking about it, but for instance would it be 
possible to wrap existing comparators to add the skipping functionality? 
Alternatively we could add the skipping logic to the existing comparators, but 
the fact that Lucene doesn't require that the same data be stored in indexes 
and doc values makes me a bit nervous about enabling it by default, and I'd 
like to avoid adding a new constructor argument.
   
   Would it make sense to add, for each numeric FieldComparator, an extra class
that wraps the numeric comparator and provides additional methods for the
skipping logic (getting an iterator and updating an iterator)?





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-30 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606259561
 
 
   @msokolov  Sorry again for reporting incorrect benchmarking results. Below
are my latest results, and I feel quite confident in their correctness.
   
   First, about the benchmarking setup:
   1. 
[Here](https://github.com/mayya-sharipova/luceneutil/commit/e0d86b24053cc8a68796abd9f0fd08dbac899779)
  are the changes made to `luceneutil`.
   2.  The `patch` folder is checked out as this PR.
   3. The `trunk` folder is checked out as this PR as well, with a modification.
As there is no `LongDocValuesPointSortField` in master, I can't benchmark [sorting 
using this 
field](https://github.com/mayya-sharipova/luceneutil/commit/e0d86b24053cc8a68796abd9f0fd08dbac899779#diff-58e50bb4a8f0be480df656bcd84d5b77R76)
 on master. What I did on the `trunk` folder is simply delegate sorting to the
traditional sort on a long field, like this:
   ```java
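   import org.apache.lucene.search.SortField;

   // Stand-in used only on the `trunk` checkout: it keeps the new class name
   // but delegates to the traditional long sort.
   public class LongDocValuesPointSortField extends SortField {
     public LongDocValuesPointSortField(String field) {
       super(field, SortField.Type.LONG);
     }

     public LongDocValuesPointSortField(String field, boolean reverse) {
       super(field, SortField.Type.LONG, reverse);
     }
   }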
   ```
   So basically I was benchmarking a traditional long sort VS a long sort using 
a new field `LongDocValuesPointSortField`.
   
   
   wikimedium10m: 10 million docs, up to 2x speedups
   
   ```
                     Task  QPS baseline   StdDev  QPS patch   StdDev                 Pct diff
               TermDTSort         64.53   (6.4%)     155.29  (42.3%)   140.7% (  86% -  202%)
    HighTermDayOfYearSort         47.63   (5.4%)      50.47   (6.8%)     6.0% (  -5% -   19%)
        HighTermMonthSort        110.07   (7.3%)     121.13   (6.8%)    10.0% (  -3% -   26%)
   WARNING: cat=TermDTSort: hit counts differ: 754451 vs 1669+
   ```
   
   wikimediumall: about 33 million docs, up to 3.5 x speedups
   ```
                     Task  QPS baseline   StdDev  QPS patch   StdDev                 Pct diff
               TermDTSort         28.96   (4.3%)     108.45  (56.9%)   274.5% ( 204% -  350%)
    HighTermDayOfYearSort          9.69   (5.1%)       9.56   (6.1%)    -1.3% ( -11% -   10%)
        HighTermMonthSort         39.41   (4.7%)      47.99  (10.0%)    21.8% (   6% -   38%)
   WARNING: cat=TermDTSort: hit counts differ: 1474717 vs 1070+
   ```
   
   Please let me know if these results and methodology make sense.





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-30 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-606042901
 
 
   @msokolov Thank you for the additional review.  I realized I ran the
benchmarks incorrectly, not indexing documents with docValues. Sorry, I am still
learning the Lucene benchmarking tool.  Please disregard the previous
benchmarking results; I will be rerunning them.





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-27 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672
 
 
   @msokolov Thank you for suggesting additional benchmarks that we can use.
   Below are the results on the dataset `wikimedium10m`.
   
   First I will repeat the results from the previous round of benchmarking:
   
   topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
   | HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
   | HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |
   
   
   
   ---
 topN=10, **taskRepeatCount = 500**, concurrentSearchers = False
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 184.60 | (8.2%) | 3046.19 | (4.4%) |
   | HighTermMonthSort | 209.43 | (6.5%) | 253.90 | (10.5%) |
   | HighTermDayOfYearSort | 130.97 | (5.8%) | 73.25 | (11.8%) |
   
   This seemed to speed up all operations, and here the speedup for
`TermDTSort` is even bigger: 16.5x. There also seems to be a bigger
regression for `HighTermDayOfYearSort`.
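
   The 16.5x figure follows directly from the two QPS columns of the table
above. A minimal sketch of the arithmetic, with hypothetical class and method
names:

```java
final class SpeedupSketch {
    /** Ratio of patched to baseline throughput, e.g. 2.0 means twice as fast. */
    static double speedupFactor(double baselineQps, double patchQps) {
        return patchQps / baselineQps;
    }

    /** Relative change in percent; negative values are regressions. */
    static double pctDiff(double baselineQps, double patchQps) {
        return 100.0 * (patchQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // TermDTSort with taskRepeatCount = 500: 184.60 -> 3046.19 QPS
        System.out.printf("%.1fx%n", speedupFactor(184.60, 3046.19));
        // HighTermDayOfYearSort with taskRepeatCount = 500: 130.97 -> 73.25 QPS
        System.out.printf("%.1f%%%n", pctDiff(130.97, 73.25));
    }
}
```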
   
   ---
 **topN=500**,  taskRepeatCount = 20, concurrentSearchers = False
   
   
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 210.24 | (9.7%) | 537.65 | (6.7%) |
   | HighTermMonthSort | 116.02 | (8.9%) | 189.96 | (13.5%) |
   | HighTermDayOfYearSort | 42.33 | (7.6%) | 67.93 | (9.3%) |
   
   With increased `topN`, the sort optimization yields smaller speedups, up to
2x. This is expected, as the optimization can only run after `topN` docs have
been collected.
   
   ---
   topN=10, taskRepeatCount = 20, **concurrentSearchers = True**
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 132.09 | (14.3%) | 287.93 | (11.8%) |
   | HighTermMonthSort | 211.01 | (12.2%) | 116.46 | (7.1%) |
   | HighTermDayOfYearSort | 72.28 | (6.1%) | 68.21 | (11.4%) |
   
   With concurrent searchers the speedups are also smaller, up to 2x. This
is expected, as segments are now spread across several
TopFieldCollectors/Comparators that don't exchange bottom values.  As a
follow-up to this PR, we can think about how to have a global bottom value,
similar to how `MaxScoreAccumulator` is used to set up a global competitive min
score.
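
   One possible shape for such a shared bottom value, sketched with
`java.util.concurrent.atomic.LongAccumulator`. This is not part of the PR: the
class is hypothetical, it assumes an ascending sort, and it assumes each
collector publishes its local bottom only once its own queue is full.

```java
import java.util.concurrent.atomic.LongAccumulator;

final class GlobalBottomSketch {
    // Ascending sort assumed: smaller values are better, so the tightest
    // bound across collectors is the minimum of their local bottoms.
    private final LongAccumulator globalBottom =
        new LongAccumulator(Long::min, Long.MAX_VALUE);

    /** Called by a collector whose local queue is full. */
    void publishLocalBottom(long localBottom) {
        globalBottom.accumulate(localBottom);
    }

    long get() {
        return globalBottom.get();
    }

    /** A value can only be competitive if it beats the best published bottom. */
    boolean isCompetitive(long docValue) {
        return docValue < globalBottom.get();
    }

    public static void main(String[] args) throws InterruptedException {
        GlobalBottomSketch shared = new GlobalBottomSketch();
        long[] localBottoms = {50L, 30L, 70L};
        Thread[] workers = new Thread[localBottoms.length];
        for (int i = 0; i < workers.length; i++) {
            final long bottom = localBottoms[i];
            workers[i] = new Thread(() -> shared.publishLocalBottom(bottom));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(shared.get()); // 30
    }
}
```

   A collector whose local bottom is 70 could then skip a document with value
40, because another collector already holds a full queue of values at or below 30.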
   





[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-25 Thread GitBox
mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-604173071
 
 
   I have run some benchmarking using `luceneutil`.
   As the new sort optimization uses a new `LongDocValuesPointSortField` that 
is not present in `luceneutil`, I had to hack `luceneutil` as follows:
   
   1. I added a sort task on a long field, `TermDateTimeSort`, to
`wikimedium.1M.nostopwords.tasks`. This task was present in
`wikinightly.tasks`, but was not available for the wikimedium 1M and 10M tasks.
   2. I indexed the corresponding field `lastModNDV` as `LongPoint` as well. It 
was only indexed as `NumericDocValuesField` before, but for the sort 
optimization we need long values to be indexed both as docValues and as points.
   3. I modified `SearchTask.java` to create the `TopFieldCollector` with
`totalHitsThreshold` set to `topN`: `final TopFieldCollector c = 
TopFieldCollector.create(s, topN, null, topN);`   The sort optimization only
works when we set a total hits threshold.
   4. For the patch version, I modified the sort in `TaskParser.java`. Instead of
`lastModNDVSort = new Sort(new SortField("lastModNDV", SortField.Type.LONG));`  
I used the optimized sort: `lastModNDVSort = new Sort(new
LongDocValuesPointSortField("lastModNDV"));`
   
   Here the main point of comparison is `TermDTSort`, as it is the only sort on
a long field. The other sorts are included to show whether they regress or
not.
   
   ---
   wikimedium1m
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 507.20 | (11.2%) | 550.02 | (16.1%) |
   | HighTermMonthSort | 550.06 | (10.4%) | 443.69 | (16.1%) |
   | HighTermDayOfYearSort | 105.62 | (24.9%) | 91.93 | (22.1%) |
   ---
   wikimedium10m
   | Task | baseline QPS | StdDev | my_modified_version QPS | StdDev |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
   | HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
   | HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |
   
   For wikimedium1m, using `LongDocValuesPointSortField` doesn't seem to have
much effect. The segments in this index are probably smaller, so the
optimization was probably skipped completely on those segments.
   For wikimedium10m, using `LongDocValuesPointSortField` instead of the usual
`SortField.Type.LONG` **brings about 3x speedups**.
   There are some regressions/speedups for the sort tasks HighTermMonthSort and
HighTermDayOfYearSort. I don't know the reason for them, as these tasks should
not be affected.
   
   
   
   
   
   
   
   

