[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573147#comment-17573147
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 7c9d3cd6ff6c6af153ee756a983dc323133f33c7 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7c9d3cd6ff6 ]

LUCENE-10633: Fix handling of missing values in reverse sorts.


> Dynamic pruning for queries sorted by SORTED(_SET) field
> 
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However, queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?
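The following is a minimal sketch of the competitiveness check the description refers to. It is written in plain Java rather than against Lucene's comparator API. The k best hits live in a priority queue whose head is the least competitive hit. Once the queue is full, any document whose value does not compare better than that head can be skipped. The sketch assumes an ascending sort on term ordinals, with ties broken by doc ID. {{TopKByOrdinal}} and its methods are hypothetical names, not Lucene classes.

```java
import java.util.PriorityQueue;

/** Hypothetical top-k collector sorting hits by ascending ordinal; ties go to the lower doc ID. */
public class TopKByOrdinal {
  private final int k;
  // Head = least competitive hit kept so far (highest ordinal, then highest doc ID).
  private final PriorityQueue<int[]> queue;

  public TopKByOrdinal(int k) {
    this.k = k;
    this.queue = new PriorityQueue<>(
        (a, b) -> a[0] != b[0] ? Integer.compare(b[0], a[0]) : Integer.compare(b[1], a[1]));
  }

  /** Ordinal of the least competitive hit, or Integer.MAX_VALUE while the queue is not full. */
  public int bottomOrd() {
    return queue.size() < k ? Integer.MAX_VALUE : queue.peek()[0];
  }

  /** Docs arrive in increasing doc ID order, so an ordinal equal to the bottom loses the tie-break. */
  public boolean isCompetitive(int ord) {
    return queue.size() < k || ord < queue.peek()[0];
  }

  public void collect(int docId, int ord) {
    if (!isCompetitive(ord)) {
      return; // this is the check an index structure could turn into skipping
    }
    queue.offer(new int[] {ord, docId});
    if (queue.size() > k) {
      queue.poll(); // evict the previous bottom
    }
  }

  public static void main(String[] args) {
    int[] ords = {5, 3, 9, 1, 7, 3, 0, 8}; // sort-field ordinal of docs 0..7
    TopKByOrdinal topK = new TopKByOrdinal(3);
    int skipped = 0;
    for (int doc = 0; doc < ords.length; doc++) {
      if (!topK.isCompetitive(ords[doc])) {
        skipped++;
      }
      topK.collect(doc, ords[doc]);
    }
    System.out.println("skipped=" + skipped + " bottomOrd=" + topK.bottomOrd()); // skipped=2 bottomOrd=3
  }
}
```

In this sketch, the check still costs one visit per document. The issue is about going further: using an index structure to jump over documents that fail this check without visiting them. For numeric fields that structure is points; for SORTED(_SET) fields it would be the terms index.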



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573148#comment-17573148
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 6366cf2e7ad37dd4f14bb5b7facd3477124073fc in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6366cf2e7ad ]

LUCENE-10633: Fix handling of missing values in reverse sorts.





[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572880#comment-17572880
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 261db55806cd352520e406d5e5a684ce45afa9f4 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=261db55806c ]

LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023)

This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields
by using postings to filter competitive documents.
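One way to picture "using postings to filter competitive documents" is a k-way merge over the postings of every term whose ordinal is still competitive. The merge produces a sorted stream of doc IDs that the collector can iterate instead of all matches. The sketch below is plain Java, not the code from the PR. The {{int[][] postings}} array stands in for per-term postings lists, and the class and method names are hypothetical. The 128-term cutoff is borrowed from the prototype comment later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: iterate competitive docs by merging the postings of competitive terms. */
public class CompetitivePostings {
  // Cutoff taken from the prototype described in this thread; not a Lucene constant.
  static final int MAX_COMPETITIVE_TERMS = 128;

  /**
   * Returns, in order, the doc IDs >= fromDoc whose term ordinal lies in [minOrd, maxOrd],
   * or null when too many ordinals are still competitive (keep visiting every match).
   * postings[ord] holds the sorted doc IDs of the term with that ordinal.
   */
  public static List<Integer> competitiveDocs(int[][] postings, int minOrd, int maxOrd, int fromDoc) {
    if (maxOrd - minOrd + 1 > MAX_COMPETITIVE_TERMS) {
      return null;
    }
    // Min-heap of cursors {docId, ord, index into postings[ord]}.
    PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int ord = minOrd; ord <= maxOrd; ord++) {
      int i = firstAtOrAfter(postings[ord], fromDoc);
      if (i < postings[ord].length) {
        heap.add(new int[] {postings[ord][i], ord, i});
      }
    }
    List<Integer> docs = new ArrayList<>();
    int last = -1;
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      if (top[0] != last) { // a SORTED_SET doc may appear under several competitive terms
        docs.add(top[0]);
        last = top[0];
      }
      int next = top[2] + 1;
      if (next < postings[top[1]].length) {
        heap.add(new int[] {postings[top[1]][next], top[1], next});
      }
    }
    return docs;
  }

  /** Index of the first doc ID >= target in a sorted postings list. */
  private static int firstAtOrAfter(int[] docs, int target) {
    int lo = 0;
    int hi = docs.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    int[][] postings = {{3, 8}, {1, 5}, {0, 2, 9}, {4, 6, 7}};
    System.out.println(competitiveDocs(postings, 0, 1, 2)); // [3, 5, 8]
  }
}
```

A cutoff like this presumably exists because merging many postings lists can cost more than checking every match. The switch only looks worthwhile once few ordinals remain competitive.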




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572873#comment-17572873
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit eb7b7791ba615dfb52d25fb7e542aab539be293e in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eb7b7791ba6 ]

LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023)

This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields
by using postings to filter competitive documents.




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-27 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571807#comment-17571807
 ] 

Adrien Grand commented on LUCENE-10633:
---

The PR is now ready for review if anyone is interested in having a look. I 
made an improvement for the very sparse case: after collecting {{numHits}} 
matches, the collector tells the query to only look at documents that have a 
value for the sort field.
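The following is a hedged sketch of the sparse-case idea for a query that matches every document. The first {{numHits}} matches are collected unconditionally. After that, iteration jumps straight to the next document that has a value. The sketch assumes missing values sort after all present values. A {{java.util.BitSet}} stands in for the doc-values iterator, and {{visitedDocs}} is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of the sparse-field optimization for a match-all query. */
public class SparseSortPruning {

  /**
   * Returns the doc IDs a match-all query would visit. Assumes missing values sort after
   * every present value: once numHits docs are collected, a doc without a value can at best
   * tie with the bottom and then loses on doc ID, so only docs in hasValue are worth visiting.
   */
  public static int[] visitedDocs(int maxDoc, BitSet hasValue, int numHits) {
    List<Integer> visited = new ArrayList<>();
    int doc = 0;
    while (doc >= 0 && doc < maxDoc) {
      visited.add(doc);
      if (visited.size() < numHits) {
        doc++; // queue not full yet: every match must be collected
      } else {
        doc = hasValue.nextSetBit(doc + 1); // -1 once no doc with a value is left
      }
    }
    return visited.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    BitSet hasValue = new BitSet();
    hasValue.set(2);
    hasValue.set(7);
    System.out.println(Arrays.toString(visitedDocs(10, hasValue, 3))); // [0, 1, 2, 7]
  }
}
```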

One assumption this change makes is that terms are encoded exactly the same 
way in the terms index and in the doc-values terms dictionary. I think it's a 
fine assumption, but I wanted to make it explicit because this optimization 
will lead to runtime errors if the assumption isn't met. We already make the 
same assumption today when sorting numeric fields and using the points index 
to dynamically prune irrelevant hits.
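To make the assumption concrete: the bottom value comes from doc values but is looked up in the terms dictionary, so both structures must produce identical bytes for the same value. The plain-Java sketch below binary-searches for the first term that does not sort before the bottom value. It compares unsigned bytes, as {{BytesRef}} does. For an ascending sort, the terms before that position are the competitive ones. The names are hypothetical, and a sorted {{byte[][]}} stands in for the terms dictionary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: locate the competitive terms for a bottom value taken from doc values. */
public class CompetitiveTermRange {

  /**
   * Returns how many terms of the sorted dictionary compare strictly below {@code bottom};
   * for an ascending sort, terms [0, result) are still competitive. Comparison is
   * lexicographic on unsigned bytes, so the result is only meaningful if {@code bottom}
   * was encoded exactly like the dictionary's terms.
   */
  public static int competitiveTermCount(byte[][] sortedTerms, byte[] bottom) {
    int lo = 0;
    int hi = sortedTerms.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (Arrays.compareUnsigned(sortedTerms[mid], bottom) < 0) {
        lo = mid + 1; // term sorts before the bottom value: still competitive
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[][] terms = {utf8("apple"), utf8("banana"), utf8("cherry"), utf8("date")};
    System.out.println(competitiveTermCount(terms, utf8("cherry"))); // 2
  }
}
```

If the two sides encoded the same value differently, this search could land on a position unrelated to the bottom value, which is why the assumption needs to hold.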

I ran luceneutil again to verify performance is still good:

{noformat}
                        Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
            HighSloppyPhrase           11.46      (4.3%)                     11.19      (5.3%)   -2.4% ( -11% -    7%) 0.120
                     Prefix3           53.30     (16.7%)                     52.06     (16.8%)   -2.3% ( -30% -   37%) 0.659
        BrowseDateSSDVFacets            5.23     (11.1%)                      5.13     (13.5%)   -1.9% ( -23% -   25%) 0.632
   BrowseDayOfYearSSDVFacets           20.33      (7.6%)                     19.96      (8.6%)   -1.9% ( -16% -   15%) 0.470
       BrowseMonthTaxoFacets           28.62     (12.0%)                     28.11      (7.8%)   -1.8% ( -19% -   20%) 0.582
                OrHighNotLow         1357.76      (6.3%)                   1334.12      (4.8%)   -1.7% ( -12% -    9%) 0.325
                OrHighNotMed         1568.25      (4.3%)                   1541.21      (4.8%)   -1.7% ( -10% -    7%) 0.232
                     MedTerm         2422.95      (5.2%)                   2381.38      (4.6%)   -1.7% ( -10% -    8%) 0.269
                    HighTerm         1736.81      (6.5%)                   1710.26      (5.6%)   -1.5% ( -12% -   11%) 0.426
             MedSloppyPhrase           62.45      (3.4%)                     61.59      (4.1%)   -1.4% (  -8% -    6%) 0.249
               OrNotHighHigh          931.81      (5.4%)                    919.74      (4.4%)   -1.3% ( -10% -    8%) 0.403
                  OrHighHigh           58.41      (5.3%)                     57.65      (4.1%)   -1.3% ( -10% -    8%) 0.388
                OrNotHighMed         1179.51      (3.0%)                   1168.53      (3.2%)   -0.9% (  -6% -    5%) 0.338
 BrowseRandomLabelSSDVFacets           14.52      (1.9%)                     14.40      (1.9%)   -0.8% (  -4% -    3%) 0.186
                     LowTerm         1589.67      (3.6%)                   1579.95      (4.6%)   -0.6% (  -8% -    7%) 0.642
        MedTermDayTaxoFacets           52.00      (4.3%)                     51.70      (4.3%)   -0.6% (  -8% -    8%) 0.672
               OrHighNotHigh         1008.27      (5.9%)                   1002.78      (5.1%)   -0.5% ( -10% -   11%) 0.756
         LowIntervalsOrdered           11.03      (4.8%)                     10.98      (4.4%)   -0.5% (  -9% -    9%) 0.724
      OrHighMedDayTaxoFacets           22.72      (3.5%)                     22.64      (3.1%)   -0.4% (  -6% -    6%) 0.718
                   OrHighLow          899.20      (3.3%)                    896.35      (3.0%)   -0.3% (  -6% -    6%) 0.750
         MedIntervalsOrdered           43.37      (3.6%)                     43.25      (3.7%)   -0.3% (  -7% -    7%) 0.799
        HighIntervalsOrdered           24.44      (5.3%)                     24.37      (5.5%)   -0.3% ( -10% -   11%) 0.864
                OrNotHighLow         1448.52      (4.0%)                   1446.40      (3.5%)   -0.1% (  -7% -    7%) 0.901
                 LowSpanNear           85.70      (2.4%)                     85.59      (2.2%)   -0.1% (  -4% -    4%) 0.851
                  AndHighLow         1043.29      (5.2%)                   1042.26      (3.9%)   -0.1% (  -8% -    9%) 0.946
                    PKLookup          236.83      (1.4%)                    236.69      (2.2%)   -0.1% (  -3% -    3%) 0.919
        HighTermTitleBDVSort           25.03      (3.5%)                     25.02      (2.6%)   -0.0% (  -5% -    6%) 0.977
                    Wildcard          156.78      (1.9%)                    156.93      (1.8%)    0.1% (  -3% -    3%) 0.877
                 MedSpanNear          214.11      (4.2%)                    214.32      (2.9%)    0.1% (  -6% -    7%) 0.929
                      Fuzzy1          118.50      (1.2%)                    118.67      (0.9%)    0.1% (  -1% -    2%) 0.664
                     Respell           59.34      (1.0%)                     59.43      (0.8%)    0.1% (  -1% -    2%) 0.630
                      Fuzzy2          115.77      (1.1%)                    116.01      (1.1%)    0.2% (  -1% -    2%) 0.549
             LowSloppyPhrase           89.17      (2.6%)                     89.38      (2.6%)    0.2% (  -4% -    5%) 0.771
                HighSpanNear           31.18      (4.1%)                     31.28      (3.2%)    0.3% (  -6% -    8%) 0.769
   

[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-19 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568587#comment-17568587
 ] 

Greg Miller commented on LUCENE-10633:
--

{quote}It also relates to [~gsmiller] 's work about running term-in-set queries 
using doc values, which would only help if doc values are enabled on the field.
{quote}
This is actually perfect timing, as I've just come back to working on this 
(LUCENE-10207) after setting it aside for a while. Thanks for making this 
change to {{luceneutil}}!




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568109#comment-17568109
 ] 

Adrien Grand commented on LUCENE-10633:
---

I opened https://github.com/mikemccand/luceneutil/pull/185.




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-18 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567913#comment-17567913
 ] 

Michael McCandless commented on LUCENE-10633:
-

{quote}I plan on opening a PR against luceneutil and I already opened 
LUCENE-10162 a while back about making this sort of things a more obvious 
choice. It also relates to [~gsmiller] 's work about running term-in-set 
queries using doc values, which would only help if doc values are enabled on 
the field.
{quote}
Awesome, thanks [~jpountz]!




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-17 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567649#comment-17567649
 ] 

Adrien Grand commented on LUCENE-10633:
---

Double yes [~mikemccand] ! I plan on opening a PR against luceneutil and I 
already opened LUCENE-10162 a while back about making this sort of things a 
more obvious choice. It also relates to [~gsmiller] 's work about running 
term-in-set queries using doc values, which would only help if doc values are 
enabled on the field.




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-17 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567603#comment-17567603
 ] 

Michael McCandless commented on LUCENE-10633:
-

Should we make that change to luceneutil permanent (indexing sort fields in 
both points and DVs)?

Maybe we need to make this path more of a default / obvious choice for users so 
they see these optimizations? E.g. some sort of combined 
{{DocValuesAndPointsField}}?




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-17 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567602#comment-17567602
 ] 

Michael McCandless commented on LUCENE-10633:
-

Good grief :) It's not every day you see a 77X speedup in Lucene queries!!!




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567485#comment-17567485
 ] 

Adrien Grand commented on LUCENE-10633:
---

Indeed, the speedup is impressive. :) I should have noted that I had to tweak 
luceneutil to also index the fields that were used for sorting, so that the 
inverted index could be used to skip hits.

This change is very similar to LUCENE-9280, which led to annotation DD on 
[https://home.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html] and 
[https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html].




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-15 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567334#comment-17567334
 ] 

Michael Sokolov commented on LUCENE-10633:
--

Adrien, that's crazy!




[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-15 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567268#comment-17567268
 ] 

Adrien Grand commented on LUCENE-10633:
---

I played with a prototype that starts dynamically pruning matches as soon as 
128 or fewer competitive ordinals are left, by pulling postings to iterate 
over the remaining documents that have competitive values. I still need to 
simplify the logic and improve the tests, but the initial benchmarks on 
wikimedium10m are very encouraging (assuming I didn't get anything wrong):

{noformat}
                        Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                     Prefix3          248.74      (6.1%)                    242.61      (5.8%)   -2.5% ( -13% -   10%) 0.191
       BrowseMonthTaxoFacets           27.71     (10.1%)                     27.34     (10.6%)   -1.3% ( -20% -   21%) 0.682
        BrowseDateSSDVFacets            4.99     (10.3%)                      4.94      (8.4%)   -1.1% ( -17% -   19%) 0.707
        BrowseDateTaxoFacets           44.26     (12.2%)                     43.97     (13.1%)   -0.7% ( -23% -   28%) 0.870
                    Wildcard          137.61      (3.0%)                    136.97      (2.6%)   -0.5% (  -5% -    5%) 0.592
   BrowseDayOfYearTaxoFacets           45.53     (12.4%)                     45.44     (13.4%)   -0.2% ( -23% -   29%) 0.963
                      IntNRQ          198.27      (8.1%)                    197.94      (7.4%)   -0.2% ( -14% -   16%) 0.946
 BrowseRandomLabelSSDVFacets           14.51      (2.2%)                     14.49      (2.4%)   -0.2% (  -4% -    4%) 0.835
    AndHighHighDayTaxoFacets            8.32      (5.1%)                      8.31      (5.7%)   -0.1% ( -10% -   11%) 0.956
                 LowSpanNear           46.83      (1.6%)                     46.82      (2.0%)   -0.0% (  -3% -    3%) 0.990
 BrowseRandomLabelTaxoFacets           36.18     (10.5%)                     36.18     (12.6%)    0.0% ( -20% -   25%) 0.998
        MedTermDayTaxoFacets           73.59      (4.8%)                     73.66      (5.7%)    0.1% (  -9% -   11%) 0.954
               OrNotHighHigh         1476.08      (5.3%)                   1477.58      (3.9%)    0.1% (  -8% -    9%) 0.945
                  TermDTSort          746.55      (2.4%)                    747.70      (1.7%)    0.2% (  -3% -    4%) 0.817
                      Fuzzy2           96.18      (1.3%)                     96.39      (1.4%)    0.2% (  -2% -    2%) 0.617
     AndHighMedDayTaxoFacets          154.89      (1.8%)                    155.29      (1.6%)    0.3% (  -3% -    3%) 0.629
                  AndHighMed          378.38      (3.7%)                    379.50      (4.4%)    0.3% (  -7% -    8%) 0.817
                    PKLookup          243.14      (1.9%)                    243.99      (1.9%)    0.4% (  -3% -    4%) 0.552
                  HighPhrase          279.13      (2.1%)                    280.21      (1.5%)    0.4% (  -3% -    4%) 0.510
                     Respell           71.59      (1.5%)                     71.87      (1.5%)    0.4% (  -2% -    3%) 0.406
                  OrHighHigh           66.95      (6.5%)                     67.21      (5.7%)    0.4% ( -11% -   13%) 0.837
                      Fuzzy1          101.53      (1.5%)                    101.95      (1.5%)    0.4% (  -2% -    3%) 0.382
                   LowPhrase          101.76      (2.3%)                    102.22      (2.6%)    0.5% (  -4% -    5%) 0.558
             LowSloppyPhrase           21.14      (3.1%)                     21.25      (4.1%)    0.5% (  -6% -    7%) 0.661
                   MedPhrase          173.45      (2.7%)                    174.55      (2.6%)    0.6% (  -4% -    6%) 0.443
                 MedSpanNear           17.77      (4.5%)                     17.88      (4.8%)    0.6% (  -8% -   10%) 0.661
                OrHighNotLow         1396.26      (5.6%)                   1406.85      (6.4%)    0.8% ( -10% -   13%) 0.692
                   OrHighMed          162.41      (5.3%)                    163.69      (4.8%)    0.8% (  -8% -   11%) 0.625
       HighTermDayOfYearSort         1476.11      (2.7%)                   1488.26      (2.4%)    0.8% (  -4% -    6%) 0.312
         MedIntervalsOrdered          113.65      (4.2%)                    114.59      (7.0%)    0.8% (  -9% -   12%) 0.652
                   OrHighLow          828.13      (5.2%)                    835.45      (4.7%)    0.9% (  -8% -   11%) 0.574
                     MedTerm         2356.21      (4.7%)                   2377.47      (5.0%)    0.9% (  -8% -   11%) 0.554
             MedSloppyPhrase           62.13      (3.4%)                     62.72      (3.9%)    0.9% (  -6% -    8%) 0.420
        HighIntervalsOrdered           18.19      (5.7%)                     18.37      (8.6%)    1.0% ( -12% -   16%) 0.673
                 AndHighHigh           54.46      (6.2%)                     55.01      (6.3%)    1.0% ( -10% -   14%) 0.615
                     LowTerm         2247.13      (4.7%)                   2270.19      (3.7%)    1.0% (  -7% -    9%) 0.446
                OrNotHighLow         1728.71      (4.3%)                   1748.19      (4.7%)    1.1% (  -7% -   10%) 0.427
        HighTermTitleBDVSort           14.31      (3.3%)                     14.47      (5.7%)