[jira] [Comment Edited] (LUCENE-6850) BooleanWeight should not use BS1 when there is a single non-null clause

Adrien Grand (JIRA) Wed, 21 Oct 2015 02:21:03 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966516#comment-14966516 ]


Adrien Grand edited comment on LUCENE-6850 at 10/21/15 9:20 AM:
----------------------------------------------------------------

I iterated on the previous patch to also optimize the case where all clauses 
return a non-null BulkScorer but some windows of 2048 documents contain matches 
for only one of the sub scorers. In that case we can call the collector directly 
instead of collecting into a bit set and replaying it. luceneutil on 
wikimedium10m shows a nice speedup for {{OrHighLow}} (+26.9% QPS):

{noformat}
            Task  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
          Fuzzy2         54.93  (13.3%)      51.19  (16.9%)   -6.8% ( -32% -   26%)
      OrHighHigh         37.94   (9.1%)      35.76   (7.0%)   -5.7% ( -20% -   11%)
       OrHighMed         76.23   (9.0%)      73.41   (6.3%)   -3.7% ( -17% -   12%)
    OrNotHighLow       1684.73   (4.6%)    1648.87   (6.6%)   -2.1% ( -12% -    9%)
          IntNRQ         13.63   (4.0%)      13.49   (4.8%)   -1.0% (  -9% -    8%)
      AndHighLow        731.68   (2.6%)     726.44   (3.6%)   -0.7% (  -6% -    5%)
         Respell         61.24   (3.0%)      60.84   (3.7%)   -0.7% (  -7% -    6%)
    HighSpanNear         22.89   (3.7%)      22.82   (4.0%)   -0.3% (  -7% -    7%)
        HighTerm        136.93   (2.8%)     136.57   (3.1%)   -0.3% (  -5% -    5%)
     MedSpanNear         72.54   (3.1%)      72.36   (3.5%)   -0.2% (  -6% -    6%)
       MedPhrase         30.70   (1.9%)      30.63   (1.8%)   -0.2% (  -3% -    3%)
      HighPhrase         35.13   (3.8%)      35.12   (3.5%)   -0.1% (  -7% -    7%)
         MedTerm        184.28   (3.1%)     184.23   (2.5%)   -0.0% (  -5% -    5%)
     AndHighHigh         16.74   (1.4%)      16.76   (1.4%)    0.1% (  -2% -    2%)
     LowSpanNear         39.03   (1.8%)      39.08   (2.3%)    0.1% (  -3% -    4%)
        Wildcard         43.57   (2.6%)      43.66   (2.9%)    0.2% (  -5% -    5%)
      AndHighMed        178.28   (1.5%)     178.78   (2.1%)    0.3% (  -3% -    3%)
    OrHighNotMed         71.53   (4.7%)      71.79   (2.8%)    0.4% (  -6% -    8%)
    OrNotHighMed         79.22   (2.6%)      79.65   (2.0%)    0.5% (  -3% -    5%)
   OrNotHighHigh         61.27   (3.0%)      61.61   (2.0%)    0.6% (  -4% -    5%)
         LowTerm        818.90   (5.9%)     823.47   (4.3%)    0.6% (  -9% -   11%)
         Prefix3        176.52   (2.9%)     177.57   (3.2%)    0.6% (  -5% -    6%)
       LowPhrase        380.46   (3.4%)     383.13   (3.4%)    0.7% (  -5% -    7%)
 MedSloppyPhrase        155.97   (3.5%)     157.16   (2.8%)    0.8% (  -5% -    7%)
   OrHighNotHigh         45.73   (3.1%)      46.09   (1.9%)    0.8% (  -4% -    5%)
 LowSloppyPhrase         65.95   (2.0%)      66.59   (1.6%)    1.0% (  -2% -    4%)
    OrHighNotLow         97.93   (4.8%)      99.02   (2.4%)    1.1% (  -5% -    8%)
          Fuzzy1         49.26   (6.6%)      50.06   (6.7%)    1.6% ( -10% -   16%)
HighSloppyPhrase         24.74   (4.2%)      25.65   (5.8%)    3.7% (  -6% -   14%)
       OrHighLow         84.42   (7.7%)     107.15   (8.4%)   26.9% (  10% -   46%)
{noformat}
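
To make the per-window idea concrete, here is a minimal sketch in plain Java. It is not the patch and does not use Lucene's classes: the class {{WindowedDisjunctionSketch}}, the sorted {{int[]}} clauses and the {{IntConsumer}} collector are all assumptions made for illustration, and scores are ignored. It only shows the control flow: when a single clause has matches in a 2048-doc window, docs go straight to the collector; otherwise they are marked in a bit set and replayed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Simplified model of window-based disjunction collection; not Lucene's actual BooleanScorer. */
final class WindowedDisjunctionSketch {
    static final int WINDOW_SIZE = 2048;        // window size mentioned in the comment
    static final int WORDS = WINDOW_SIZE / 64;  // 64-bit words per window bit set

    /**
     * Collects the union of the clauses window by window. Each clause is a strictly
     * increasing array of doc IDs below maxDoc (a hypothetical stand-in for a sub scorer).
     * Returns the number of windows that took the direct path.
     */
    static int collect(int[][] clauses, int maxDoc, IntConsumer collector) {
        int directWindows = 0;
        int[] pos = new int[clauses.length];    // read position per clause
        long[] bits = new long[WORDS];
        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);
            // Count the clauses that have at least one match in [base, end).
            int matching = 0;
            int only = -1;
            for (int i = 0; i < clauses.length; i++) {
                if (pos[i] < clauses[i].length && clauses[i][pos[i]] < end) {
                    matching++;
                    only = i;
                }
            }
            if (matching == 0) {
                continue;
            }
            if (matching == 1) {
                // Direct path: one clause matches, so feed the collector without a bit set.
                int[] docs = clauses[only];
                while (pos[only] < docs.length && docs[pos[only]] < end) {
                    collector.accept(docs[pos[only]++]);
                }
                directWindows++;
                continue;
            }
            // General path: mark matches in the bit set, then replay them in doc-ID order.
            for (int i = 0; i < clauses.length; i++) {
                int[] docs = clauses[i];
                while (pos[i] < docs.length && docs[pos[i]] < end) {
                    int offset = docs[pos[i]++] - base;
                    bits[offset >>> 6] |= 1L << (offset & 63);
                }
            }
            for (int w = 0; w < WORDS; w++) {
                long word = bits[w];
                bits[w] = 0;                    // reset for the next window
                while (word != 0) {
                    collector.accept(base + (w << 6) + Long.numberOfTrailingZeros(word));
                    word &= word - 1;           // clear the lowest set bit
                }
            }
        }
        return directWindows;
    }

    public static void main(String[] args) {
        int[][] clauses = { {1, 5, 3000, 5000}, {5, 7, 9000} };
        List<Integer> hits = new ArrayList<>();
        int direct = collect(clauses, 10_000, hits::add);
        System.out.println(hits + ", direct windows: " + direct);
    }
}
```

In this model a query shaped like {{OrHighLow}} would presumably hit the direct path often, since the low-frequency clause leaves many windows where only the high-frequency clause matches; that reading is an inference from the numbers above, not something the sketch measures.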


> BooleanWeight should not use BS1 when there is a single non-null clause
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-6850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6850
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6850.patch, LUCENE-6850.patch
>
>
> When a disjunction has a single non-null scorer, we still use BS1 for 
> bulk-scoring, which first collects matches into a bit set and then calls the 
> collector. This is inefficient: we should just call the inner bulk scorer 
> directly and wrap the scorer to apply the coord factor (like 
> BooleanTopLevelScorers.BoostedScorer does).
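
The delegation described in the issue can be sketched roughly as follows. The interfaces {{SimpleScorer}}, {{SimpleCollector}} and {{SimpleBulkScorer}} are hypothetical, stripped-down stand-ins defined here for illustration, not Lucene's real API, and the constant factor plays the role of the coord factor. The point is only that the wrapper forwards each hit to the collector as it arrives and scales the score on the way, with no intermediate bit set.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-ins for the scorer, collector and bulk-scorer roles; not Lucene's interfaces. */
final class SingleClauseSketch {
    interface SimpleScorer { float score(); }

    interface SimpleCollector {
        void setScorer(SimpleScorer scorer);
        void collect(int doc);
    }

    interface SimpleBulkScorer { void score(SimpleCollector collector); }

    /** Bulk scorer over fixed docs and scores, standing in for the single non-null clause. */
    static SimpleBulkScorer fixed(int[] docs, float[] scores) {
        return collector -> {
            int[] current = new int[1];  // index of the doc currently being collected
            collector.setScorer(() -> scores[current[0]]);
            for (int i = 0; i < docs.length; i++) {
                current[0] = i;
                collector.collect(docs[i]);
            }
        };
    }

    /** Delegates to the inner bulk scorer and scales every score by a constant factor. */
    static SimpleBulkScorer boosted(SimpleBulkScorer inner, float factor) {
        return collector -> inner.score(new SimpleCollector() {
            @Override public void setScorer(SimpleScorer scorer) {
                // Wrap the scorer so the collector sees the scaled score.
                collector.setScorer(() -> scorer.score() * factor);
            }
            @Override public void collect(int doc) {
                collector.collect(doc);  // forwarded directly: no bit set, no replay
            }
        });
    }

    /** Runs a bulk scorer and records "doc:score" for each hit. */
    static List<String> run(SimpleBulkScorer bulkScorer) {
        List<String> hits = new ArrayList<>();
        bulkScorer.score(new SimpleCollector() {
            private SimpleScorer scorer;
            @Override public void setScorer(SimpleScorer scorer) { this.scorer = scorer; }
            @Override public void collect(int doc) { hits.add(doc + ":" + scorer.score()); }
        });
        return hits;
    }

    public static void main(String[] args) {
        SimpleBulkScorer inner = fixed(new int[] {3, 8}, new float[] {2.0f, 4.0f});
        System.out.println(run(boosted(inner, 0.5f)));
    }
}
```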


