[ https://issues.apache.org/jira/browse/LUCENE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966516#comment-14966516 ]
Adrien Grand edited comment on LUCENE-6850 at 10/21/15 9:20 AM:
----------------------------------------------------------------

I iterated on the previous patch in order to also optimize the case when all clauses return a non-null BulkScorer, but some windows of 2048 documents only contain matches for one of the sub scorers: in that case we can call the collector directly instead of going through a bitset and replaying.

luceneutil on wikimedium10m shows a nice speedup for {{OrHighLow}}:

{noformat}
            Task  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
          Fuzzy2         54.93  (13.3%)      51.19  (16.9%)     -6.8%  ( -32% -   26%)
      OrHighHigh         37.94   (9.1%)      35.76   (7.0%)     -5.7%  ( -20% -   11%)
       OrHighMed         76.23   (9.0%)      73.41   (6.3%)     -3.7%  ( -17% -   12%)
    OrNotHighLow       1684.73   (4.6%)    1648.87   (6.6%)     -2.1%  ( -12% -    9%)
          IntNRQ         13.63   (4.0%)      13.49   (4.8%)     -1.0%  (  -9% -    8%)
      AndHighLow        731.68   (2.6%)     726.44   (3.6%)     -0.7%  (  -6% -    5%)
         Respell         61.24   (3.0%)      60.84   (3.7%)     -0.7%  (  -7% -    6%)
    HighSpanNear         22.89   (3.7%)      22.82   (4.0%)     -0.3%  (  -7% -    7%)
        HighTerm        136.93   (2.8%)     136.57   (3.1%)     -0.3%  (  -5% -    5%)
     MedSpanNear         72.54   (3.1%)      72.36   (3.5%)     -0.2%  (  -6% -    6%)
       MedPhrase         30.70   (1.9%)      30.63   (1.8%)     -0.2%  (  -3% -    3%)
      HighPhrase         35.13   (3.8%)      35.12   (3.5%)     -0.1%  (  -7% -    7%)
         MedTerm        184.28   (3.1%)     184.23   (2.5%)     -0.0%  (  -5% -    5%)
     AndHighHigh         16.74   (1.4%)      16.76   (1.4%)      0.1%  (  -2% -    2%)
     LowSpanNear         39.03   (1.8%)      39.08   (2.3%)      0.1%  (  -3% -    4%)
        Wildcard         43.57   (2.6%)      43.66   (2.9%)      0.2%  (  -5% -    5%)
      AndHighMed        178.28   (1.5%)     178.78   (2.1%)      0.3%  (  -3% -    3%)
    OrHighNotMed         71.53   (4.7%)      71.79   (2.8%)      0.4%  (  -6% -    8%)
    OrNotHighMed         79.22   (2.6%)      79.65   (2.0%)      0.5%  (  -3% -    5%)
   OrNotHighHigh         61.27   (3.0%)      61.61   (2.0%)      0.6%  (  -4% -    5%)
         LowTerm        818.90   (5.9%)     823.47   (4.3%)      0.6%  (  -9% -   11%)
         Prefix3        176.52   (2.9%)     177.57   (3.2%)      0.6%  (  -5% -    6%)
       LowPhrase        380.46   (3.4%)     383.13   (3.4%)      0.7%  (  -5% -    7%)
 MedSloppyPhrase        155.97   (3.5%)     157.16   (2.8%)      0.8%  (  -5% -    7%)
   OrHighNotHigh         45.73   (3.1%)      46.09   (1.9%)      0.8%  (  -4% -    5%)
 LowSloppyPhrase         65.95   (2.0%)      66.59   (1.6%)      1.0%  (  -2% -    4%)
    OrHighNotLow         97.93   (4.8%)      99.02   (2.4%)      1.1%  (  -5% -    8%)
          Fuzzy1         49.26   (6.6%)      50.06   (6.7%)      1.6%  ( -10% -   16%)
HighSloppyPhrase         24.74   (4.2%)      25.65   (5.8%)      3.7%  (  -6% -   14%)
       OrHighLow         84.42   (7.7%)     107.15   (8.4%)     26.9%  (  10% -   46%)
{noformat}

> BooleanWeight should not use BS1 when there is a single non-null clause
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-6850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6850
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6850.patch, LUCENE-6850.patch
>
>
> When a disjunction has a single non-null scorer, we still use BS1 for
> bulk-scoring, which first collects matches into a bit set and then calls the
> collector. This is inefficient: we should just call the inner bulk scorer
> directly and wrap the scorer to apply the coord factor (like
> BooleanTopLevelScorers.BoostedScorer does).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
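
For readers unfamiliar with the windowing idea discussed in the comment above, here is a rough, self-contained sketch of it. It is a toy model and not the code from the attached patch: the class `WindowedDisjunction`, the method `score`, and the representation of each clause as a sorted `int[]` of doc IDs are all assumptions made for illustration, and no Lucene API is used. Only the window size of 2048 comes from the comment itself. Within each window, if a single clause has matches, its doc IDs go straight to the collector; otherwise matches are first merged into a bit set and then replayed in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntConsumer;

/** Toy model of windowed disjunction scoring; hypothetical, not Lucene's BooleanScorer. */
final class WindowedDisjunction {
    /** Window size mentioned in the comment. */
    static final int WINDOW_SIZE = 2048;

    /**
     * Feeds the union of all clauses to the collector in increasing doc ID order.
     * Each clause is assumed to be a sorted array of distinct doc IDs below maxDoc.
     * Returns how many windows took the direct path (no bit set involved).
     */
    static int score(int[][] clauses, int maxDoc, IntConsumer collector) {
        int directWindows = 0;
        int[] pos = new int[clauses.length];       // read position in each clause
        long[] bits = new long[WINDOW_SIZE / 64];  // one bit per doc in the window

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);

            // Count the clauses that have at least one match in [base, end).
            int active = 0;
            int single = -1;
            for (int i = 0; i < clauses.length; i++) {
                if (pos[i] < clauses[i].length && clauses[i][pos[i]] < end) {
                    active++;
                    single = i;
                }
            }
            if (active == 0) {
                continue; // nothing matches in this window
            }

            if (active == 1) {
                // Only one clause matches here: call the collector directly.
                int[] docs = clauses[single];
                while (pos[single] < docs.length && docs[pos[single]] < end) {
                    collector.accept(docs[pos[single]++]);
                }
                directWindows++;
                continue;
            }

            // Several clauses match: merge into the bit set, then replay in order.
            Arrays.fill(bits, 0L);
            for (int i = 0; i < clauses.length; i++) {
                int[] docs = clauses[i];
                while (pos[i] < docs.length && docs[pos[i]] < end) {
                    int slot = docs[pos[i]++] - base;
                    bits[slot >>> 6] |= 1L << (slot & 63);
                }
            }
            for (int w = 0; w < bits.length; w++) {
                long word = bits[w];
                while (word != 0) {
                    int bit = Long.numberOfTrailingZeros(word);
                    collector.accept(base + (w << 6) + bit);
                    word &= word - 1; // clear the lowest set bit
                }
            }
        }
        return directWindows;
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        int direct = score(new int[][] {{1, 5, 3000}, {5, 7}}, 5000, hits::add);
        System.out.println(hits + " direct windows: " + direct);
    }
}
```

In this example the first window holds matches from both clauses and goes through the bit set, while the second window holds a match from one clause only and bypasses it. The sketch ignores scores and the coord factor, which the real change would still need to handle.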