[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries
[ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069728#comment-13069728 ]

Simon Willnauer edited comment on LUCENE-3328 at 7/22/11 8:38 PM:
--

bq. I think advancing the lead in program code at the place of the break comment could fix this.

Paul, this works and looks as expected. Once we break to the advanceHead label we step out of the inner do {} while loop and advance the head. Maybe I don't understand your comment correctly?

There is certainly space for improvement here. For instance, the head could be advanced to the doc we break on, but the advance call there actually yields a perf hit. Yet we can play some tricks like

if (DF / maxdoc < X) enum.advance(n) else while (n > enum.nextDoc());

which I think I'll look into after vacation :)

Specialize BooleanQuery if all clauses are TermQueries
--

Key: LUCENE-3328
URL: https://issues.apache.org/jira/browse/LUCENE-3328
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-3328.patch, LUCENE-3328.patch, LUCENE-3328.patch

During work on LUCENE-3319 I ran into issues with BooleanQuery compared to PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass the position matching, essentially doing a conjunction match, ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable gain. I converted a ConjunctionScorer to use DocsEnum directly but still didn't get all of the 40% from PhraseQuery. Yet it turned out that with further optimizations this gets very close to PhraseQuery. The biggest gain came from converting the hand-crafted loop in ConjunctionScorer#doNext to a for loop, which seems to be less confusing to HotSpot. In this particular case I think code specialization makes a lot of sense, since BQ with TQ is by far one of the most common queries. I will upload a patch shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
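For readers of the archive, the conjunction match described in the issue text (a for loop over the clauses, with a break out to an advanceHead label on a mismatch) might look roughly like the sketch below. It is not the attached patch: IntPostings is a hypothetical array-backed stand-in for a DocsEnum, ConjunctionSketch and intersect are made-up names, and only the nextDoc/advance/NO_MORE_DOCS conventions are borrowed from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical array-backed postings iterator; a stand-in, not Lucene's DocsEnum. */
final class IntPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // strictly increasing doc IDs
    private int pos = -1;     // -1 means "not positioned yet"

    IntPostings(int... docs) { this.docs = docs; }

    int docFreq() { return docs.length; }

    int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() {
        if (pos < docs.length) pos++;
        return docID();
    }

    /** Moves forward to the first doc >= target (linear here; real postings would skip). */
    int advance(int target) {
        int doc;
        do { doc = nextDoc(); } while (doc < target);
        return doc;
    }
}

final class ConjunctionSketch {
    /** Returns the doc IDs present in every postings list. */
    static int[] intersect(IntPostings... enums) {
        if (enums.length == 0) return new int[0];
        // Lead with the rarest term so the outer loop visits as few docs as possible.
        IntPostings[] sorted = enums.clone();
        Arrays.sort(sorted, Comparator.comparingInt(IntPostings::docFreq));
        IntPostings lead = sorted[0];
        List<Integer> hits = new ArrayList<>();

        int doc = lead.nextDoc();
        advanceHead:
        while (doc != IntPostings.NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                IntPostings other = sorted[i];
                int otherDoc = other.docID();
                if (otherDoc < doc) otherDoc = other.advance(doc);
                if (otherDoc > doc) {
                    // Mismatch: leave the inner loop and step the head forward.
                    // lead.advance(otherDoc) is the alternative discussed in the thread.
                    doc = lead.nextDoc();
                    continue advanceHead;
                }
            }
            hits.add(doc); // every clause is positioned on doc
            doc = lead.nextDoc();
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] hits = intersect(
            new IntPostings(1, 3, 5, 7, 9, 11),
            new IntPostings(3, 4, 5, 9, 10),
            new IntPostings(2, 3, 9, 12));
        System.out.println(Arrays.toString(hits)); // prints [3, 9]
    }
}
```

On a mismatch the sketch steps the head with nextDoc, mirroring the remark in the comment that advancing the head to the doc we break on turned out slower in practice.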
[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries
[ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069728#comment-13069728 ]

Simon Willnauer edited comment on LUCENE-3328 at 7/22/11 8:39 PM:
--

bq. I think advancing the lead in program code at the place of the break comment could fix this.

Paul, this works and looks as expected. Once we break to the advanceHead label we step out of the inner do {} while loop and advance the head. Maybe I don't understand your comment correctly?

There is certainly space for improvement here. For instance, the head could be advanced to the doc we break on, but the advance call there actually yields a perf hit. Yet we can play some tricks like

if (DF / maxdoc < X) enum.advance(n) else while (n > enum.nextDoc());

which I think I'll look into after vacation :)
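The trick mentioned in the comment, choosing between advance and a plain nextDoc loop depending on how dense the postings are, could be sketched as follows. This is only an illustration under stated assumptions: SortedDocs is a hypothetical array-backed stand-in for a DocsEnum, moveTo and the threshold parameter (the X of the comment, whose value the thread does not give) are made up, and the condition is read as DF / maxdoc < X, meaning sparse postings skip and dense postings scan.

```java
import java.util.Arrays;

/** Hypothetical array-backed postings; a stand-in, not Lucene's DocsEnum. */
final class SortedDocs {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // strictly increasing doc IDs
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    int docFreq() { return docs.length; }

    /** Linear step to the next doc. */
    int nextDoc() {
        if (pos < docs.length) pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Skips to the first doc >= target; binary search stands in for skip lists. */
    int advance(int target) {
        int from = Math.min(pos + 1, docs.length);
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class AdvanceHeuristicSketch {
    /**
     * Moves e to the first doc >= target. Sparse postings (docFreq / maxDoc below
     * the threshold) use advance; dense postings are scanned with nextDoc.
     */
    static int moveTo(SortedDocs e, int target, int maxDoc, double threshold) {
        double density = (double) e.docFreq() / maxDoc;
        if (density < threshold) {
            return e.advance(target);
        }
        int doc;
        do { doc = e.nextDoc(); } while (doc < target); // same as while (target > e.nextDoc());
        return doc;
    }

    public static void main(String[] args) {
        SortedDocs sparse = new SortedDocs(2, 5, 8, 13, 21);
        System.out.println(moveTo(sparse, 8, 100, 0.5)); // prints 8
    }
}
```

Both branches land on the same document; only the cost differs, which is why the choice can be made per term from its document frequency.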
[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries
[ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068394#comment-13068394 ]

Simon Willnauer edited comment on LUCENE-3328 at 7/20/11 3:37 PM:
--

bq. here is the same thing, only as a scorer that booleanweight picks.

I like the size of the patch! Thanks for moving this into the weight. I had it separate to keep BW less complex, but this looks good.

bq. In general i think Query.rewrite should be reserved for simplifying Queries, this is not a simpler query, just a faster scorer

I disagree here: if that were the case it should be called simplify(Query). In general it's a rewrite method and should not be judged by whether it simplifies or not.

Here are some benchmark results, trunk vs. patch (10M medium wiki docs):

||Task||QPS Trunk||StdDev Trunk||QPS Patch||StdDev Patch||Pct diff||
|Prefix3|29.84|1.14|29.02|1.37|-10% - 5%|
|IntNRQ|5.82|0.67|5.68|0.55|-20% - 20%|
|Wildcard|15.96|0.77|15.62|0.63|-10% - 7%|
|Term|79.10|4.32|77.43|3.25|-11% - 7%|
|OrHighMed|15.67|0.94|15.44|1.00|-13% - 11%|
|TermGroup1M|10.82|0.76|10.77|0.69|-13% - 13%|
|OrHighHigh|3.31|0.37|3.29|0.37|-20% - 24%|
|Respell|15.99|0.59|15.95|0.52|-6% - 6%|
|TermBGroup1M|12.87|1.09|12.86|0.94|-14% - 17%|
|Fuzzy1|24.38|1.19|24.39|0.84|-7% - 8%|
|TermBGroup1M1P|17.67|1.33|17.79|1.14|-12% - 15%|
|Fuzzy2|7.60|0.64|7.67|0.59|-14% - 18%|
|Phrase|6.84|0.64|6.91|0.62|-15% - 21%|
|SpanNear|1.90|0.24|1.92|0.22|-20% - 29%|
|PKLookup|76.01|4.56|76.99|3.26|-8% - 12%|
|SloppyPhrase|2.49|0.25|2.53|0.23|-16% - 23%|
|AndHighMed|29.80|1.11|33.50|1.31|4% - 21%|
|AndHighHigh|10.74|0.67|12.26|0.55|2% - 27%|

Specialize BooleanQuery if all clauses are TermQueries
--

Key: LUCENE-3328
URL: https://issues.apache.org/jira/browse/LUCENE-3328
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Fix For: 3.4, 4.0
Attachments: LUCENE-3328.patch, LUCENE-3328.patch
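The thread does not say how the "Pct diff" ranges in the benchmark results are derived. One reading that reproduces most of the printed ranges is a worst case to best case comparison one standard deviation either side of each mean, truncated to whole percent. The sketch below works that out under that assumption; PctDiffSketch and pctDiffRange are hypothetical names, not part of any benchmark tool.

```java
import java.util.Arrays;

final class PctDiffSketch {
    /**
     * Returns {worst, best} percentage change of patch vs. trunk, assuming the
     * range is taken one standard deviation either side of each mean and then
     * truncated toward zero.
     */
    static int[] pctDiffRange(double qpsTrunk, double sdTrunk, double qpsPatch, double sdPatch) {
        // Worst case: patch at its slowest against trunk at its fastest.
        double worst = (qpsPatch - sdPatch) / (qpsTrunk + sdTrunk) - 1.0;
        // Best case: patch at its fastest against trunk at its slowest.
        double best = (qpsPatch + sdPatch) / (qpsTrunk - sdTrunk) - 1.0;
        return new int[] { (int) (100.0 * worst), (int) (100.0 * best) };
    }

    public static void main(String[] args) {
        // AndHighMed row: 29.80 +/- 1.11 on trunk, 33.50 +/- 1.31 with the patch.
        System.out.println(Arrays.toString(pctDiffRange(29.80, 1.11, 33.50, 1.31))); // prints [4, 21]
    }
}
```

Under this reading only the two AND tasks have a range that stays above zero, which matches the intent of the patch. Recomputing from the rounded values shown can be off by a point for some rows (Wildcard, for example), presumably because the printed inputs are themselves rounded.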