[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries

2011-07-22 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069728#comment-13069728
 ] 

Simon Willnauer edited comment on LUCENE-3328 at 7/22/11 8:38 PM:
--

bq. I think advancing the lead in program code at the place of the break 
comment could fix this.
Paul, this works and looks as expected. Once we break to the advanceHead label,
we step out of the inner do {} while loop and advance the head. Maybe I don't
understand your comment correctly?

There is certainly room for improvement here. For instance, the head could be
advanced to the doc we break on, but the advance call there actually yields a
perf hit. Yet we can play some tricks like if (DF / maxdoc  X)
enum.advance(n) else while (n > enum.nextDoc()); which I think I'll look into
after vacation :)
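
A minimal Java sketch of the kind of trick mentioned above. It assumes that a
sparse enum (low DF / maxdoc) is better served by advance() and a dense one by
a linear nextDoc() scan. The comparison direction, the threshold parameter and
the array-backed ArrayDocsEnum class are all assumptions made for illustration;
none of this is Lucene's DocsEnum or code from the patch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a postings enum over a sorted array of doc ids. */
final class ArrayDocsEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    ArrayDocsEnum(int... sortedDocs) { this.docs = sortedDocs; }

    int docFreq() { return docs.length; }

    /** Steps to the next doc id, or NO_MORE_DOCS once exhausted. */
    int nextDoc() {
        if (idx < docs.length) idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves past the current doc to the first doc id >= target (binary search stands in for skip data). */
    int advance(int target) {
        int from = Math.min(idx + 1, docs.length);
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        idx = pos >= 0 ? pos : -pos - 1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
}

final class AdvanceHeuristic {
    /**
     * Moves e to the first doc >= target. ASSUMPTION: sparse enums use advance(),
     * dense enums use a linear scan; threshold plays the role of X in the comment.
     */
    static int advanceTo(ArrayDocsEnum e, int target, int maxDoc, double threshold) {
        double density = (double) e.docFreq() / maxDoc;
        if (density < threshold) {
            return e.advance(target);
        }
        int doc;
        do {
            doc = e.nextDoc(); // equivalent to while (target > enum.nextDoc());
        } while (doc < target);
        return doc;
    }
}
```

Whether such a switch pays off, and which side of the threshold should use
advance(), would have to be settled by benchmarking.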

  was (Author: simonw):
bq. I think advancing the lead in program code at the place of the break 
comment could fix this.
Paul, this works and looks as expected. Once we break to the advanceHead label,
we step out of the inner do {} while loop and advance the head. Maybe I don't
understand your comment correctly?

There is certainly room for improvement here. For instance, the head could be
advanced to the doc we break on, but the advance call there actually yields a
perf hit. Yet we can play some tricks like if (DF / maxdoc  X)
enum.advance(x) else while (x > enum.nextDoc()); which I think I'll look into
after vacation :)
  
 Specialize BooleanQuery if all clauses are TermQueries
 --

 Key: LUCENE-3328
 URL: https://issues.apache.org/jira/browse/LUCENE-3328
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3328.patch, LUCENE-3328.patch, LUCENE-3328.patch


 During work on LUCENE-3319 I ran into issues with BooleanQuery compared to
 PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass
 the position matching, essentially doing a conjunction match,
 ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable
 gain. I converted a ConjunctionScorer to use DocsEnum directly but still
 didn't get all of the 40% from PhraseQuery. Yet it turned out that with
 further optimizations this gets very close to PhraseQuery. The biggest gain
 here came from converting the hand-crafted loop in ConjunctionScorer#doNext
 to a for loop, which seems to be less confusing to HotSpot. In this
 particular case I think code specialization makes lots of sense, since BQ
 (BooleanQuery) with TQ (TermQuery) is by far one of the most common queries.
 I will upload a patch shortly.
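
The conjunction match described above can be sketched in Java roughly as
follows. This is an illustrative for-loop intersection over sorted doc id
arrays, with the sparsest list acting as the head. It is not the code from the
patch: the class name, the array-based postings and the use of a labeled
continue are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConjunctionSketch {
    /** Returns the doc ids that occur in every sorted postings array. */
    static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) {
            return hits;
        }
        // Sort a copy so that the shortest (sparsest) list becomes the head.
        int[][] sorted = postings.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a.length, b.length));
        int[] head = sorted[0];
        int[] pos = new int[sorted.length]; // read position in each non-head list

        advanceHead:
        for (int h = 0; h < head.length; h++) {
            int doc = head[h];
            for (int i = 1; i < sorted.length; i++) {
                int[] other = sorted[i];
                int p = pos[i];
                while (p < other.length && other[p] < doc) {
                    p++; // linear scan up to the first doc >= head doc
                }
                pos[i] = p;
                if (p == other.length) {
                    break advanceHead; // one list is exhausted: no further matches
                }
                if (other[p] != doc) {
                    continue advanceHead; // mismatch: move the head to its next doc
                }
            }
            hits.add(doc); // every list contains doc
        }
        return hits;
    }
}
```

For example, intersecting {1, 2, 3, 5, 8, 13}, {2, 3, 4, 5, 8} and
{3, 5, 7, 8, 9, 10, 11} yields 3, 5 and 8.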

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries

2011-07-22 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069728#comment-13069728
 ] 

Simon Willnauer edited comment on LUCENE-3328 at 7/22/11 8:39 PM:
--

bq. I think advancing the lead in program code at the place of the break 
comment could fix this.
Paul, this works and looks as expected. Once we break to the advanceHead label,
we step out of the inner do {} while loop and advance the head. Maybe I don't
understand your comment correctly?

There is certainly room for improvement here. For instance, the head could be
advanced to the doc we break on, but the advance call there actually yields a
perf hit. Yet we can play some tricks like if (DF / maxdoc  X) enum.advance(
n ) else while (n > enum.nextDoc()); which I think I'll look into after
vacation :)

  



[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries

2011-07-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068394#comment-13068394
 ] 

Simon Willnauer edited comment on LUCENE-3328 at 7/20/11 3:37 PM:
--

bq. here is the same thing, only as a scorer that booleanweight picks.

I like the size of the patch! Thanks for moving this into the weight. I had
kept it separate to make BW less complex, but this looks good.


bq. In general i think Query.rewrite should be reserved for simplifying 
Queries, this is not a simpler query, just a faster scorer 

I disagree here. If that were the case, it should be called simplify(Query).
In general it's a rewrite method and should not be judged by whether it
simplifies or not.



Here are some benchmark results trunk vs. patch (10M medium wiki docs):


||Task||QPS Trunk||StdDev||QPS Patch||StdDev||Pct diff||
|Prefix3|29.84|1.14|29.02|1.37|-10% - 5%|
|IntNRQ|5.82|0.67|5.68|0.55|-20% - 20%|
|Wildcard|15.96|0.77|15.62|0.63|-10% - 7%|
|Term|79.10|4.32|77.43|3.25|-11% - 7%|
|OrHighMed|15.67|0.94|15.44|1.00|-13% - 11%|
|TermGroup1M|10.82|0.76|10.77|0.69|-13% - 13%|
|OrHighHigh|3.31|0.37|3.29|0.37|-20% - 24%|
|Respell|15.99|0.59|15.95|0.52|-6% - 6%|
|TermBGroup1M|12.87|1.09|12.86|0.94|-14% - 17%|
|Fuzzy1|24.38|1.19|24.39|0.84|-7% - 8%|
|TermBGroup1M1P|17.67|1.33|17.79|1.14|-12% - 15%|
|Fuzzy2|7.60|0.64|7.67|0.59|-14% - 18%|
|Phrase|6.84|0.64|6.91|0.62|-15% - 21%|
|SpanNear|1.90|0.24|1.92|0.22|-20% - 29%|
|PKLookup|76.01|4.56|76.99|3.26|-8% - 12%|
|SloppyPhrase|2.49|0.25|2.53|0.23|-16% - 23%|
|AndHighMed|29.80|1.11|33.50|1.31|4% - 21%|
|AndHighHigh|10.74|0.67|12.26|0.55|2% - 27%|
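
The Pct diff ranges in the table appear consistent with a worst-case /
best-case comparison: patch QPS minus its StdDev against trunk QPS plus its
StdDev for the low end, and the reverse for the high end, truncated to whole
percent. The Java sketch below reproduces that arithmetic. The formula is
inferred from the rounded figures in the table and is an assumption, not
taken from the benchmark tool itself; the class and method names are
hypothetical.

```java
final class PctDiffRange {
    /**
     * Returns {low, high} in whole percent, truncated toward zero.
     * low  = worst case for the patch: (patch - sd) vs. (trunk + sd)
     * high = best case for the patch:  (patch + sd) vs. (trunk - sd)
     */
    static int[] range(double qpsTrunk, double sdTrunk, double qpsPatch, double sdPatch) {
        double low = (qpsPatch - sdPatch) / (qpsTrunk + sdTrunk) - 1.0;
        double high = (qpsPatch + sdPatch) / (qpsTrunk - sdTrunk) - 1.0;
        return new int[] { (int) (100.0 * low), (int) (100.0 * high) };
    }
}
```

With the AndHighMed row (29.80, 1.11, 33.50, 1.31) this gives 4 and 21, and
with the Prefix3 row (29.84, 1.14, 29.02, 1.37) it gives -10 and 5. A range
that does not include zero, as for the two And* tasks, is what suggests a real
change rather than noise.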

 Specialize BooleanQuery if all clauses are TermQueries
 --

 Key: LUCENE-3328
 URL: https://issues.apache.org/jira/browse/LUCENE-3328
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3328.patch, LUCENE-3328.patch


