Okay, let's try the debug trace again without a pf, to make it less confusing.
First, a single field in qf, one that's ordinarily text-tokenized and does get hits:
q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%&debugQuery=true&pf=
<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((title1_t:churchil)~0.01)
DisjunctionMaxQuery((title1_t:roosevelt)~0.01))~2) ()
</str>
<str name="parsedquery_toString">
+(((title1_t:churchil)~0.01 (title1_t:roosevelt)~0.01)~2) ()
</str>
That gets 25 hits. Now we add a second field to the qf; this second
field is also ordinarily tokenized. Adding another field to qf, we expect
no _fewer_ than 25 hits, right? And indeed it still returns exactly 25
hits (no additional hits from the additional qf field).
?q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t%20title2_t&mm=100%&debugQuery=true&pf=
<str name="parsedquery">
+((DisjunctionMaxQuery((title2_t:churchil | title1_t:churchil)~0.01)
DisjunctionMaxQuery((title2_t:roosevelt | title1_t:roosevelt)~0.01))~2) ()
</str>
<str name="parsedquery_toString">
+(((title2_t:churchil | title1_t:churchil)~0.01 (title2_t:roosevelt |
title1_t:roosevelt)~0.01)~2) ()
</str>
Okay, now we go back to just that first (ordinarily tokenized) field,
but add a second field that uses KeywordTokenizerFactory. We don't
necessarily expect that second field ever to match a multi-word query,
but we also don't expect fewer than 25 hits: the 25 hits from the first
field in the qf should still be there, right? But they aren't. What
happened, and why?
q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t%20isbn_t&mm=100%&debugQuery=true&pf=
<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>
<str name="parsedquery">+((DisjunctionMaxQuery((isbn_t:churchill |
title1_t:churchil)~0.01) DisjunctionMaxQuery((isbn_t::)~0.01)
DisjunctionMaxQuery((isbn_t:roosevelt | title1_t:roosevelt)~0.01))~3)
()</str>
<str name="parsedquery_toString">+(((isbn_t:churchill |
title1_t:churchil)~0.01 (isbn_t::)~0.01 (isbn_t:roosevelt |
title1_t:roosevelt)~0.01)~3) ()</str>
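For what it's worth, here is a minimal Python sketch of what the two parsedquery traces seem to show (`~2` with only title1_t, `~3` once isbn_t is added). It assumes a simplified model, not Solr's actual code: the query is split on whitespace, each chunk is run through every qf field's analyzer, a chunk that yields no term in any field contributes no clause, and mm=100% then requires every remaining clause to match. The analyzers `text_analyzer` and `keyword_analyzer` and the sample document are hypothetical stand-ins, and stemming (churchill to churchil) is left out.

```python
import re

# Hypothetical stand-in for an ordinarily tokenized field (e.g. title1_t):
# split on non-alphanumerics and lowercase. Stemming is omitted, and
# punctuation-only input such as ":" yields no terms at all.
def text_analyzer(chunk):
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", chunk)]

# Hypothetical stand-in for a KeywordTokenizerFactory field (e.g. isbn_t):
# the whole chunk becomes a single token, punctuation included.
def keyword_analyzer(chunk):
    return [chunk] if chunk else []

def build_clauses(q, qf):
    """One clause per whitespace-separated chunk; each clause is the list of
    (field, term) disjuncts across all qf fields. A chunk that produces no
    term in any field contributes no clause (simplified model)."""
    clauses = []
    for chunk in q.split():
        disjuncts = [(field, term)
                     for field, analyze in qf.items()
                     for term in analyze(chunk)]
        if disjuncts:
            clauses.append(disjuncts)
    return clauses

def matches(doc, clauses, mm_percent):
    """A doc matches if at least mm_percent of the clauses (rounded down)
    have a disjunct whose term is among the doc's terms for that field."""
    required = len(clauses) * mm_percent // 100
    hits = sum(any(term in doc.get(field, ()) for field, term in clause)
               for clause in clauses)
    return hits >= required

q = "churchill : roosevelt"
# Hypothetical document: title contains both words, isbn is a plain number.
doc = {"title1_t": ["churchill", "and", "roosevelt"], "isbn_t": ["0123456789"]}

one_field = build_clauses(q, {"title1_t": text_analyzer})
with_keyword = build_clauses(q, {"title1_t": text_analyzer,
                                 "isbn_t": keyword_analyzer})

print(len(one_field), matches(doc, one_field, 100))        # 2 True
print(len(with_keyword), matches(doc, with_keyword, 100))  # 3 False
```

Under this model the keyword field turns ":" into its own clause, `(isbn_t::)`, which can only match a record whose isbn_t is literally ":", so with mm=100% the doc that matched before no longer does.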
On 6/14/2011 5:19 PM, Jonathan Rochkind wrote:
I'm aware that using a field tokenized with KeywordTokenizerFactory in
a dismax 'qf' is often going to result in 0 hits on that field
(whenever a whitespace-containing query is entered). But I do it anyway,
because when a query without whitespace is entered, that field does
hit. And in the cases where it doesn't hit, I figure, okay, well, the
other fields in qf will hit or not, and that's good enough.
And usually that works. But it works _differently_ when my query
contains an ampersand (or any other punctuation), resulting in 0 hits
when it shouldn't, and I can't figure out why.
Basically,
&defType=dismax&mm=100%&q=one : two&qf=text_field
gets hits. The ":" is thrown out by the text_field analyzer, but the mm
still passes somehow, right?
But, in the same index:
&defType=dismax&mm=100%&q=one : two&qf=text_field keyword_tokenized_text_field
gets 0 hits. Maybe the inclusion of keyword_tokenized_text_field in the
qf somehow causes dismax to calculate the mm differently: it decides
there are three tokens in there and they all must match, and the token
":" can never match because it's not in my index (it's stripped out).
But somehow this isn't a problem unless I include a keyword-tokenized
field in the qf?
This is really confusing; if anyone has any idea what I'm talking
about and can shed any light on it, it would be much appreciated.
The conclusion I am reaching is to just NEVER include anything but a
more or less ordinarily tokenized field in a dismax qf. Sadly, that was
useful to me for certain use cases.
Oh, hey, the debugging trace would probably be useful:
<lst name="debug">
<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((isbn_t:churchill | title1_t:churchil)~0.01)
DisjunctionMaxQuery((isbn_t::)~0.01)
DisjunctionMaxQuery((isbn_t:roosevelt | title1_t:roosevelt)~0.01))~3)
DisjunctionMaxQuery((title2_unstem:"churchill roosevelt"~3^240.0 |
text:"churchil roosevelt"~3^10.0 | title2_t:"churchil
roosevelt"~3^50.0 | author_unstem:"churchill roosevelt"~3^400.0 |
title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil
roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 |
author2_unstem:"churchill roosevelt"~3^240.0 |
title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil
roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 |
subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil
roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 |
text_unstem:"churchill roosevelt"~3^80.0)~0.01)
</str>
<str name="parsedquery_toString">
+(((isbn_t:churchill | title1_t:churchil)~0.01 (isbn_t::)~0.01
(isbn_t:roosevelt | title1_t:roosevelt)~0.01)~3)
(title2_unstem:"churchill roosevelt"~3^240.0 | text:"churchil
roosevelt"~3^10.0 | title2_t:"churchil roosevelt"~3^50.0 |
author_unstem:"churchill roosevelt"~3^400.0 |
title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil
roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 |
author2_unstem:"churchill roosevelt"~3^240.0 |
title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil
roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 |
subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil
roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 |
text_unstem:"churchill roosevelt"~3^80.0)~0.01
</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">6.0</double>
<lst name="prepare">
<double name="time">3.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">2.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>