Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

Jonathan Rochkind Tue, 14 Jun 2011 14:27:13 -0700

Okay, let's try the debug trace again without a pf, to make it less confusing.


One field in qf, tokenized as ordinary text, which does get hits:

q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%&debugQuery=true&pf=
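The hand-typed URLs in this thread leave the `%` of `mm=100%` unescaped. In a query string it should be `%25`, because a bare `%` not followed by two hex digits is not a valid percent-escape. A minimal sketch that builds the same parameters with Python's standard `urllib.parse.urlencode` (the values are copied from the request above):

```python
from urllib.parse import urlencode, quote

# Parameters copied from the request above; "pf" is deliberately empty
# so no phrase-boost clause is added to the parsed query.
params = {
    "q": "churchill : roosevelt",
    "qt": "search",
    "qf": "title1_t",
    "mm": "100%",
    "debugQuery": "true",
    "pf": "",
}

# quote_via=quote encodes spaces as %20 (like the hand-written URL)
# rather than "+", and escapes the literal "%" in mm as %25.
query_string = urlencode(params, quote_via=quote)
print(query_string)
# q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%25&debugQuery=true&pf=
```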

<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>
<str name="parsedquery">

+((DisjunctionMaxQuery((title1_t:churchil)~0.01) DisjunctionMaxQuery((title1_t:roosevelt)~0.01))~2) ()

</str>
<str name="parsedquery_toString">
+(((title1_t:churchil)~0.01 (title1_t:roosevelt)~0.01)~2) ()
</str>

And that gets 25 hits. Now we add a second field to the qf; this second field is also ordinarily tokenized. Adding another field to qf should never give us _fewer_ than 25 hits, right? And indeed it still returns exactly 25 hits (no additional hits from the added qf field).


?q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t%20title2_t&mm=100%&debugQuery=true&pf=

<str name="parsedquery">

+((DisjunctionMaxQuery((title2_t:churchil | title1_t:churchil)~0.01) DisjunctionMaxQuery((title2_t:roosevelt | title1_t:roosevelt)~0.01))~2) ()

</str>
<str name="parsedquery_toString">

+(((title2_t:churchil | title1_t:churchil)~0.01 (title2_t:roosevelt | title1_t:roosevelt)~0.01)~2) ()

</str>
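Why adding a second text field can't lose hits: each clause is a DisjunctionMaxQuery, which matches if any of its per-field sub-queries matches, and the number of clauses (and so the `~2`) is unchanged. A toy model with hypothetical documents whose terms are already analyzed; it ignores scoring and the 0.01 tie-breaker, which affect ranking but not matching:

```python
def satisfied_clauses(doc, clauses, qf):
    """Count clauses a document satisfies: a clause (one DisjunctionMaxQuery)
    matches when at least one qf field contains the term."""
    return sum(1 for term in clauses if any(term in doc.get(field, set()) for field in qf))

def count_hits(docs, clauses, qf, required):
    # With mm=100%, `required` equals the number of clauses.
    return sum(1 for doc in docs if satisfied_clauses(doc, clauses, qf) >= required)

# Hypothetical documents: field name -> set of indexed (already analyzed) terms.
docs = [
    {"title1_t": {"churchil", "roosevelt"}},
    {"title1_t": {"churchil"}, "title2_t": {"roosevelt"}},
    {"title2_t": {"churchil", "roosevelt"}},
    {"title1_t": {"roosevelt"}},
]
clauses = ["churchil", "roosevelt"]

one_field = count_hits(docs, clauses, ["title1_t"], required=2)
two_fields = count_hits(docs, clauses, ["title1_t", "title2_t"], required=2)
print(one_field, two_fields)  # 1 3
```

Because the clause list and the required count stay the same, widening qf can only keep or add matches; in the trace above title2_t simply contributed no new ones.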

Okay, now we go back to just that first (ordinarily tokenized) field, but add a second field that uses KeywordTokenizerFactory. We don't necessarily expect this field ever to match a multi-word query, but we don't expect fewer than 25 hits either: the 25 hits from the first field in the qf should still be there, right? But they're not. What happened, and why?


q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t%20isbn_t&mm=100%&debugQuery=true&pf=


<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>

<str name="parsedquery">+((DisjunctionMaxQuery((isbn_t:churchill | title1_t:churchil)~0.01) DisjunctionMaxQuery((isbn_t::)~0.01) DisjunctionMaxQuery((isbn_t:roosevelt | title1_t:roosevelt)~0.01))~3) ()</str>
<str name="parsedquery_toString">+(((isbn_t:churchill | title1_t:churchil)~0.01 (isbn_t::)~0.01 (isbn_t:roosevelt | title1_t:roosevelt)~0.01)~3) ()</str>
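The trace shows the mechanism: the ":" chunk vanishes in title1_t's analysis, but isbn_t's KeywordTokenizer keeps it as the term ":", so the chunk survives as a third clause `(isbn_t::)` and mm=100% becomes `~3`. A toy model of that, assuming (as the parsed queries suggest) that dismax splits q on whitespace, analyzes each chunk with every qf field, and drops a chunk only when no field yields a token. The two analyzers are simplified, hypothetical stand-ins, not the real field types (title1_t also stems, e.g. churchill to churchil):

```python
import re

def text_analyzer(chunk):
    # Hypothetical stand-in for an ordinarily tokenized field: lowercase and keep
    # alphanumeric runs only (no stemming), so ":" yields no tokens at all.
    return re.findall(r"[a-z0-9]+", chunk.lower())

def keyword_analyzer(chunk):
    # Stand-in for KeywordTokenizerFactory: the entire input is a single token.
    return [chunk] if chunk else []

def build_clauses(q, qf):
    """Toy model of dismax clause building: one clause per whitespace-separated
    chunk, holding the tokens each qf field produced for that chunk."""
    clauses = []
    for chunk in q.split():
        per_field = {}
        for field, analyze in qf.items():
            tokens = analyze(chunk)
            if tokens:
                per_field[field] = tokens
        if per_field:  # if no field produced a token, the chunk is dropped
            clauses.append(per_field)
    return clauses

q = "churchill : roosevelt"
text_only = build_clauses(q, {"title1_t": text_analyzer})
with_keyword = build_clauses(q, {"isbn_t": keyword_analyzer, "title1_t": text_analyzer})

print(len(text_only), len(with_keyword))  # 2 3
print(with_keyword[1])  # {'isbn_t': [':']}
```

No document has the term ":" in isbn_t, so the middle clause never matches, and with every clause required no document can match.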




On 6/14/2011 5:19 PM, Jonathan Rochkind wrote:

I'm aware that using a field tokenized with KeywordTokenizerFactory in a dismax 'qf' is often going to result in 0 hits on that field (when a whitespace-containing query is entered). But I do it anyway, for cases where a non-whitespace-containing query is entered; then it hits. And in the cases where it doesn't hit, I figure, okay, well, the other fields in qf will hit or not, and that's good enough.
And usually that works. But it works _differently_ when my query contains an ampersand (or any other punctuation), resulting in 0 hits when it shouldn't, and I can't figure out why.
Basically:

&defType=dismax&mm=100%&q=one : two&qf=text_field
gets hits. The ":" is thrown out by text_field's analysis, but the mm still passes somehow, right?
But, in the same index:
&defType=dismax&mm=100%&q=one : two&qf=text_field keyword_tokenized_text_field
gets 0 hits. Somehow, maybe, the inclusion of the keyword_tokenized_text_field in the qf causes dismax to calculate the mm differently: it decides there are three tokens in there and they all must match, and the token ":" can never match because it's not in my index (it's stripped out)... but somehow this isn't a problem unless I include a keyword-tokenized field in the qf?
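The guess above matches the traces: mm is applied to the number of clauses dismax actually built (`~2` with only title1_t, `~3` once isbn_t keeps the ":"), so 100% means every clause, including one that can never match. A simplified sketch of the integer and percentage rules as the Solr documentation describes them (positive percentages round down; negative values say how many clauses may be missing). The function name is hypothetical, conditional specs such as `2<-25%` are not handled, and this is not Solr's source:

```python
import math

def min_should_match(spec, clause_count):
    """Sketch of Solr's mm rules for a single integer or percentage spec."""
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Percentage of the clause count, truncated toward zero.
        calc = math.trunc(clause_count * percent / 100)
        # A negative percentage is the share of clauses allowed to be missing.
        required = clause_count + calc if percent < 0 else calc
    else:
        n = int(spec)
        # A negative integer is the number of clauses allowed to be missing.
        required = clause_count + n if n < 0 else n
    # Never require more clauses than exist, nor fewer than zero.
    return max(0, min(required, clause_count))

for clauses in (2, 3):
    print(clauses, min_should_match("100%", clauses))
# 2 2
# 3 3
```

For example, `-1` would require 2 of the 3 clauses here, but it would also let an ordinary two-word query match on either word alone.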
This is really confusing; if anyone has any idea what I'm talking about and can shed any light on it, it would be much appreciated.
The conclusion I am reaching is to just NEVER include anything but a more or less ordinarily tokenized field in a dismax qf. Sadly, it was useful for certain use cases for me.
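One possible workaround, untested against this index and offered only as an assumption, is to keep the keyword-tokenized field in qf but drop punctuation-only chunks from q on the client before sending it, so that a chunk like ":" no longer becomes a clause that only isbn_t can produce. A minimal sketch (the function name is hypothetical):

```python
def drop_punctuation_only_chunks(q):
    """Hypothetical client-side pre-processing: keep only whitespace-separated
    chunks that contain at least one letter or digit."""
    return " ".join(chunk for chunk in q.split() if any(ch.isalnum() for ch in chunk))

print(drop_punctuation_only_chunks("churchill : roosevelt"))  # churchill roosevelt
print(drop_punctuation_only_chunks("one & two"))              # one two
print(drop_punctuation_only_chunks("AT&T 978-0-14"))          # AT&T 978-0-14
```

This does not cover other chunks the text analysis discards (for example stopwords, if those fields use a stop filter), and a query made only of punctuation becomes empty, so the caller has to decide what to send in that case.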
Oh, hey, the debugging trace would probably be useful:


<lst name="debug">
<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill : roosevelt</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((isbn_t:churchill | title1_t:churchil)~0.01) DisjunctionMaxQuery((isbn_t::)~0.01) DisjunctionMaxQuery((isbn_t:roosevelt | title1_t:roosevelt)~0.01))~3) DisjunctionMaxQuery((title2_unstem:"churchill roosevelt"~3^240.0 | text:"churchil roosevelt"~3^10.0 | title2_t:"churchil roosevelt"~3^50.0 | author_unstem:"churchill roosevelt"~3^400.0 | title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 | author2_unstem:"churchill roosevelt"~3^240.0 | title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 | subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 | text_unstem:"churchill roosevelt"~3^80.0)~0.01)
</str>
<str name="parsedquery_toString">
+(((isbn_t:churchill | title1_t:churchil)~0.01 (isbn_t::)~0.01 (isbn_t:roosevelt | title1_t:roosevelt)~0.01)~3) (title2_unstem:"churchill roosevelt"~3^240.0 | text:"churchil roosevelt"~3^10.0 | title2_t:"churchil roosevelt"~3^50.0 | author_unstem:"churchill roosevelt"~3^400.0 | title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 | author2_unstem:"churchill roosevelt"~3^240.0 | title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 | subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 | text_unstem:"churchill roosevelt"~3^80.0)~0.01
</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">6.0</double>
<lst name="prepare">
<double name="time">3.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">2.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
