phrase query and string/keyword tokenizer

Cat Bieber Thu, 14 Jun 2012 09:43:06 -0700

I have documents that are word definitions (basically an online dictionary) that can have alternate titles. For example, the document entitled "Read-only memory" might have an alternate title of "ROM". In search results, I want to boost documents with an alternate title that is a case-insensitive "exact match" for the query text -- e.g. "rom" should work as well.


I'm running solr 3.6 and using edismax.

I've gone through a few iterations of this. What I have working best so far is a multi-valued text field for the alternate titles with a big boost:

<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="bestMatchTitle" type="lowerCaseSort" indexed="true" stored="false" multiValued="true"/>

This produces great results with single-word searches like the "ROM" example above. It runs into problems with a multi-word alternate title like "Blue Tooth". I have read some of the prior discussions about this, regarding how the query is parsed based on spaces before it gets to the keyword tokenizer for the field type.
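To make the mismatch concrete, here is a rough Python sketch of my understanding. It is not Solr code: `analyze_keyword` and `query_terms` are hypothetical stand-ins for the analyzer chain above (minus the HTML strip step) and for the parser's whitespace splitting, so treat it as an illustration under those assumptions only.

```python
def analyze_keyword(text):
    """Hypothetical stand-in for KeywordTokenizer + LowerCaseFilter + TrimFilter:
    the whole input becomes a single lowercased, trimmed token."""
    return [text.lower().strip()]


def query_terms(query_text):
    """Assumed parser behaviour: unquoted query text is split on whitespace
    first, and each piece is analyzed for the field separately."""
    terms = []
    for chunk in query_text.split():
        terms.extend(analyze_keyword(chunk))
    return terms


# Tokens stored for two alternate titles.
single = set(analyze_keyword("ROM"))          # {"rom"}
indexed = set(analyze_keyword("Blue Tooth"))  # {"blue tooth"}

# One-word query: the single query term equals the indexed token.
print(any(t in single for t in query_terms("rom")))              # True

# Two-word unquoted query: "blue" and "tooth" never equal "blue tooth".
print(any(t in indexed for t in query_terms("blue tooth")))      # False

# Quoted query: the text reaches the analyzer as one unit and matches.
print(any(t in indexed for t in analyze_keyword("blue tooth")))  # True
```

If that model is right, the only way to hit bestMatchTitle with a multi-word title is for the whole query string to reach the field's analyzer in one piece.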

The question I have is about phrase queries in this case. My request handler has:

<str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>
<str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>


When I run a query, I get this:

It looks like the phrase isn't being matched against my bestMatchTitle field. It also isn't matched against author, which is of type string. So do phrases only get matched against certain field types?


When I put the quotes in the query text:

/select/?qt=best-match&q="blue+tooth"&debugQuery=on

It builds the query I was hoping to get:

But I still need the query on the individual tokens, otherwise it eliminates results that may be good hits. So far, any way I have tried to combine the two queries either opens up matching a ton of documents that shouldn't really match (e.g. total found goes from 24 to 4800+ documents) or doesn't match the one I want, giving poor results.

Does anyone have suggestions for how I can convince the phrase query to match against my bestMatchTitle field, or change the query text I'm passing in to combine these two queries and get the boost I want? Or is there another approach altogether that I'm missing?


Thanks for any help with this.
-Cat Bieber
