I have documents that are word definitions (basically an online dictionary) that can have alternate titles. For example the document entitled "Read-only memory" might have an alternate title of "ROM". In search results, I want to boost documents with an alternate title that is a case-insensitive "exact match" for the query text -- e.g. "rom" should work as well.

I'm running Solr 3.6 and using the edismax query parser.

I've gone through a few iterations of this. What I have working best so far is a multi-valued text field for the alternate titles with a big boost:

<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="bestMatchTitle" type="lowerCaseSort" indexed="true" stored="false" multiValued="true"/>

This produces great results with single-word searches like the "ROM" example above. It runs into problems with a multi-word alternate title like "Blue Tooth". I have read some of the prior discussions about this, which explain that the query parser splits the query text on whitespace before it reaches the keyword tokenizer for the field type, so the field only ever sees "blue" and "tooth" separately.
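To make the mismatch concrete, here is a minimal Python sketch of my understanding of what happens. It is not Solr code: the function names are made up for illustration, the HTML-strip char filter is left out, and the whitespace split is only an approximation of what the query parser does with unquoted text.

```python
def index_analyze(value):
    """Rough stand-in for KeywordTokenizer + LowerCaseFilter + TrimFilter:
    the whole field value becomes a single lowercased, trimmed token."""
    return [value.lower().strip()]


def query_terms(query_text):
    """Rough stand-in for the query parser: unquoted text is split on
    whitespace first, and each piece is then analyzed on its own."""
    return [index_analyze(piece)[0] for piece in query_text.split()]


# Tokens that would be indexed in bestMatchTitle for two alternate titles.
indexed = set()
for alt_title in ["ROM", "Blue Tooth"]:
    indexed.update(index_analyze(alt_title))


def matches(query_text):
    """Return the query terms that equal an indexed token."""
    return [term for term in query_terms(query_text) if term in indexed]


print(matches("rom"))         # single-word query
print(matches("blue tooth"))  # multi-word query, split into two terms
```

Under these assumptions the single-word query finds the token "rom", while "blue tooth" produces the terms "blue" and "tooth", neither of which equals the single indexed token "blue tooth".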

The question I have is about phrase queries in this case. My request handler has:

<str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>
<str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>

When I run a query for blue tooth without quotes, the parsed query looks like this:

+((DisjunctionMaxQuery((metaDescription:blue^1.5 | summary:blue^3.0 | author:blue^0.5 | body:blue | title:blue^5.0 | bestMatchTitle:blue^20.0)~0.01) DisjunctionMaxQuery((metaDescription:tooth^1.5 | summary:tooth^3.0 | author:tooth^0.5 | body:tooth | title:tooth^5.0 | bestMatchTitle:tooth^20.0)~0.01))~2) DisjunctionMaxQuery((metaDescription:"blue tooth"~100^1.5 | summary:"blue tooth"~100^3.0 | body:"blue tooth"~100 | title:"blue tooth"~100^5.0)~0.01)

It looks like the phrase isn't being matched against my bestMatchTitle field, even though it is listed in pf. It also isn't matched against author, which is of type string. So do phrases only get matched against certain field types?

When I put the quotes in the query text:

/select/?qt=best-match&q="blue+tooth"&debugQuery=on

It builds the query I was hoping to get:

+DisjunctionMaxQuery((metaDescription:"blue tooth"^1.5 | summary:"blue tooth"^3.0 | author:blue tooth^0.5 | body:"blue tooth" | title:"blue tooth"^5.0 | bestMatchTitle:blue tooth^20.0)~0.01)

But I still need the query on the individual tokens as well, otherwise it eliminates results that may be good hits. So far, every way I have tried to combine the two queries either matches a large number of documents that shouldn't really match (e.g. total found goes from 24 to 4800+ documents) or fails to match the one document I want, giving poor results.
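For anyone who wants to reproduce the comparison, here is a small Python sketch that builds the unquoted and quoted requests. The host, port and path in BASE are placeholders (my assumption of a default local install), and build_url is just a helper name for this example; only the qt, q and debugQuery parameters come from the request shown above.

```python
from urllib.parse import urlencode

# Placeholder base URL; adjust host, port and path for your own install.
BASE = "http://localhost:8983/solr/select/"


def build_url(query_text, quoted=False):
    """Build a debug request for the best-match handler.

    With quoted=True the text is wrapped in double quotes so the parser
    treats it as a single phrase instead of splitting it on whitespace.
    """
    q = '"%s"' % query_text if quoted else query_text
    params = [("qt", "best-match"), ("q", q), ("debugQuery", "on")]
    return BASE + "?" + urlencode(params)


print(build_url("blue tooth"))               # individual-token query
print(build_url("blue tooth", quoted=True))  # phrase query
```

Comparing the parsedquery section of the two debug responses shows the difference described above: the first only has per-token clauses for bestMatchTitle, the second only the whole-phrase clause.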

Does anyone have suggestions for how I can convince the phrase query to match against my bestMatchTitle field, or change the query text I'm passing in to combine these two queries and get the boost I want? Or is there another approach altogether that I'm missing?

Thanks for any help with this.
-Cat Bieber
