I have documents that are word definitions (basically an online
dictionary) that can have alternate titles. For example, the document
entitled "Read-only memory" might have an alternate title of "ROM". In
search results, I want to boost documents with an alternate title that
is a case-insensitive "exact match" for the query text -- e.g. "rom"
should work as well.
I'm running Solr 3.6 and using edismax.
I've gone through a few iterations of this. What I have working best so
far is a multi-valued text field for the alternate titles with a big boost:
  <fieldType name="lowerCaseSort" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="bestMatchTitle" type="lowerCaseSort" indexed="true"
         stored="false" multiValued="true"/>
This produces great results with single-word searches like the "ROM"
example above. It runs into problems with a multi-word alternate title
like "Blue Tooth". I have read some of the prior discussions about this:
the query parser splits the query text on whitespace before it reaches
the field's keyword tokenizer, so the field is searched for "blue" and
"tooth" separately rather than for "blue tooth".
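The mismatch described above can be illustrated with a small Python
simulation. This is a simplified sketch, not Solr code: the function
names are hypothetical, the HTML-stripping char filter is left out, and
the whitespace split is only an approximation of what the query parser
does to unquoted text.

```python
def analyze_best_match_title(value):
    """Rough stand-in for KeywordTokenizer + LowerCaseFilter + TrimFilter.

    The whole input becomes a single lowercased, trimmed token.
    """
    return [value.lower().strip()]


def split_query_on_whitespace(query):
    """Rough stand-in for the parser splitting unquoted text into clauses."""
    return query.split()


# Index time: the alternate title is stored as one token.
indexed = analyze_best_match_title("Blue Tooth")

# Query time, unquoted: each whitespace-separated clause is analyzed alone.
clauses = split_query_on_whitespace("blue tooth")
per_clause_terms = [analyze_best_match_title(c)[0] for c in clauses]
matches_unquoted = any(term in indexed for term in per_clause_terms)

# Query time, quoted: the whole phrase reaches the analyzer as one value.
matches_quoted = analyze_best_match_title("blue tooth")[0] in indexed

print(indexed, per_clause_terms, matches_unquoted, matches_quoted)
```

In this simplified model neither "blue" nor "tooth" equals the single
indexed token, while the quoted form does, which is consistent with the
single-word "ROM" case working and the two-word case failing.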
The question I have is about phrase queries in this case. My request
handler has:
  <str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5
                 body^1 author^0.5</str>
  <str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5
                 body^1 author^0.5</str>
When I run a query for blue tooth (no quotes) with debugQuery=on, I get
this parsed query:
  +((DisjunctionMaxQuery((metaDescription:blue^1.5 | summary:blue^3.0 |
      author:blue^0.5 | body:blue | title:blue^5.0 |
      bestMatchTitle:blue^20.0)~0.01)
    DisjunctionMaxQuery((metaDescription:tooth^1.5 | summary:tooth^3.0 |
      author:tooth^0.5 | body:tooth | title:tooth^5.0 |
      bestMatchTitle:tooth^20.0)~0.01))~2)
  DisjunctionMaxQuery((metaDescription:"blue tooth"~100^1.5 |
      summary:"blue tooth"~100^3.0 | body:"blue tooth"~100 |
      title:"blue tooth"~100^5.0)~0.01)
It looks like the phrase isn't being matched against my bestMatchTitle
field. It also isn't matched against author, which is type string. So do
phrases only get matched against certain field types?
When I put the quotes in the query text:
  /select/?qt=best-match&q="blue+tooth"&debugQuery=on
It builds the query I was hoping to get:
  +DisjunctionMaxQuery((metaDescription:"blue tooth"^1.5 |
      summary:"blue tooth"^3.0 | author:blue tooth^0.5 |
      body:"blue tooth" | title:"blue tooth"^5.0 |
      bestMatchTitle:blue tooth^20.0)~0.01)
But I still need the query on the individual tokens; otherwise it
eliminates results that may be good hits. So far, any way I have tried
to combine the two queries either opens up matching a ton of documents
that shouldn't really match (e.g. total found goes from 24 to 4800+
documents) or doesn't match the one I want, giving poor results.
Does anyone have suggestions for how I can convince the phrase query to
match against my bestMatchTitle field, or change the query text I'm
passing in to combine these two queries and get the boost I want? Or is
there another approach altogether that I'm missing?
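One option that may be worth testing is to leave q unquoted and send the
quoted phrase separately as a boost query (bq) against bestMatchTitle.
The Python sketch below only builds the request parameters. The helper
name and its defaults are hypothetical, and it rests on the untested
assumption that edismax in Solr 3.6 treats bq as an optional scoring
clause that does not widen the result set.

```python
from urllib.parse import urlencode


def build_best_match_params(user_text, field="bestMatchTitle", boost=20):
    """Hypothetical helper: unquoted q plus a quoted-phrase boost query."""
    # Escape backslashes and double quotes so the text stays inside one phrase.
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "qt": "best-match",                            # handler from the post
        "q": user_text,                                # per-token matching as before
        "bq": '%s:"%s"^%s' % (field, escaped, boost),  # assumed exact-title boost
        "debugQuery": "on",
    }


params = build_best_match_params("blue tooth")
url = "/select/?" + urlencode(params)
print(url)
```

Checking the debugQuery output for this request should show whether the
bq clause actually reaches bestMatchTitle as the single term "blue tooth"
and whether the total found stays at the expected count.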
Thanks for any help with this.
-Cat Bieber