Keep in mind that if you use a field type whose indexed terms can include spaces (eg StrField, or an analyzer using KeywordTokenizer), then with the dismax or lucene query parsers, the only way to match that field on queries that include spaces is to do explicit phrase searches with double quotes.
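
To illustrate why the quotes matter, here is a toy model in Python. It is not Solr code: the function names are made up for this sketch, and it assumes the query parser splits unquoted input on whitespace before the field's analysis sees it, while the field itself keeps each value as a single term.

```python
import shlex


def index_terms(value):
    # StrField / KeywordTokenizer-like behaviour: the whole value is one term.
    return {value}


def parse_query(q):
    # Toy stand-in for the query parser: split on whitespace,
    # but keep double-quoted chunks together as one unit.
    return shlex.split(q)


def matches(q, indexed_value):
    # A chunk only matches if it equals an indexed term in full.
    terms = index_terms(indexed_value)
    return any(chunk in terms for chunk in parse_query(q))


if __name__ == "__main__":
    print(matches('Frank Stalone', 'Frank Stalone'))
    print(matches('"Frank Stalone"', 'Frank Stalone'))
```

In this model the unquoted query turns into two separate lookups, "Frank" and "Stalone", and neither equals the single indexed term, so only the quoted form matches.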

These fields will, however, work fine with "pf" in dismax/edismax as per Hoss's example.
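
For what it's worth, a request along the lines of Hoss's example (quoted below) could be put together like this from Python. This is only a sketch: build_params is a made-up helper, the field names myname and myname_str are taken from his message, and the host and handler path are left out because they depend on your setup.

```python
from urllib.parse import urlencode


def build_params(user_query, qf="myname", pf="myname_str^100"):
    # qf: field(s) the individual query terms are matched against.
    # pf: field(s) the whole query is tried against as a phrase, with a boost.
    return urlencode({
        "defType": "dismax",
        "qf": qf,
        "pf": pf,
        "q": user_query,
    })


if __name__ == "__main__":
    print(build_params("Frank"))
```

The user's input goes in unquoted; dismax does the phrase query against the pf field itself, which is why the whitespace problem above doesn't get in the way here.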

But yeah, I do what Hoss recommends -- I've got a KeywordTokenizer copy of my searchable field. I use a pf on that field with a very high boost to try and boost truly "complete" matches, ones that match the entirety of the value. It's not exactly 'exact': I still do some normalization, including flattening unicode to ascii, and normalizing runs of 1 or more space-or-punctuation characters to exactly one space using a char regex filter.
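
Roughly, the normalization amounts to something like the following. This is a Python approximation for illustration only, not my actual Solr analysis chain; the function name is made up, and trimming leading/trailing spaces is an extra assumption of the sketch.

```python
import re
import unicodedata


def normalize(value):
    # Decompose accented characters, then drop whatever is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of whitespace and/or punctuation into one space.
    collapsed = re.sub(r"[\W_]+", " ", ascii_only)
    # Assumption for this sketch: also trim spaces left at either end.
    return collapsed.strip()


if __name__ == "__main__":
    print(normalize("Frank  Stalloné, Jr."))
```

The same normalization has to be applied on both the index side and the query side, otherwise the "complete" match on the keyword copy would never fire.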

It seems to pretty much work -- this is just one of various relevancy tweaks I've got going on, to the extent that my relevancy has become pretty complicated and hard to predict and doesn't always do what I'd expect/intend, but this particular aspect seems to mostly pretty much work.

On 7/27/2011 10:55 PM, Chris Hostetter wrote:
: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the
tf() matters, and both docs contain the term "frank" exactly twice.

the reason RECORD1 isn't scoring higher even though it contains (as you
put it) "matchings 'Fred' exactly" is that from a term perspective, RECORD1
doesn't actually match "myname:Fred" exactly, because there are in fact
other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where entire field
values match your input (ie: RECORD1 but no other records) would be to
use a StrField instead of a TextField, or an analyzer that doesn't split up
tokens (ie: something using KeywordTokenizer).  that way a query on
myname:Frank would not match a document where you had indexed the value
"Frank Stalone" but a query for myname:"Frank Stalone" would.

in your case, you don't want *only* the exact field value matches, but you
want them boosted, so you could do something like copyField "myname" into
"myname_str" and then do...

   q=+myname:Frank myname_str:"Frank"^100

...in which case a match on "myname" is required, but a match on
"myname_str" will greatly increase the score.

dismax (and edismax) are really designed for situations like this...

   defType=dismax&qf=myname&pf=myname_str^100&q=Frank



-Hoss
