Re: Exact match not the first result returned

Jonathan Rochkind Thu, 28 Jul 2011 08:07:22 -0700

Keep in mind that if you use a field type that includes spaces (e.g. StrField, or KeywordTokenizer), then if you're using dismax or lucene query parsers, the only way to find matches in this field on queries that include spaces will be to do explicit phrase searches with double quotes.

These fields will, however, work fine with "pf" in dismax/edismax as per Hoss's example.

But yeah, I do what Hoss recommends -- I've got a KeywordTokenizer copy of my searchable field. I use a pf on that field with a very high boost to try and boost truly "complete" matches, that match the entirety of the value. It's not exactly 'exact', I still do some normalization, including flattening unicode to ascii, and normalizing 1 or more string-or-punctuation to exactly one space using a char regex filter.
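
As a rough illustration of that kind of normalization (not the actual Solr analyzer configuration, which isn't shown in this thread), here is a minimal Python sketch. The function name is hypothetical, and trimming leading/trailing spaces is an assumption added for the example:

```python
import re
import unicodedata

def normalize_for_complete_match(value: str) -> str:
    """Hypothetical helper approximating the normalization described above."""
    # Decompose accented characters, then drop anything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of whitespace and/or punctuation into one space.
    collapsed = re.sub(r"[\W_]+", " ", ascii_only)
    # Assumption: trim the ends so stray punctuation doesn't leave padding.
    return collapsed.strip()

print(normalize_for_complete_match("Renée,  Zellweger!"))
```

The same function would have to be applied to both the indexed value and the query for the two to line up, which is what putting the filters in the field type's analyzer chain gives you in Solr.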

It seems to pretty much work -- this is just one of various relevancy tweaks I've got going on, to the extent that my relevancy has become pretty complicated and hard to predict and doesn't always do what I'd expect/intend, but this particular aspect seems to mostly pretty much work.


On 7/27/2011 10:55 PM, Chris Hostetter wrote:

: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the
tf() matters, and both docs contain the term "frank" exactly twice.

the reason RECORD1 isn't scoring higher even though it contains (as you
put it) "matchings 'Fred' exactly" is that from a term perspective, RECORD1
doesn't actually match "myname:Fred" exactly, because there are in fact
other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where entire field
values match your input (ie: RECORD1 but no other records) would be to
use a StrField instead of a TextField or an analyzer that doesn't split up
tokens (ie: something using KeywordTokenizer).  that way a query on
myname:Frank would not match a document where you had indexed the value
"Frank Stalone" but a query for myname:"Frank Stalone" would.
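
A toy Python sketch of that difference, for illustration only. It models the index as a plain list of terms; the function names are hypothetical, and real Solr analysis (lowercasing, stemming, and so on) is deliberately left out:

```python
def tokenized_terms(value):
    # Roughly what a whitespace-splitting TextField would index:
    # one term per word.
    return value.split()

def keyword_terms(value):
    # Roughly what StrField or KeywordTokenizer would index:
    # the entire value as a single term.
    return [value]

value = "Frank Stalone"

# A term query for Frank matches the tokenized field...
print("Frank" in tokenized_terms(value))
# ...but not the untokenized one, where only the whole value is a term.
print("Frank" in keyword_terms(value))
print("Frank Stalone" in keyword_terms(value))
```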

in your case, you don't want *only* the exact field value matches, but you
want them boosted, so you could do something like copyField "myname" into
"myname_str" and then do...

   q=+myname:Frank myname_str:"Frank"^100

...in which case a match on "myname" is required, but a match on
"myname_str" will greatly increase the score.

dismax (and edismax) are really designed for situations like this...

   defType=dismax&  qf=myname&  pf=myname_str^100&  q=Frank
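
For anyone building that request from client code, here is a minimal Python sketch that assembles the same parameters and URL-encodes them. The function name and its defaults are hypothetical; only the parameter names and values come from the example above, and no Solr host or core is assumed:

```python
from urllib.parse import urlencode

def build_dismax_params(user_query, main_field="myname",
                        exact_field="myname_str", boost=100):
    """Hypothetical helper: dismax params with a boosted whole-value field."""
    return {
        "defType": "dismax",
        "qf": main_field,                  # field a match is searched in
        "pf": f"{exact_field}^{boost}",    # phrase boost on the untokenized copy
        "q": user_query,
    }

# urlencode percent-encodes the "^" in the boost as %5E.
query_string = urlencode(build_dismax_params("Frank"))
print(query_string)
```

The resulting string would be appended to whatever select URL your Solr install exposes.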



-Hoss
