Keep in mind that if you use a field type whose indexed terms can include spaces (eg StrField, or a TextField using KeywordTokenizer), then with the dismax or lucene query parsers the only way to find matches in that field for queries that include spaces is an explicit phrase search with double quotes, since those parsers split the query on whitespace before the field's analyzer sees it.
These fields will, however, work fine with "pf" in dismax/edismax as per
Hoss's example.
But yeah, I do what Hoss recommends: I've got a KeywordTokenizer copy
of my searchable field. I use a pf on that field with a very high boost
to try to boost truly "complete" matches, ones that match the entirety of
the value. It's not exactly 'exact'; I still do some normalization,
including flattening Unicode to ASCII and normalizing runs of one or more
whitespace-or-punctuation characters to exactly one space using a char regex filter.
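Roughly, that normalization behaves like the Python approximation below. It is not the actual Solr analyzer config: NFKD decomposition plus dropping non-ASCII characters stands in for ASCII folding (cruder than Lucene's ASCIIFoldingFilter, since characters with no decomposition, like ß, are simply dropped), a regex stands in for the char regex filter, and the final trim and the helper name `normalize_exact` are assumptions added for illustration.

```python
import re
import unicodedata

# One or more whitespace or punctuation characters in a row.
# \W is "not a word character"; "_" counts as a word character, so add it.
_SEPARATORS = re.compile(r"[\W_]+")


def normalize_exact(value: str) -> str:
    """Hypothetical helper approximating the 'exact match' normalization."""
    # Decompose accented characters ("é" -> "e" + combining accent),
    # then drop everything that has no ASCII representation.
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Collapse each separator run to exactly one space.
    collapsed = _SEPARATORS.sub(" ", folded)
    # Trimming the ends is an assumption (similar in spirit to a trim filter).
    return collapsed.strip()


print(normalize_exact("Frank  Stalone!"))        # Frank Stalone
print(normalize_exact("Café -- Société, Inc."))  # Cafe Societe Inc
```

Both inputs end up as plain space-separated ASCII, so a whole-value field built this way still matches despite small differences in accents or punctuation.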
It seems to pretty much work. This is just one of various relevancy
tweaks I've got going on, to the extent that my relevancy ranking has become
complicated and hard to predict and doesn't always do what I'd
expect or intend; this particular aspect, though, mostly works.
On 7/27/2011 10:55 PM, Chris Hostetter wrote:
: With your solution, RECORD 1 does appear at the top but I think that's just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.
With omitNorms, RECORD1 and RECORD3 have the same score because only
tf() matters, and both docs contain the term "frank" exactly twice.
The reason RECORD1 isn't scoring higher, even though it (as you put it)
"matches 'Fred' exactly", is that from a term perspective RECORD1
doesn't actually match "myname:Fred" exactly: because the field is
multivalued, there are in fact other terms in that field.
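To make the tie concrete, here is a rough Python model of the per-document part of a single-term score under Lucene 3.x's DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(number of terms in the field)). The field lengths are invented, since the actual records aren't shown, and the model ignores that Lucene squeezes norms into a single lossy byte.

```python
import math


def classic_tf(freq: int) -> float:
    # DefaultSimilarity: tf = sqrt(term frequency in the field)
    return math.sqrt(freq)


def length_norm(num_terms: int, omit_norms: bool) -> float:
    # DefaultSimilarity: lengthNorm = 1 / sqrt(total terms in the field,
    # counted across all values of a multivalued field).
    # omitNorms=true stores no norm, which acts like a constant 1.0.
    return 1.0 if omit_norms else 1.0 / math.sqrt(num_terms)


def field_weight(freq: int, num_terms: int, idf: float, omit_norms: bool) -> float:
    # Per-document part of a term query score: tf * idf * norm.
    # queryNorm and the query-side weight are identical for every document,
    # so they can't change the ranking and are left out.
    return classic_tf(freq) * idf * length_norm(num_terms, omit_norms)


# Hypothetical (freq of "frank", total terms in myname) per record.
docs = {"RECORD1": (2, 3), "RECORD3": (2, 5)}
idf = 1.0  # placeholder: same term, so idf is identical for both docs

for omit in (True, False):
    scores = {name: round(field_weight(f, n, idf, omit), 3) for name, (f, n) in docs.items()}
    print(f"omitNorms={omit}: {scores}")
# omitNorms=True: {'RECORD1': 1.414, 'RECORD3': 1.414}
# omitNorms=False: {'RECORD1': 0.816, 'RECORD3': 0.632}
```

With norms on, the shorter field wins, but that is only a length heuristic; it is not the same thing as matching the entire field value.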
One way to indicate that you *only* want documents where entire field
values match your input (ie: RECORD1 but no other records) would be to
use a StrField instead of a TextField, or an analyzer that doesn't split up
tokens (ie: something using KeywordTokenizer). That way a query on
myname:Frank would not match a document where you had indexed the value
"Frank Stalone", but a query for myname:"Frank Stalone" would.
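A toy Python model of that difference, treating each field as the set of terms it indexes. It is not Solr: the TextField analyzer is assumed to just split on whitespace and lowercase, and query-side analysis is done by hand as noted in the comments.

```python
def text_terms(values: list[str]) -> set[str]:
    """Terms an assumed whitespace-splitting, lowercasing TextField indexes."""
    return {tok for v in values for tok in v.lower().split()}


def keyword_terms(values: list[str]) -> set[str]:
    """Terms a StrField (or KeywordTokenizer) indexes: each whole value."""
    return set(values)


doc = ["Frank Stalone"]  # hypothetical myname value

# myname:Frank on a TextField: the query is analyzed to "frank" -> match.
print("frank" in text_terms(doc))             # True
# myname:Frank on a StrField: "Frank" is not a whole value -> no match.
print("Frank" in keyword_terms(doc))          # False
# myname:"Frank Stalone" on a StrField: whole value -> match.
print("Frank Stalone" in keyword_terms(doc))  # True
```

Note that a plain StrField is also case-sensitive, which is one reason to use KeywordTokenizer plus filters instead when you want some normalization.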
In your case, you don't want *only* the exact field value matches, you
want them boosted, so you could do something like copyField "myname" into
"myname_str" and then do...
q=+myname:Frank myname_str:"Frank"^100
...in which case a match on "myname" is required, but a match on
"myname_str" will greatly increase the score.
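A toy Python model of how that query ranks things, with made-up records and flat per-clause scores instead of Lucene's real TF-IDF scoring (the function name `toy_score` is hypothetical). It only shows the shape of the idea: the required clause decides which docs come back, and the boosted optional clause reorders them.

```python
from typing import Optional

RECORDS = {  # hypothetical documents with a multivalued "myname" field
    "RECORD1": ["Frank", "Frank Stalone"],
    "RECORD2": ["Bob Smith"],
    "RECORD3": ["Frank Stalone", "Frank Zappa"],
}


def toy_score(values: list[str], word: str, boost: float = 100.0) -> Optional[float]:
    """Toy model of q=+myname:word myname_str:"word"^boost.

    Returns None when the required clause fails (doc not returned).
    """
    tokens = {t for v in values for t in v.lower().split()}  # "myname" (tokenized)
    exact_values = set(values)                               # "myname_str" (whole values)
    if word.lower() not in tokens:
        return None           # "+myname:..." is required
    score = 1.0               # flat score for the required clause
    if word in exact_values:
        score += boost        # optional clause, heavily boosted
    return score


results = {name: toy_score(vals, "Frank") for name, vals in RECORDS.items()}
ranked = sorted(
    (name for name, s in results.items() if s is not None),
    key=lambda name: results[name],
    reverse=True,
)
print(ranked)  # ['RECORD1', 'RECORD3']
```

RECORD2 never matches the required clause, RECORD3 matches it but has no whole-value "Frank", and RECORD1 gets the big bump from its exact value.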
dismax (and edismax) are really designed for situations like this...
defType=dismax&qf=myname&pf=myname_str^100&q=Frank
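When sending either request over HTTP, the parameter values need URL-encoding; in particular a literal "+" in q has to go out as %2B, or it will be decoded as a space and the clause is no longer required. A Python sketch using the standard library's urllib.parse.urlencode, with the parameter names from the message; you would append the result to your Solr select URL (not shown).

```python
from urllib.parse import urlencode

# dismax: qf searches the tokenized field, pf adds a boost for matches
# on the whole-value copy.
params = {
    "defType": "dismax",
    "qf": "myname",
    "pf": "myname_str^100",
    "q": "Frank",
}
query_string = urlencode(params)
print(query_string)
# defType=dismax&qf=myname&pf=myname_str%5E100&q=Frank

# The standard lucene-parser version: "+" becomes %2B, spaces become "+".
lucene_qs = urlencode({"q": '+myname:Frank myname_str:"Frank"^100'})
print(lucene_qs)
# q=%2Bmyname%3AFrank+myname_str%3A%22Frank%22%5E100
```

The dismax form keeps the user's raw input ("Frank") separate from the field and boost configuration, which is part of why it suits this kind of situation.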
-Hoss