Re: Advice on Exact Matching?

Jonathan Rochkind Tue, 04 Jan 2011 13:51:06 -0800

There is a hacky kind of thing that Bill Dueber figured out for using multiple fields and dismax to BOOST "exact" matches, but include all matches in the result set.

You have to duplicate your data in a second non-tokenized field. Then you use dismax pf to super boost matches on the non-tokenized field. Because 'pf' is a phrase search, you don't run into trouble with dismax "pre-tokenization" on white space, even though it's a field that might have internal-token whitespace. (Using a non-tokenized field with dismax qf will basically never match a result with whitespace, unless it's phrase-quoted in the query. But pf works.)

Because it's a non-tokenized field, it only matches (and triggers the dismax pf super boost) if it's an exact match. And it works. You CAN normalize your 'exact match' field in field analysis, removing punctuation or normalizing whitespace or whatever, and that works too, as long as you do it in both index-time and query-time analysis.
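
As a rough sketch of what such a request might look like, here are the dismax parameters assembled in Python. The field names `title` (the tokenized field) and `title_exact` (the non-tokenized copy) and the boost of 100 are made up for illustration; substitute whatever your schema uses.

```python
from urllib.parse import urlencode


def build_exact_boost_params(user_query, text_field="title",
                             exact_field="title_exact", boost=100):
    """Build dismax params that match on text_field and boost exact hits.

    text_field and exact_field are hypothetical names: the tokenized field
    and its non-tokenized duplicate. The boost value is arbitrary.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        # qf: ordinary matching against the tokenized field.
        "qf": text_field,
        # pf: the whole query is tried as a phrase against the
        # non-tokenized field, which only matches on the full value.
        "pf": f"{exact_field}^{boost}",
    }


params = build_exact_boost_params("war and peace")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select handler URL of your own Solr instance. All matches on the tokenized field still come back; documents whose `title_exact` value equals the whole query just score higher.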




On 1/4/2011 4:28 PM, Chris Hostetter wrote:

: I am trying to make sure that when I search for text—regardless of
: what that text is—that I get an exact match.  I'm *still* getting some
: issues, and this last mile is becoming very painful.  The solr field,
: for which I'm setting this up on, is pasted below my explanation.  I
: appreciate any help.

if you are using a TextField with some analysis components, it's
virtually impossible to get "exact" matches -- where my definition of
exact is that the query text is character for character identical to the
entire field value indexed.

is your definition of exact match different?  i assume it must be since you
are using TextField and talk about wanting to deal with whitespace between
words.  so i think you need to explain a little bit better what your
indexed data looks like, and what sample queries you expect to match that
data (and equally important: what queries should *not* match that data,
and what data should *not* match those queries)

: If I want to find *all* Solr documents that match
: "[id]somejunk\hi[/id]" then life is instantly hell.

90% of the time when people have problems with "exact" matches it's
because of QueryParser meta characters -- characters like ":", "[" and
whitespace that the QueryParser uses as instructions.  you can use the
"raw" QParser to have every character treated as a literal....

        defType=raw
        q=[id]somejunk\hi[/id]
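
When sending a query like this over HTTP, the special characters also need URL encoding. A minimal Python sketch, assuming the local-params form of the raw parser (`{!raw f=FIELD}`) and a hypothetical non-tokenized field named `id_exact`:

```python
from urllib.parse import urlencode


def raw_term_query(field, value):
    """Return a raw-parser query string for an untouched term on `field`.

    `field` is whatever non-tokenized field you index; the name used
    below is only an example.
    """
    return "{!raw f=%s}%s" % (field, value)


q = raw_term_query("id_exact", r"[id]somejunk\hi[/id]")
# urlencode percent-escapes brackets, backslash, etc. for the request URL.
encoded = urlencode({"q": q})
print(encoded)
```

The encoding step only protects the value in transit; Solr decodes it before the raw parser sees the literal characters.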

-Hoss
