This is tricky. You could try doing something with the ShingleFilter
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory)
at _query time_ to turn the user's query:
"i have a swollen foot" into:
"i", "i have", "i have a", "i have a swollen", .... "have", "have a",
"have a swollen"... etc.
I _think_ you can get the ShingleFilterFactory to do that.
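
To make the expansion concrete, here is a rough Python sketch of the same shingling done by hand. The function name and the max_size parameter are made up for illustration; this is not the ShingleFilter itself, just the output I would hope to get from it.

```python
def shingles(text, max_size=None):
    """Return every contiguous run of words in text, joined by single spaces.

    max_size caps the number of words per shingle; None means no cap.
    (Hypothetical helper, not part of Solr.)
    """
    words = text.split()
    if max_size is None:
        max_size = len(words)
    result = []
    for start in range(len(words)):
        # Longest shingle starting here is limited by max_size and by the end of the text.
        last = min(start + max_size, len(words))
        for end in range(start + 1, last + 1):
            result.append(" ".join(words[start:end]))
    return result


if __name__ == "__main__":
    for s in shingles("i have a swollen foot"):
        print(s)
```

A query of n words produces n*(n+1)/2 shingles when there is no cap, so you would probably want to limit the shingle size to the longest anti-word you expect.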
But now you only want to exclude if one of those shingles matches the
ENTIRE "anti-word". So maybe index as non-tokenized, so each of those
shingles will somehow only match on the complete thing. You'd want to
normalize spacing and punctuation.
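
A minimal sketch of what I mean by normalizing, assuming you also want to lowercase (that part is my assumption, drop it if case matters to you). The function name is made up. Whatever rules you pick, the same ones have to be applied to the anti-words before indexing and to the query shingles.

```python
import re


def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace.

    (Hypothetical helper; lowercasing is an assumption, not a requirement.)
    """
    text = text.lower()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # split() with no arguments also trims leading and trailing whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize("  Swollen,   foot! "))
```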
But then you need to turn that into a _negated_ element of your query.
Perhaps by using an fq with a NOT/"-" in it? And a query which 'matches'
(causing 'not' behavior) if _any_ of the shingles match.
I have no idea if it's actually possible to put these things together in
that way. A non-tokenized field? Which still has its queries
shingle-ized at query time? And then works as a negated query, matching
for negation if any of the shingles match? Not really sure how to put
that together in your solrconfig.xml and/or application logic if needed.
You could try.
Another option would be doing the query-time 'shingling' in your app,
and then it's a somewhat more normal Solr query. &fq= -"shingle one"
-"shingle two" -"shingle three" etc. Or put em in separate fq's
depending on how you want to use your filter cache. Still searching on a
non-tokenized field, and still normalizing on white-space and
punctuation at both index time and (using same normalization logic but
in your application logic this time) query time. I think that might work.
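
Roughly, the application side could build the filter like this. The field name anti_words and the function name are placeholders I made up, and I haven't run this against a real Solr, so treat it as a sketch of the idea only.

```python
def build_anti_word_filter(shingle_list, field="anti_words"):
    """Build one fq value that excludes docs whose field equals any shingle.

    (Hypothetical helper; 'anti_words' is a placeholder field name.)
    """
    clauses = []
    for s in shingle_list:
        # Escape backslashes and double quotes so the phrase stays quoted.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('-{}:"{}"'.format(field, escaped))
    return " ".join(clauses)


if __name__ == "__main__":
    # Shingles would already be normalized the same way as the indexed values.
    print(build_anti_word_filter(["i", "i have", "swollen foot"]))
```

Remember to URL-encode the result before putting it in the request. If you would rather have one filter cache entry per shingle, send each clause as its own fq instead of joining them.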
So I'm not really sure, but maybe that gives you some ideas.
Jonathan
Satish Kumar wrote:
Hi,
We have a requirement to NOT display search results if the user query contains
terms that are in our anti-words field. For example, if the user query is "I
have swollen foot" and some records in our index have "swollen foot" in the
anti-words field, we don't want to display those records. How do I go about
implementing this?
NOTE 1: the anti-words field can contain multiple values. Each value can be
one or more words (e.g. "swollen foot", "headache", etc.)
NOTE 2: the match must be exact. If the anti-words field contains "swollen foot"
and the user query is "I have swollen foot", the record must be excluded. If the
user query is "My foot is swollen", the record should not be excluded.
Any pointers are greatly appreciated!
Thanks,
Satish