Re: Store matching synonyms only

Jack Krupansky Sat, 23 Jun 2012 14:29:38 -0700

There are a number of ways this can be accomplished, including as a preprocessor or a custom update processor, but you may be able to get by with a tokenized field without term vectors combined with a "keep words" filter and an index-time synonym filter that uses "replace mode".

So, in addition to storing the text in a normal text field, do a copyField to a separate text field which has omitTermFreqAndPositions=true, since this field only needs to indicate the presence of a keyword and not its position or frequency. It would have a custom field type whose index analyzer applies a "keep words" token filter (solr.KeepWordFilterFactory) right after the tokenizer, with a word list file which contains all words used in your synonyms. This eliminates all words that do not match one of your synonym words.

Then add a synonym filter that operates in replace mode (expand=false, which maps every word in an entry to the first word listed) with ignoreCase=true, and with entries such as:


feline,cat,lion,tiger

See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

This would index "The cat sat on the tiger's mat" as simply "feline".
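
As a rough sketch only, the schema.xml pieces described above might look like the following. The field names "text" and "tags", the type name "text_tags", and the file name "keepwords.txt" are made up for illustration, and the choice of tokenizer and lowercase filter is an assumption; adjust them to your own schema. Note that expand="false" is the setting that gives the replace behavior.

```xml
<!-- Hypothetical field type: index-time chain of keep-words, then synonym replace -->
<fieldType name="text_tags" class="solr.TextField">
  <analyzer type="index">
    <!-- Assumed tokenizer; use whatever your normal text field uses -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keepwords.txt (hypothetical name) lists every word used in synonyms.txt,
         one per line: feline, cat, lion, tiger -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    <!-- expand="false" maps each word in an entry to the first word of that entry -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <!-- No keep-words or synonym filtering at query time; queries use the tag itself -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical tag field: only term presence matters, so frequencies and positions are omitted -->
<field name="tags" type="text_tags" indexed="true" stored="false"
       omitTermFreqAndPositions="true"/>

<!-- Copy the normal text field (assumed to be named "text") into the tag field -->
<copyField source="text" dest="tags"/>
```

With something like this in place, a query such as tags:feline should match any document whose text contains one of the words in that synonym entry.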

-- Jack Krupansky

-----Original Message-----
From: ben ausden

Sent: Saturday, June 23, 2012 1:21 PM
To: solr-user@lucene.apache.org
Subject: Store matching synonyms only

Hi,

Is it possible to store only the matching synonyms found in a piece of
text?

A use case might be: automatically "tag" documents at index time based on
synonyms.txt, and then retrieve the stored tags at query time.

For example, given the text field:

 "The cat sat on the mat"

and a synonyms.txt file containing:

feline,cat,lion,tiger

the resulting tag for this document would be "feline". Multiple synonym
matches would result in multiple tags.

Is this possible with Solr by default, or is the classification/tagging
best done outside Solr before I store the document?

Thanks.
