Re: Synonym and Whitespaces and optional TokenizerFactory

Ravi Solr Thu, 18 Aug 2011 13:18:32 -0700

If you have multi-word synonyms, you could set
tokenizerFactory="solr.KeywordTokenizerFactory" in the
SynonymFilterFactory declaration. This assumes that the tokenizer
for that field also keeps each phrase as a single token (achieved by
using solr.KeywordTokenizerFactory instead of the standard
tokenizer); if it does not, the synonym rule may never match at all.
See the configuration below:



     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.TrimFilterFactory" />
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.SynonymFilterFactory"
               tokenizerFactory="solr.KeywordTokenizerFactory"
               synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>


Then you can use synonyms like this (each rule on a single line in synonyms.txt):

Barack Obama,Barak Obama,Barack H. Obama,Barack Hussein Obama, Barak Hussein Obama => Barack Obama
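
To make the caveat above concrete, here is a rough Python simulation of
what that chain does with such a rule. It is only a sketch, not Solr code:
parse_rule and analyze are hypothetical helper names, ignoreCase is
approximated by lowercasing the lookup key, and the stop and
duplicate-removal filters are left out.

```python
# Rough simulation only: this is NOT Solr code, and the helper names are hypothetical.

def parse_rule(line):
    """Parse one 'a, b, c => x' rule; each comma-separated entry stays one token."""
    left, right = line.split("=>")
    target = right.strip()
    # A keyword-style tokenizer keeps every entry whole. ignoreCase="true" is
    # approximated by lowercasing the key; surrounding blanks are trimmed.
    return {entry.strip().lower(): target for entry in left.split(",")}


def analyze(field_value, mapping):
    """Mimic keyword tokenizing + trimming + synonym lookup on a whole field value."""
    token = field_value.strip()  # the entire field value is a single token
    return mapping.get(token.lower(), token)


RULE = ("Barack Obama,Barak Obama,Barack H. Obama,Barack Hussein Obama, "
        "Barak Hussein Obama => Barack Obama")
mapping = parse_rule(RULE)

print(analyze("  Barak Hussein Obama ", mapping))  # whole value equals a rule entry
print(analyze("President Barak Obama", mapping))   # no match: the token is the full string
```

The first call prints "Barack Obama"; the second prints the input unchanged,
because the single token "President Barak Obama" is not an entry in the rule.
So this setup suits fields whose whole value is the phrase (names, tags),
not free text.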

Ravi Kiran Bhaskar
Principal Software Engineer
Washington Post Digital
1150 15th Street NW, Washington, DC 20071


On Thu, Aug 18, 2011 at 3:21 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> How about escaping white\ space?
>
> cheers
>
>> Hmmm, why doesn't the multi word synonym syntax in your
>> synonym.txt handle this case? Or am I missing something
>> totally?
>>
>> Best
>> Erick
>>
>> On Wed, Aug 17, 2011 at 10:02 PM, Will Milspec <will.mils...@gmail.com> wrote:
>> > Hi all,
>> >
>> > This may be obvious. My question pertains to the use of tokenizerFactory
>> > together with SynonymFilterFactory. Which tokenizerFactory does one use
>> > to treat "synonyms with spaces" as one token?
>> >
>> > Example: these two entries are synonyms: "lms", "learning management
>> > system"
>> >
>> > Index-time expansion would expand "lms" to these terms:
>> >           "lms"
>> >           "learning management system"
>> >
>> > i.e. not like this:
>> >           "lms"
>> >           "learning"
>> >           "management"
>> >           "system"
>> >
>> > Excerpt from the wiki article:
>> >
>> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> > <quote>
>> > The optional *tokenizerFactory* parameter names a tokenizer factory class
>> > to analyze synonyms (see
>> > https://issues.apache.org/jira/browse/SOLR-319), which can help with the
>> > synonym+stemming problem described in
>> > http://search-lucene.com/m/hg9ri2mDvGk1 .
>> > </quote>
>> >
>> > thanks,
>> >
>> > will
>
