Re: Punctuation marks in documents prevent recognition of synonyms at indexing?

AHMET ARSLAN Sun, 27 Sep 2009 08:43:27 -0700

> Thanks, this helps. 
> But our synonym file has some 16,000 sets of synonyms.


That's a lot. Can you give some examples?


> - the individual synonyms in your synonym file should be in
> a form as if they were sent through the tokenizers which
> come before the SynonymFilterFactory.

Exactly. The order of filters is very important
(and so is the choice of Tokenizer and CharFilter).
For example, if you have a StemFilter before the SynonymFilter, then your syn.txt
should contain stemmed synonyms.

IMO 
Absalom\, Absalom!, William Faulkner    is an ugly entry.
absalom absalom, william faulkner       is a beautiful entry.
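To turn 16,000 "ugly" entries into "beautiful" ones without editing them by hand, a small preprocessing pass over syn.txt may help. Below is a minimal sketch, assuming the analysis chain behaves roughly like "split on anything that is not a letter or digit, then lowercase". That is an approximation, not what any particular Solr tokenizer does, so check the output against your real field type (e.g. with the analysis page). It treats `\,` as an escaped comma, as in the entry above. It only handles comma-separated equivalence lines, not `=>` mappings. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: rewrites a comma-separated synonyms line into the
// lowercased, punctuation-free form an approximate tokenizer would produce.
// Does NOT handle "=>" mapping lines.
class SynonymLineNormalizer {

    // Split on commas that are not escaped with a backslash; "\," keeps a literal comma.
    static List<String> splitUnescaped(String line) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character as-is
            } else if (c == ',') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    // Approximation of "tokenizer + lowercase filter": lowercase, treat every
    // run of non-letter/non-digit characters as a separator, rejoin with spaces.
    static String normalize(String phrase) {
        String[] tokens = phrase.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        return Arrays.stream(tokens)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.joining(" "));
    }

    static String normalizeLine(String line) {
        return splitUnescaped(line).stream()
                .map(SynonymLineNormalizer::normalize)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // Java literal for the line: Absalom\, Absalom!, William Faulkner
        String ugly = "Absalom\\, Absalom!, William Faulkner";
        System.out.println(normalizeLine(ugly)); // absalom absalom, william faulkner
    }
}
```

Running each line of the file through `normalizeLine` and writing the result to a new file keeps the original list as the source of truth. If the real chain also stems, the stemmer would have to be applied here too.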


> Would it be possible for Solr to apply the Tokenizer in use
> while reading the synonym file? Then the user would only
> need the original synonym file, and there could not be a
> conflict.

The purpose of a tokenizer is to break free-form text into words (tokens).

Maybe you can try solr.MappingCharFilterFactory with a mapping.txt containing:

"absalom absalom" => "william faulkner"

But I am not sure it is a good idea.
I am also not sure about its case sensitivity.
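To picture what such a mapping does, here is a simplified sketch of a character-level mapping applied to the raw text before tokenization. At each position it replaces the longest key that matches exactly, otherwise it copies one character. It is not Lucene's MappingCharFilter, and whether the real filter matches case-sensitively is exactly the open point above. This sketch is case-sensitive by construction. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a mapping char filter:
// longest exact (case-sensitive) match wins at each position.
class SimpleMappingSketch {

    static String applyMappings(String text, Map<String, String> mappings) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String bestKey = null;
            for (String key : mappings.keySet()) {
                // skip empty keys so the scan always advances
                if (!key.isEmpty() && text.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey != null) {
                out.append(mappings.get(bestKey)); // rewrite the matched span
                i += bestKey.length();
            } else {
                out.append(text.charAt(i));        // no match: copy one char
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("absalom absalom", "william faulkner");
        System.out.println(applyMappings("read absalom absalom today", mappings));
        // Capitalized, punctuated text does not match an exact lowercase key here.
        System.out.println(applyMappings("read Absalom, Absalom! today", mappings));
    }
}
```

One consequence of working at the character level: the matched text is rewritten rather than supplemented, so in this sketch the original words "absalom absalom" no longer reach the tokenizer at all. That is one reason to be cautious about using a char filter for synonyms.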
