On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:

> But somehow this feels bad (well, so does sticking word variations in what's
> supposed to be a synonyms file), partly because it means that the person adding
> new synonyms would need to know what they stem to (or always check it against
> Solr before editing the file).

When creating the synonym map from your input file, the factory
currently uses only your Tokenizer to pre-process the synonyms file;
none of the filters in your chain are applied to it.

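One idea would be to run the synonyms file through the analysis chain
up to the SynonymFilter itself (the tokenizer plus any filters before
it). That way, if you put a stemmer before the SynonymFilter, your
synonyms file would be stemmed too, and its entries would match the
stemmed tokens the filter actually sees.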
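To make the idea concrete, here is a toy Python sketch. It is NOT the
Lucene/Solr API: tokenize, toy_stem, analyze, build_synonym_map and
apply_synonyms are all hypothetical helpers, the "stemmer" just strips a
trailing "s", and synonym lines are simplified to comma-separated groups
of single-token terms (no multi-word synonyms, no "=>" mappings). It
contrasts building the map with the tokenizer only versus with the same
filters that sit before the synonym filter.

```python
# Hypothetical toy analysis chain; NOT the Lucene/Solr API.

def tokenize(text):
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def toy_stem(token):
    """Toy stemmer (assumption): strip a single trailing 's'."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token

def analyze(text, filters):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenize(text)
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens

def build_synonym_map(lines, filters):
    """Map each analyzed term to every analyzed term in its group.

    Each line is a comma-separated group of equivalent single-token
    terms (a simplified, hypothetical synonyms-file syntax)."""
    syn_map = {}
    for line in lines:
        group = []
        for term in line.split(","):
            group.extend(analyze(term, filters))
        for term in group:
            syn_map.setdefault(term, set()).update(group)
    return syn_map

def apply_synonyms(tokens, syn_map):
    """Expand each token with its synonyms (sorted for determinism)."""
    return [sorted(syn_map.get(t, {t})) for t in tokens]

synonyms_file = ["cars, automobiles"]
chain = [toy_stem]  # filters that sit before the synonym filter
tokens = analyze("Red cars", chain)  # ['red', 'car']

# Current behaviour: synonyms file pre-processed by the tokenizer only,
# so the stemmed token 'car' never matches the entry 'cars'.
tokenizer_only = build_synonym_map(synonyms_file, filters=[])
print(apply_synonyms(tokens, tokenizer_only))  # [['red'], ['car']]

# Proposed: run the synonyms file through the same pre-synonym chain.
full_chain = build_synonym_map(synonyms_file, filters=chain)
print(apply_synonyms(tokens, full_chain))  # [['red'], ['automobile', 'car']]
```

With the tokenizer-only map, whoever edits the synonyms file would have
to write the stemmed form ("car") by hand; with the full-chain map, the
entry "cars" is stemmed the same way as the input.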

I haven't totally thought the whole thing through to see if there's a
big reason why this wouldn't work (the SynonymFilter is complicated,
sorry). But it does seem like it would produce more consistent
results... and perhaps the inconsistency isn't so obvious because in
the default configuration the SynonymFilter comes directly after the
tokenizer.
