Re: DutchStemFilterFactory reducing double vowels bug ?

Chris Hostetter Tue, 21 Jul 2009 16:58:32 -0700

: Some time ago I configured my Solr instance to use the
: DutchStemFilterFactory.
        ...
: Words like 'baas', 'paas', 'maan', 'boom' etc. are indexed as 'bas',
: 'pas', 'man' and 'bom'. Those words have a meaning of their own. Am I
: missing something, or has this to be considered as a bug?


I know nothing about Dutch, but the DutchStemFilterFactory is just a 
factory for the DutchStemFilter, which is just a Lucene TokenFilter 
around the DutchStemmer, which is a Java implementation of this algorithm...

http://snowball.tartarus.org/algorithms/dutch/stemmer.html

...according to that page, Step 4 explicitly includes a 
reduction of doubled vowels (maan -> man is an explicit example)

so the code seems to be working as specified; whether it's what you 
*want* is a different question.
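
To make the rule concrete, here is a minimal Java sketch of just the 
vowel-undoubling step as I read it from the Snowball page: if the word 
ends in non-vowel, doubled a/e/o/u, non-vowel (other than the marker 
'I'), one of the two vowels is dropped. The class and method names are 
made up for illustration; this is not the Lucene DutchStemmer code, and 
it skips the suffix-removal steps that run before this one in the full 
algorithm.

```java
// Hypothetical illustration only; not part of Lucene or Solr.
final class UndoubleVowelSketch {

    // Vowels as listed by the Snowball Dutch stemmer: a e i o u y and e-grave.
    private static final String VOWELS = "aeiouy\u00e8";

    // Only these vowels are undoubled by the rule.
    private static final String DOUBLABLE = "aeou";

    private static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    // Applies only the undoubling rule: a word ending in C V V D, where C is a
    // non-vowel, VV is aa/ee/oo/uu and D is a non-vowel other than 'I',
    // loses one of the two vowels.
    static String undouble(String word) {
        int n = word.length();
        if (n < 4) {
            return word; // too short to end in C V V D
        }
        char c = word.charAt(n - 4);
        char v1 = word.charAt(n - 3);
        char v2 = word.charAt(n - 2);
        char d = word.charAt(n - 1);
        boolean doubled = v1 == v2 && DOUBLABLE.indexOf(v1) >= 0;
        if (!isVowel(c) && doubled && !isVowel(d) && d != 'I') {
            // Drop the first of the two vowels, keep everything else.
            return word.substring(0, n - 3) + word.substring(n - 2);
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"baas", "paas", "maan", "boom"}) {
            System.out.println(w + " -> " + undouble(w));
        }
    }
}
```

With this sketch the four words from the original mail come out as 
'bas', 'pas', 'man' and 'bom', which matches what ends up in the index.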


-Hoss
