: Some time ago I configured my Solr instance to use the
: DutchStemFilterFactory.
	...
: Words like 'baas', 'paas', 'maan', 'boom' etc. are indexed as 'bas',
: 'pas', 'man' and 'bom'. Those words have a meaning of their own. Am I
: missing something, or should this be considered a bug?

I know nothing about Dutch, but the DutchStemFilterFactory is just a
factory for the DutchStemFilter, which is just a Lucene TokenFilter
around the DutchStemmer, which is a Java impl of this algorithm...

http://snowball.tartarus.org/algorithms/dutch/stemmer.html

...according to that page, step 4 explicitly includes a reduction of
doubled vowels (maan->man is an explicit example), so the code seems to
be working as specified. Whether it's what you *want* is a different
question.

-Hoss
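
To make the doubled-vowel reduction concrete, here is a minimal Java sketch of that one rule only, as I read it from the Snowball page: if a word ends in consonant + doubled a/e/o/u + consonant, one of the two vowels is dropped. This is not the Lucene DutchStemmer code, and the class and method names (UndoubleVowelSketch, undouble) are made up for illustration. It also ignores every other step of the algorithm, so it only shows why the four words in the question collapse the way they do.

```java
// Illustration only: a hypothetical re-implementation of the vowel-undoubling
// rule described on the Snowball Dutch stemmer page. Not the Lucene source.
class UndoubleVowelSketch {

    // Vowels as listed in the Snowball Dutch description: a e i o u y and e-grave.
    private static boolean isVowel(char c) {
        return "aeiouy\u00e8".indexOf(c) >= 0;
    }

    // If the word ends C V V D, where C and D are non-vowels and VV is a
    // doubled a, e, o or u, drop one of the two vowels; otherwise return
    // the word unchanged.
    static String undouble(String word) {
        int n = word.length();
        if (n < 4) {
            return word; // too short to end in C V V D
        }
        char c = word.charAt(n - 4);
        char v1 = word.charAt(n - 3);
        char v2 = word.charAt(n - 2);
        char d = word.charAt(n - 1);

        boolean doubledVowel = v1 == v2 && "aeou".indexOf(v1) >= 0;
        // The full algorithm marks some i's as upper-case 'I' and excludes
        // them here; with plain lower-case input that check never fires.
        if (doubledVowel && !isVowel(c) && !isVowel(d) && d != 'I') {
            return word.substring(0, n - 3) + word.substring(n - 2);
        }
        return word;
    }
}
```

Calling UndoubleVowelSketch.undouble on "baas", "paas", "maan" and "boom" gives "bas", "pas", "man" and "bom", which matches what was reported. A word that is already "man" is left alone, so both forms end up as the same indexed term.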