@Erick: thank you for clarifying! @Markus: I feel like I'm not (or at least should not be :-)) the first person to run into these challenges.
"You can solve this by adding manual rules to StemmerOverrideFilter, but due to the compound nature of words, you would need to add it for all the mills"

After Googling I found this:
https://stackoverflow.com/questions/22451774/word-does-not-get-analysed-properly-using-stemmeroverridefilterfactory-and-snowb
and added http://snowball.tartarus.org/algorithms/kraaij_pohlmann/diffs.txt as stemdict_nl.txt.

My new fieldType definition is now:

<fieldType name="searchtext_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords_nl.txt"></filter>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords_nl.txt"></filter>
  </analyzer>
</fieldType>

For testing, I trimmed stemdict_nl.txt to just this:

aachen aach
aachener aachener

But on full-import it throws an HTTP 500 error:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilterFactory.inform(StemmerOverrideFilterFactory.java:66)

Is my stemdict_nl.txt format incorrect?

Also, do you have examples of the HyphenationCompoundWordTokenFilter or AccentFoldingFilter? I can't find any.

I use Solr 4.3.1, by the way; not sure if that matters.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
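On the format question, one thing that may be worth checking (an assumption, not something confirmed in this thread): the Solr documentation describes the StemmerOverrideFilterFactory dictionary as tab-separated word/stem pairs, and an ArrayIndexOutOfBoundsException: 1 is what you would expect if each line is split on a tab and no tab is found. The sketch below uses a hypothetical helper, to_tab_separated, to rewrite whitespace-separated pairs with a single tab between word and stem. It assumes every useful line holds exactly two columns and skips anything else.

```python
def to_tab_separated(lines):
    """Rewrite 'word stem' lines (any whitespace between them) as 'word<TAB>stem'.

    Hypothetical helper for illustration; lines that do not contain
    exactly two whitespace-separated columns are skipped.
    """
    converted = []
    for line in lines:
        parts = line.split()  # splits on any run of spaces or tabs
        if len(parts) != 2:
            continue  # blank or malformed line: leave it out
        word, stem = parts
        converted.append(word + "\t" + stem)
    return converted


if __name__ == "__main__":
    # The two test entries from the trimmed stemdict_nl.txt above.
    sample = ["aachen aach", "aachener aachener"]
    for row in to_tab_separated(sample):
        print(row)
```

If the import still fails after the file is converted this way, the separator was probably not the cause.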