Hi,

we have some problems with the stemming of our OCR texts. We use the following configuration for our full-text OCR field:
<fieldtype name="text_ocr" class="solr.TextField" termPositions="true" termVectors="true" termPayloads="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanStemFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="⚑"
            encoder="org.mdz.search.solrocr.lucene.byteoffset.ByteOffsetEncoder"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            protected="protectedword.txt"
            preserveOriginal="0"
            splitOnNumerics="1"
            splitOnCaseChange="0"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            generateWordParts="1"
            generateNumberParts="1"
            stemEnglishPossessive="1"
            types="wdfftypes.txt"/>
  </analyzer>
</fieldtype>

It now seems that the stem filter and wildcard queries don't work together. When I search for Weltkriegs, I get 6 documents. But when I search for Weltkrie?s, I get only 1 document. The same goes for wel?kriegs: only 1 document.

This happens only with terms that are changed by the stemming filter. Is there a way to fix this?

Thanks a lot,
Doris