Problems with StemFilter and Wildcards

Doris Peter Thu, 18 Jul 2019 01:48:21 -0700

Hi, we have some problems with stemming in our OCR texts:

We use the following configuration for our full-text OCR field:


    <fieldtype name="text_ocr" class="solr.TextField" termPositions="true" termVectors="true" termPayloads="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.GermanStemFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="⚑"
                encoder="org.mdz.search.solrocr.lucene.byteoffset.ByteOffsetEncoder"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
                preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
                catenateWords="1" catenateNumbers="1" catenateAll="1"
                generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
                types="wdfftypes.txt"/>
      </analyzer>
    </fieldtype>
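For background, here is a hedged sketch based on general Solr behavior rather than on anything verified against this particular setup. Wildcard terms such as Weltkrie?s are not passed through the full analyzer. Solr applies only the multi-term-aware components of the chain (here, presumably the mapping char filter and the lowercase filter) and skips the GermanStemFilterFactory, the payload filter and the word delimiter filter. Written out as an explicit analyzer of type "multiterm" inside the same fieldtype, the implicit chain for this field would look roughly like this:

      <!-- Sketch (assumption): approximates the chain Solr derives implicitly
           for wildcard/prefix terms on this field. Note: no stemmer here. -->
      <analyzer type="multiterm">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>

If that assumption holds, a pattern like weltkrie?s is compared, unstemmed, against index terms that the GermanStemFilterFactory has already stemmed, which would be consistent with only stemmed terms being affected.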


It seems that the stem filter and wildcard queries don't work together.
When I search for

Weltkriegs

I get 6 documents. But when I search for

Weltkrie?s

I get only 1 document. The same happens for

wel?kriegs

which also returns only 1 document.


This happens only with terms that are changed by the stemming filter. Is there a way to fix this?


Thanks a lot, Doris
