Index analyzer concatenate tokens

Florin Babes Fri, 29 Jan 2021 03:27:40 -0800

Hello,
I'm trying to index the token "winter tires|1.4" (the text "winter tires"
with payload 1.4) as an exact match. I also want to apply the Hunspell
stemmer (as a lemmatizer) to it and keep both the original and the lemma.
In the end I want the following tokens:
"winter tires" with payload 1.4
"winter tire" with payload 1.4
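To make the target output concrete, here is a minimal plain-Python sketch of the transformation I'm after. It is not Solr or Lucene code. `LEMMAS` is a hypothetical stand-in for the Hunspell dictionary lookup, and `expand` is a made-up helper name. The sketch splits off the payload first, lemmatizes word by word, then joins the words back into one phrase. It keeps the original and the lemma (dropping the lemma if it is identical), which is roughly what KeywordRepeat plus RemoveDuplicates do, and attaches the same payload to both.

```python
# Illustration only: NOT the Solr/Lucene API.
# LEMMAS is a hypothetical stand-in for a Hunspell dictionary lookup.
LEMMAS = {"tires": "tire"}


def expand(raw: str, delimiter: str = "|") -> list[tuple[str, float]]:
    """Split 'text|payload', lemmatize each word, and return the original
    phrase plus its lemmatized form (deduplicated), each with the payload."""
    text, _, payload = raw.rpartition(delimiter)
    weight = float(payload)
    words = text.split()
    original = " ".join(words)
    lemma = " ".join(LEMMAS.get(w.lower(), w) for w in words)
    # Keep both variants, but only once if the lemma equals the original.
    phrases = [original] if lemma == original else [original, lemma]
    return [(p, weight) for p in phrases]


print(expand("winter tires|1.4"))
```

With the hypothetical dictionary above this prints both "winter tires" and "winter tire" paired with 1.4. The key point is the ordering: the payload is removed before the words are concatenated, not after.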


I thought of doing it this way:
<analyzer type="index">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="([a-zA-Z0-9]+|\s+|\|\d+\.\d+)" group="1"/>
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.HunspellStemFilterFactory" dictionary="dic" affix="aff" ignoreCase="true" strictAffixParsing="true"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <filter class="solr.ConcatenateGraphFilterFactory"/>
  <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="10" outputUnigrams="true" outputUnigramsIfNoShingles="true" fillerToken=""/>
</analyzer>

But what actually gets indexed is "winter tires|1.4" and "winter tire|1.4".
The payload is never split off, because filters placed after
solr.ConcatenateGraphFilterFactory have no effect.

Do you have any idea how I can concatenate the tokens of a stream without
using solr.ConcatenateGraphFilterFactory? Or is there another way to get the
result described above?

Thanks.
