Hello,
I'm trying to index the token "winter tires|1.4" (term plus payload) as an
exact match, but I also want to run it through the Hunspell stemmer and
keep both the original form and the lemma. After analysis I want to end up
with the following tokens:
"winter tires" with payload 1.4
"winter tire" with payload 1.4
I thought of doing it this way:
<analyzer type="index">
  <tokenizer class="solr.PatternTokenizerFactory"
             pattern="([a-zA-Z0-9]+|\s+|\|\d+\.\d+)" group="1"/>
  <filter class="solr.KeywordRepeatFilterFactory" />
  <filter class="solr.HunspellStemFilterFactory" dictionary="dic"
          affix="aff" ignoreCase="true" strictAffixParsing="true" />
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <filter class="solr.ConcatenateGraphFilterFactory" />
  <filter class="solr.DelimitedPayloadTokenFilterFactory"
          encoder="float"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2"
          maxShingleSize="10" outputUnigrams="true"
          outputUnigramsIfNoShingles="true" fillerToken="" />
</analyzer>
However, the tokens that actually get indexed are "winter tires|1.4" and
"winter tire|1.4", with the payload still attached to the term text,
because any filter placed after solr.ConcatenateGraphFilterFactory does not
apply.
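
To make the index-time tokenization concrete, here is a minimal sketch of
what the tokenizer pattern above splits the input into before any filter
runs. It uses Python's re module as a stand-in for the Java regex engine
that Solr uses (an assumption that should hold for this simple pattern), and
the tokenize helper is only illustrative, not part of Solr:

```python
import re

# Same pattern as in the index-time PatternTokenizerFactory (group="1").
PATTERN = re.compile(r"([a-zA-Z0-9]+|\s+|\|\d+\.\d+)")

def tokenize(text):
    """Return every match of capture group 1, in order of appearance."""
    return [m.group(1) for m in PATTERN.finditer(text)]

tokens = tokenize("winter tires|1.4")
print(tokens)  # ['winter', ' ', 'tires', '|1.4']
```

The payload part "|1.4" comes out as a token of its own, so it is only
rejoined with the term once the tokens are concatenated. That is why the
payload filter has to come after the concatenation step in my chain.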
Do you have any idea how I can concatenate the tokens from a stream without
using solr.ConcatenateGraphFilterFactory? Or how I can achieve the above?
Thanks.