The Solr wiki says: "A repeated question is 'how can I have the
original term contribute more to the score than the stemmed
version'? In Solr 4.3, the KeywordRepeatFilterFactory has been added
to assist this functionality."

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

(Full section reproduced below.)

In the wiki example below, I can see that both the stemmed and the
original term get indexed, but I don't see how the original term gets
more weight than the stemmed term. Wouldn't this require a filter that
gives terms with the keyword attribute more weight?

What am I missing?

Tom



---------------------------------------------
"A repeated question is "how can I have the original term contribute
more to the score than the stemmed version"? In Solr 4.3, the
KeywordRepeatFilterFactory has been added to assist this
functionality. This filter emits two tokens for each input token, one
of them is marked with the Keyword attribute. Stemmers that respect
keyword attributes will pass through the token so marked without
change. So the effect of this filter would be to index both the
original word and the stemmed version. The 4 stemmers listed above all
respect the keyword attribute.

For terms that are not changed by stemming, this will result in
duplicate, identical tokens in the document. This can be alleviated by
adding the RemoveDuplicatesTokenFilterFactory.

<fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.PorterStemFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldType>"
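
To make the quoted description concrete, here is a rough Python sketch of the token stream as I understand it from the wiki text. It is not Lucene code. The function names are made up for illustration, toy_stem is a crude stand-in for the Porter stemmer, and the ordering (keyword copy first, stemmable copy at the same position) is my assumption about how the repeat filter behaves.

```python
from typing import List, Tuple

# (text, is_keyword, position_increment) -- a simplified stand-in for a Lucene token
Token = Tuple[str, bool, int]


def keyword_repeat(words: List[str]) -> List[Token]:
    """Emit each word twice: a keyword-flagged copy, then an unflagged
    copy stacked at the same position (increment 0). Ordering is assumed."""
    out: List[Token] = []
    for w in words:
        out.append((w, True, 1))
        out.append((w, False, 0))
    return out


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer (NOT Porter): strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_respecting_keywords(tokens: List[Token]) -> List[Token]:
    """Pass keyword-flagged tokens through unchanged; stem the rest."""
    return [(t if kw else toy_stem(t), kw, inc) for t, kw, inc in tokens]


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Drop a token whose text was already seen at the same position."""
    out: List[Token] = []
    seen = set()
    for text, kw, inc in tokens:
        if inc > 0:
            seen = set()  # moved to a new position, start over
        if text in seen:
            continue
        seen.add(text)
        out.append((text, kw, inc))
    return out


def analyze(text: str) -> List[Token]:
    """Whitespace tokenizer -> keyword repeat -> stemmer -> duplicate removal."""
    words = text.split()
    return remove_duplicates(stem_respecting_keywords(keyword_repeat(words)))


if __name__ == "__main__":
    for token in analyze("jumping cats jump"):
        print(token)
```

For "jumping cats jump" this yields the terms jumping, jump, cats, cat, jump: the original and the stem for words that change, and a single token for "jump", which the stemmer leaves alone. Note that no step in this sketch assigns a weight to any token; it only shows which terms end up in the index.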
