Re: Minimum word length for stemming

Jamie Johnson Thu, 31 Jan 2013 16:34:26 -0800

Thanks for confirming my suspicions. The custom
TokenLengthMarkerFilterFactory sounds like the best approach for doing this.
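
For the archive, here is a rough sketch of the logic such a filter might apply. It is plain Java with no Lucene dependency, so every name in it (TokenLengthMarkerSketch, Token, markShortTokens, toyStem, analyze) is hypothetical and only illustrates the idea. I am assuming a real filter would set Lucene's KeywordAttribute on short tokens, which the stemmers check before touching a term, and that the factory would only read the minimum length from the schema and build that filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of a "token length marker" step.
// Assumption: a real filter would set KeywordAttribute on the token instead.
final class TokenLengthMarkerSketch {

    /** A term plus a keyword flag; the flag stands in for KeywordAttribute. */
    static final class Token {
        final String text;
        final boolean keyword;

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Marks every term shorter than minLength as a keyword (do not stem). */
    static List<Token> markShortTokens(List<String> terms, int minLength) {
        List<Token> marked = new ArrayList<>();
        for (String term : terms) {
            marked.add(new Token(term, term.length() < minLength));
        }
        return marked;
    }

    /** Toy stand-in for a stemmer: strips one trailing "s" unless marked. */
    static String toyStem(Token token) {
        if (token.keyword) {
            return token.text; // protected by the marker, left untouched
        }
        String t = token.text;
        return (t.length() > 1 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    /** Runs the marker step followed by the toy stemmer. */
    static List<String> analyze(List<String> terms, int minLength) {
        List<String> stemmed = new ArrayList<>();
        for (Token token : markShortTokens(terms, minLength)) {
            stemmed.add(toyStem(token));
        }
        return stemmed;
    }
}
```

With a minimum length of 4, "gas" and "bus" would pass through unchanged while "houses" would still be stemmed. The marker step has to sit before the stemmer in the analysis chain, the same way KeywordMarkerFilterFactory does.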



On Thu, Jan 31, 2013 at 5:12 PM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> I believe each stemmer implementation decides that itself. At least
> the NorwegianMinimalStemmer has built-in logic that stems certain
> suffixes only if the token is longer than N chars.
>
> If you want external control, you can look at
> http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming and the
> KeywordMarkerFilterFactory which lets you list a bunch of words you do not
> want the stemmers to touch. I guess you could easily implement your own
> TokenLengthMarkerFilterFactory which keeps words from being stemmed based
> on length.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 31 Jan 2013 at 17:35, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > Is there a capability to provide a minimum word threshold that must be
> > met before a word is analyzed by a stemmer or other language analyzer?
>
>
