Regarding stemmers, I ditched them altogether a long time ago in favor
of a dictionary of the morphological forms of all known words (for any
given language). A simple lookup of any word form thus returns its full
set of forms, including the correct stem.

Works great. 100% of the time.

Just a tip from me.


On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:

> Andy, I think it is important to know what a stemmer really is.
> 
> It reduces words to their infinitives. Those infinitives are not always the
> real infinitive, but for the system each one is an infinitive,
> since all its derived forms can be reduced to the same form.
> That's a stemmer.
> 
> Accordingly, there can't be a single stemmer for every language, because
> every language has its own rules for how to reduce a word to its
> infinitive.
> 
> If you apply a stemmer for English to a German document, the
> results might be unexpected. However, sometimes it still works well enough.
> 
> Keep in mind that this is an algorithm. It is not important whether the
> created infinitive is the real infinitive. It is only important that most of
> the derived forms can be reduced to the same basic form. Please ask if
> something is not clear.
> 
> KStem:
> The wiki[1] says that KStem is less aggressive than the standard stemmer.
> I guess this means that there are more rules for how to reduce a word
> to its infinitive, and so the results might be better.
> 
> 
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> 
> Kind regards
> - Mitch
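
The rule-based behavior described in the quoted message can be
illustrated with a toy suffix stripper. This is only a sketch: the
SUFFIXES list and the toy_stem function are invented for this example
and are not the rules of KStem, Porter, or any stemmer shipped with
Solr.

```python
# Hypothetical English-style suffix rules, longest first.
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With these rules, "computing", "computed", "computes" and
"computations" all reduce to "comput". That is not a real word, but it
does not have to be, as long as the related forms meet at the same
string. Applied to German, the same rules turn "Haus" into "hau" and
leave "Häuser" untouched, so the singular and plural no longer match.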

