Hi all,
It has been a long time since I last uploaded a stack to RevOnline...
This new one is very specific:
Stemm Lib
Title: Stemmer Library
Category: Utilities
Description:
English, French, Italian, Spanish, German and Portuguese stemmers.
English stemmer originally written by Ken Ray, others by Eric Chatonet.
Porter algorithms are very handy for automatically isolating the stem
of a word (that is, the main part of a word to which affixes are added).
However, they are known not to be 100% reliable. To address this
issue, I adopted the following approach:
Words are first checked against a list of words known to be parsed
incorrectly (that is, incorrectly parsed when applying the algorithm
to a corpus of 20,000 forms). If a match is found with an item from
this list, the stem is defined by simple dictionary lookup. If no
match is found, then the stem is defined using Porter's algorithm.
With this approach, reliability was found to be higher than 99% for
each one of the six languages :-)
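For readers who want to see the shape of this lookup-then-algorithm
approach, here is a minimal sketch in Python. It is not the code from the
stack: the exception entries, the function names and the toy suffix
stripper (which only stands in for a real Porter stemmer) are all
assumptions made for illustration.

```python
# Hypothetical exception list: words a rule-based stemmer gets wrong,
# mapped to their correct stems. These entries are illustrative only.
EXCEPTIONS = {
    "news": "news",
    "sung": "sing",
}


def toy_suffix_stemmer(word):
    """Stand-in for a real Porter stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word, exceptions=EXCEPTIONS, fallback=toy_suffix_stemmer):
    """Return the stem of a word: exception list first, algorithm second."""
    key = word.lower()
    # Step 1: words known to be parsed incorrectly use dictionary lookup.
    if key in exceptions:
        return exceptions[key]
    # Step 2: every other word goes through the rule-based stemmer.
    return fallback(key)
```

With these assumed entries, stem("news") returns "news" from the lookup,
whereas the toy stemmer alone would wrongly reduce it to "new". Keeping
the corrections in a separate table means the algorithm itself never has
to be patched for individual words.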
Explanations on how to use this stack are in the lib itself.
Thanks to Marielle for having edited this description :-)
To find it on RevOnline, look for the username: sosmartsoftware
Best Regards from Paris,
Eric Chatonet
------------------------------------------------------------------------
http://www.sosmartsoftware.com/