Porter Stemmer

Henrik Sarvell Tue, 08 Dec 2009 13:51:28 -0800

I'm looking into implementing a more advanced similarity algorithm
than the one I have now for comparing text. I'm going to use various
wordstat files that categorize words. A lot of the words in these
files have already been stemmed, i.e. reduced to their common "stem",
for instance INFINIT*.
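
As a rough illustration of how stemmed entries such as INFINIT* could be
matched against input words, here is a minimal Python sketch. The
category name, the sample entries and the functions matches() and
categorize() are hypothetical; the real wordstat file layout is not
shown here, so a trailing * is simply assumed to mean "any word
starting with this stem".

```python
def matches(entry: str, word: str) -> bool:
    """Compare one category entry with one word, ignoring case.

    Assumption: an entry ending in '*' is a stem and matches any word
    that starts with it; any other entry must match the word exactly.
    """
    entry = entry.lower()
    word = word.lower()
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def categorize(words, categories):
    """Count, per category, how many words match at least one entry."""
    counts = {name: 0 for name in categories}
    for word in words:
        for name, entries in categories.items():
            if any(matches(entry, word) for entry in entries):
                counts[name] += 1
    return counts


# Hypothetical category data, not taken from any real wordstat file.
SAMPLE_CATEGORIES = {"ABSTRACT": ["INFINIT*", "IDEA"]}
```

With this sample data, categorize(["infinite", "infinity", "dog"],
SAMPLE_CATEGORIES) counts the first two words under ABSTRACT. Entries
without a trailing * only match whole words, which is why the input
text has to be stemmed as well.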


So I need to stem the input texts that are to be evaluated with the
help of the wordstat files and I found a really good resource:
http://tartarus.org/~martin/PorterStemmer/
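
To give a feel for what the algorithm does, here is a minimal Python
sketch of only its first rule set (step 1a, plural removal). It is not
a complete stemmer: the remaining steps, which depend on the
consonant/vowel "measure" of the word, are left out, and the function
name step_1a() is made up for this example.

```python
def step_1a(word: str) -> str:
    """Apply Porter step 1a to a lowercase word.

    The rules are tried in order and only the first matching suffix
    is rewritten:
        SSES -> SS
        IES  -> I
        SS   -> SS
        S    -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]   # drop "es", keep "ss"
    if word.endswith("ies"):
        return word[:-2]   # drop "es", keep "i"
    if word.endswith("ss"):
        return word        # unchanged
    if word.endswith("s"):
        return word[:-1]   # drop the final "s"
    return word
```

For example, step_1a("caresses") gives "caress", step_1a("ponies")
gives "poni" and step_1a("cats") gives "cat".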

If you, for instance, compare
http://tartarus.org/~martin/PorterStemmer/ruby.txt with the Common
Lisp version http://tartarus.org/~martin/PorterStemmer/commonlisp.txt
you can see that a language like Ruby, which takes some of its
influences from Perl, handles the task elegantly in very few lines,
whereas the CL version is much bulkier.

At first I thought it would be a fairly effortless task to code this
up in PIL, but now I'm not so sure. What say you guys? Right now I'm
inclined to simply use the C version.

/Henrik
