On Fri, Jul 30, 2010 at 4:41 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> I'm looking for a list of English words that, when stemmed by Porter stemmer,
> end up in the same stem as some similar, but unrelated words. Below are some
> examples:
>
> # this gets stemmed to "iron", so if you search for "ironic", you'll get
> # "iron" matches
> ironic
>
> # same stem as animal
> anime
> animated
> animation
> animations
>
> I imagine such a list could be added to the example protwords.txt

+1
No reason to make everyone come up with their own list.

Unless a good list already exists out there... we could semi-automate it by
running a large corpus through the stemmer and then, for each stem, listing
the original words. The manual part would be looking at the output to see the
collisions (unless someone has a better idea).

-Yonik
http://www.lucidimagination.com
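
A rough sketch of the semi-automated step described above, in Python: stem every word in a corpus, group the original words by stem, and keep only the stems that more than one distinct word maps to. The names `stem_collisions` and `toy_stem` are made up for this illustration. `toy_stem` is a crude suffix stripper included only so the sketch runs on its own; it is not the Porter algorithm, and a real Porter implementation would need to be passed in instead to get meaningful output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_stem(word: str) -> str:
    """Crude stand-in stemmer (NOT Porter): strips the first matching suffix.

    Hypothetical helper for this sketch only; replace it with a real
    Porter stemmer when running against an actual corpus.
    """
    for suffix in ("ations", "ation", "ated", "al", "ic", "s", "e"):
        # Only strip if at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_collisions(
    words: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each stem to the sorted distinct words that produced it.

    Stems reached by only one distinct word are dropped, so what is left
    is the candidate list a person still has to review by hand.
    """
    groups = defaultdict(set)
    for word in words:
        lowered = word.lower()
        groups[stem(lowered)].add(lowered)
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}


if __name__ == "__main__":
    corpus = ["iron", "ironic", "animal", "anime", "animated",
              "animation", "animations", "search"]
    for s, originals in sorted(stem_collisions(corpus, toy_stem).items()):
        print(s, "<-", ", ".join(originals))
```

The output still needs the manual pass mentioned above: most groups will be legitimate inflections of one word, and only the unrelated ones (such as "ironic" landing on "iron") are candidates for protwords.txt.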