It's a great little tool if you're into search engines and that sort of thing, since word stemming has proven quite useful in information retrieval. (In fact, I've built a search engine at work that used an earlier stemmer I wrote, based on Porter's original article and implemented with regexes and the like. It could only handle English, so after looking at Snowball, I decided to expand our search engine's capabilities a bit; hence the new extension.)
The number of languages that can be supported is pretty much wide open. All someone would need to do is write a stemmer in Snowball, add the ANSI C it outputs to the extension, and add a few constants and a line or two of code to the extension itself. I haven't looked for stemmers beyond the ones on the Snowball site, which are all under BSD-like licenses, although I'm sure there are more out there. Anybody who finds a new one should let me know.

I haven't really looked at PECL, but after reading a bit about it, that's probably where this extension should go. If I understand PECL's licensing terms correctly, any OSS license will do, so I'm thinking of the BSD license, to match Snowball's. I'll clean the extension up a bit and offer it to the library.

J

James Cox wrote:
> well, this definitely looks cool, from a language point of view.
>
> i would go for new_stem or such like, and expect the language to be
> determined as a variable.
>
> I hope this allows for more work on various language features... perhaps
> you'd want to spend time looking at what else is available.
>
> one final note, is that you may wish to put this in the PEAR PECL library,
> since it's a: a pretty exclusive extension, and b: that's where it should
> go. :)
>
> James
>

--
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php