It's a great little tool if you're into search engines and that sort of 
thing, since word stemming is quite useful in information retrieval. 
(Actually, I built a search engine at work that used an earlier stemmer I 
wrote with regexes, based on Porter's original article. It could only 
handle English, so after looking at Snowball, I decided to expand our 
search engine's capabilities a bit, hence the new extension.)

The number of languages that can be supported is pretty much wide open. All 
someone would need to do is write a stemmer in Snowball, add the ANSI C that 
the Snowball compiler outputs to the extension, and add a few constants and 
a line or two of code to the extension itself. I haven't looked around for 
any other languages besides those found on the Snowball site, which are all 
under BSD-like licenses, although I'm sure there are more out there. Anybody 
who finds a new one should let me know.
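
To give a rough idea of what "a few constants and a line or two of code" 
could look like, here is a minimal sketch in C of a per-language dispatch 
table. It is not the extension's actual source: the names (stem_lang, 
stem_fn, stem_word) are made up for illustration, and the two toy stemmers 
only strip a suffix. They stand in for the C that Snowball would generate 
and do not implement Porter's algorithm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical language constants. Supporting a new language would mean
   adding one entry here, before STEM_LANG_COUNT. */
enum stem_lang { STEM_LANG_ENGLISH, STEM_LANG_FRENCH, STEM_LANG_COUNT };

/* Assumed common signature: stem `word` in place, return the new length. */
typedef size_t (*stem_fn)(char *word, size_t len);

/* Toy stand-in for a generated English stemmer: drops one trailing 's'. */
static size_t toy_english_stem(char *word, size_t len)
{
    if (len > 3 && word[len - 1] == 's') {
        len -= 1;
    }
    word[len] = '\0';
    return len;
}

/* Toy stand-in for a generated French stemmer: drops a trailing "es". */
static size_t toy_french_stem(char *word, size_t len)
{
    if (len > 4 && strcmp(word + len - 2, "es") == 0) {
        len -= 2;
    }
    word[len] = '\0';
    return len;
}

/* One table row per language, in the same order as enum stem_lang. */
static const stem_fn stemmers[STEM_LANG_COUNT] = {
    toy_english_stem, /* STEM_LANG_ENGLISH */
    toy_french_stem   /* STEM_LANG_FRENCH */
};

/* Stem `word` in place using the stemmer for `lang`.
   Returns the new length, or (size_t)-1 if the language is unknown. */
size_t stem_word(int lang, char *word)
{
    if (lang < 0 || lang >= STEM_LANG_COUNT || stemmers[lang] == NULL) {
        return (size_t)-1;
    }
    return stemmers[lang](word, strlen(word));
}
```

With a layout like this, the language is just a value passed in at call 
time, and a new stemmer costs one enum entry plus one table row.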

I haven't really looked at PECL, but after reading a bit about it, that's 
probably where this extension should go. If I'm understanding the terms of 
the licenses allowed in PECL, any OSS license will do, so I'm thinking a BSD 
license, just to follow the Snowball license.

I'll clean the extension up a bit and offer it to the library.

J
 


James Cox wrote:

> well, this definitely looks cool, from a language point of view.
> 
> i would go for new_stem or such like, and expect the language to be
> determined as a variable.
> 
> I hope this allows for more work on various language features... perhaps
> you'd want to spend time looking at what else is available.
> 
> one final note, is that you may wish to put this in the PEAR PECL library,
> since it's a: a pretty exclusive extension, and b: that's where it should
> go. :)
> 
> James
> 


-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
