I've had a new PHP extension ready for inclusion in PHP for a while now, but
I never got around to offering it up.
Basically, it's a Porter suffix stemmer. We use it at work for a search
engine we're building, and since we've been using PHP so much and
benefiting from its open-source nature, we've decided to try to give
back a little.
There's only one function in the extension, porter(), which takes a
string and returns its stem after stripping off the suffix (or
suffixes). Its prototype is:
string porter(string word)
On success, the function returns the stem of word in uppercase. On error, it
returns the string "-1". Errors arise only when word cannot be stemmed, i.e.
when it contains nonsense characters (any non-alphabetic character, that is,
anything that isn't [a-zA-Z]).
A quick example:
<?php
print porter("assassin") . "\n";
print porter("assassinate") . "\n";
print porter("assassination") . "\n";
print porter("assassinations") . "\n";
print porter("assassinations111") . "\n";
?>
gives:
ASSASSIN
ASSASSIN
ASSASSIN
ASSASSIN
-1
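As an illustration of how the suffix stripping proceeds, here is step 1a of Porter's published algorithm (plural endings) as a standalone C++ sketch. It is not the extension's code, and it assumes lowercase input. Later steps are omitted; they handle endings such as -ation (rewritten to -ate) and -ate (then dropped), which is how "assassination" reaches "assassin".

```cpp
#include <cassert>
#include <string>

// True when s ends with suffix.
static bool ends_with(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed).
// Rules are tried longest first, so only the longest matching suffix applies.
std::string porter_step1a(const std::string &w) {
    if (ends_with(w, "sses")) return w.substr(0, w.size() - 2);  // caresses -> caress
    if (ends_with(w, "ies"))  return w.substr(0, w.size() - 2);  // ponies -> poni
    if (ends_with(w, "ss"))   return w;                          // caress -> caress
    if (ends_with(w, "s"))    return w.substr(0, w.size() - 1);  // cats -> cat
    return w;                                                    // no plural ending
}
```

For example, porter_step1a("assassinations") yields "assassination", which the later steps would reduce further.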
One problem I can see with the extension: it's partially written in C++,
with an interface written in C so that it can talk to PHP. It compiles fine
against the latest 4.1.0 release candidates (both RC1 and RC2) and seems to
compile fine with 4.2.0dev. It does make building with Apache a bit awkward,
though. Apache will spit out errors about the C++ string library if you're
using a C compiler, which you're more or less forced to do with Apache. A
quick remedy is to open $APACHE_HOME/src/Makefile after you get the errors
and change the line that reads "CC=gcc" to "CC=g++" (or whatever your C and
C++ compilers are called).
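The C-over-C++ arrangement described above usually looks something like the following sketch. The function names porter_stem_c and stem_cpp are hypothetical, and stem_cpp is only a placeholder for the real C++ stemming core, which the post doesn't show. The extern "C" entry point takes and returns plain C types, so C glue code can call it without knowing about std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C++ stemming core; the post does not show the
// extension's internals, so this placeholder returns its input unchanged.
static std::string stem_cpp(const std::string &word) {
    return word;
}

// C-callable entry point. extern "C" disables C++ name mangling, so C code
// can declare it as: int porter_stem_c(const char *word, char *out, size_t outlen);
// Returns 0 on success, -1 on null arguments, a too-small buffer, or any C++ exception.
extern "C" int porter_stem_c(const char *word, char *out, std::size_t outlen) {
    if (word == nullptr || out == nullptr) return -1;
    try {
        std::string result = stem_cpp(word);
        if (result.size() + 1 > outlen) return -1;  // no room for result plus '\0'
        std::memcpy(out, result.c_str(), result.size() + 1);
        return 0;
    } catch (...) {
        return -1;  // never let a C++ exception propagate into C callers
    }
}
```

Because an object like this references the C++ standard library (std::string here), the final link has to pull in the C++ runtime. Linking through g++ does that automatically, which is consistent with the CC=g++ workaround for Apache.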
Is there any interest in the extension? If so, it's up for grabs for
inclusion in any future version of PHP. If not, it's still up for grabs for
anyone who would like to use it. Just drop me a line.
J
--
PHP Development Mailing List <http://www.php.net/>