I've had a new PHP extension ready for inclusion in PHP for a while now, but 
I never got around to offering it up.

Basically, it's a Porter suffix stemmer. We use it at work for a search 
engine we're building, and since we've been using PHP so much and 
benefiting from its open-source nature, we've decided to try to give 
back a little.

There's only one function in the extension, porter(), which takes a 
string and returns its stem after stripping off the suffix (or 
suffixes). Its prototype is:

    string porter(string word)

On success, the function returns the word's stem in uppercase. On error, it 
returns the string "-1". Errors arise only when the word cannot be stemmed, 
i.e. when it contains characters the stemmer doesn't handle: any 
non-alphabetic character (anything outside [a-zA-Z]).

A quick example:

<?php
   print porter("assassin") . "\n";
   print porter("assassinate") . "\n";
   print porter("assassination") . "\n";
   print porter("assassinations") . "\n";
   print porter("assassinations111") . "\n";
?>

gives:

   ASSASSIN
   ASSASSIN
   ASSASSIN
   ASSASSIN
   -1

One problem I can see with the extension: it's partly written in C++, 
with an interface written in C so it can talk to PHP. It compiles fine on 
the latest 4.1.0 release candidates (RC1 and RC2) and seems to compile fine 
with 4.2.0dev. It does make building it into Apache a bit awkward, though: 
Apache will report errors about the C++ string library if you're using a C 
compiler (which you're pretty much forced to do with Apache). A quick 
remedy is to open $APACHE_HOME/src/Makefile after you get the errors and 
change the line that reads "CC=gcc" to "CC=g++" (or whatever your C and C++ 
compilers are called).

Any interest in the extension? If so, it's up for grabs for inclusion in 
any future version of PHP. If not, it's still up for grabs for anyone who 
would like to use it. Just drop me a line.

J

-- 
PHP Development Mailing List <http://www.php.net/>
