Hi there,
I am running a community site in PHP where members can be contacted through a web form. Yesterday someone used that form to send a spam text to about 50 of those members.
Now I am trying to find a way to identify spam text via PHP. This looks like a common task to me, so I hope I don't have to reinvent the wheel. Maybe someone knows a good script to do this?
Basically, it looks like the text has to be checked against certain keywords, and if the number of hits reaches a certain threshold, it is spam.
Does anybody have a good idea on how to start on this?
Thank you for any hint,
I suppose the difficulty here depends on how narrow this 'community' is. If it's small and narrowly focussed, you could probably get away with using PHP's string searching or regexp features to search for 'bad' words, or search for a few terms that must show up in messages, or both.
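As a rough sketch of the 'bad words' approach: count how many patterns match and flag the message once the count reaches a threshold. The pattern list, the threshold and the function names spam_score() and is_spam() are made up for illustration, so you would have to tune them for your own site.

```php
<?php
// Hypothetical patterns and threshold; adjust both for your own community.
const SPAM_PATTERNS = [
    '/\bviagra\b/i',
    '/\bcasino\b/i',
    '/\bfree money\b/i',
    '/https?:\/\//i',   // links are common in form spam
];
const SPAM_THRESHOLD = 2;

// Count the total number of pattern hits in the text.
function spam_score(string $text): int
{
    $score = 0;
    foreach (SPAM_PATTERNS as $pattern) {
        // preg_match_all() returns the number of matches, or false on error
        $score += (int) preg_match_all($pattern, $text);
    }
    return $score;
}

// Treat the text as spam once the hit count reaches the threshold.
function is_spam(string $text): bool
{
    return spam_score($text) >= SPAM_THRESHOLD;
}
```

You would call is_spam($message) before sending the form mail and hold back anything that scores too high. A plain word list like this is easy to evade with creative spelling, which is why it probably only works for a small, narrowly focussed site.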
For more general use, you might be able to use a filter like SpamAssassin -
http://www.spamassassin.org/index.html
It looks like it has sufficient flexibility & APIs to use it for general text analysis purposes, not just as a mail filter. My employer uses it on their central mail servers, and it does a pretty good job of rating incoming mail for its 'spamesque' qualities.
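One way this might be wired up from PHP, as an untested sketch: pipe the form text through SpamAssassin's spamc client with the -c option, which prints "score/threshold". This assumes spamd is running and spamc is on the PATH; the function names spamc_check() and parse_spamc_check() and the wrapper headers are made up for illustration.

```php
<?php
// Parse the "score/threshold" line printed by "spamc -c", e.g. "7.3/5.0".
function parse_spamc_check(string $output): ?array
{
    if (!preg_match('~^\s*(-?[\d.]+)/(-?[\d.]+)\s*$~', $output, $m)) {
        return null;
    }
    return ['score' => (float) $m[1], 'threshold' => (float) $m[2]];
}

// Hypothetical helper: assumes spamd is running and spamc is installed.
function spamc_check(string $sender, string $body, string $command = 'spamc -c'): ?array
{
    // Strip line breaks so the sender cannot inject extra headers.
    $sender = str_replace(["\r", "\n"], '', $sender);
    // SpamAssassin expects a mail message, so wrap the text in minimal headers.
    $message = "From: $sender\r\nSubject: Contact form message\r\n\r\n$body\r\n";

    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        return null;
    }
    fwrite($pipes[0], $message);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);

    return parse_spamc_check((string) $output);
}
```

A caller could treat the message as spam when the result is not null, the threshold is above zero and the score is at least the threshold. spamc reports 0/0 when it cannot reach spamd, so that case should be handled as "unknown" and not as "clean".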
SpamBayes is another filter -
http://spambayes.sourceforge.net/
SpamAssassin is written in Perl and SpamBayes in Python, so neither would give you a pure-PHP solution, if that is really important to you -
steve
--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg                                         [EMAIL PROTECTED] |
| University of California, Davis                          (530)754-9127 |
| Programming/Database/SysAdmin               http://pgfsun.ucdavis.edu/ |
+---------------- said t e lawrence, picking up his fork ----------------+
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php