SpamTalk <[EMAIL PROTECTED]> writes:

> Somewhere in the not very distant future SA is going to have to:
>
> A) render HTML to text ala LYNX

We already include our own HTML renderer (designed for spam filtering,
not pretty output).

> B) run the rendered text through a grammar check, I assume that there is an
> open source analyzer available.

Not really.

> C) have the GA establish a Bayesian baseline of grammar scores indicative of
> SPAM/HAM.

Spelling might be more feasible, but I wouldn't hold my breath.

> By tracking the overall grammar score in addition to the actual content SA
> should be able to recognize random word strings as indicative of spam and
> apply additional penalty points.

You'd think, but it doesn't really work all that well.  Too many false
positives and spammers could easily shift to non-random hash busters.

Daniel

-- 
Daniel Quinlan                      anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
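
For anyone who wants to experiment with the spelling idea discussed above, here is a rough sketch in Python. It is not SpamAssassin's renderer or any existing SA rule: the names `TextExtractor`, `render_text`, and `unknown_word_ratio` are made up for illustration, and the vocabulary is assumed to be whatever word list you supply. It strips HTML down to visible text and reports the fraction of words missing from that list.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def render_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def unknown_word_ratio(text, vocabulary):
    """Fraction of alphabetic words in text that are not in vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)
```

A ratio like this would only be one more input to scoring, and the caveats in the reply still apply: legitimate mail full of names, jargon, or non-English text will score high (false positives), and hash busters built from real dictionary words will score low.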