SpamTalk <[EMAIL PROTECTED]> writes:

> Somewhere in the not very distant future SA is going to have to:
> 
> A) render HTML to text ala LYNX

We already include our own HTML renderer (designed for spam filtering,
not pretty output).
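For readers unfamiliar with the idea, here is a minimal sketch of "rendering HTML to text for filtering" as opposed to pretty output. It uses Python's standard html.parser and is an illustration only, not SpamAssassin's renderer (which is written in Perl). The tag sets below are assumptions chosen for the example: it drops script/style contents, starts a new line at a few block-level tags, and collapses whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML body, roughly as a filter sees it."""

    # Assumed: tags whose contents are never shown to the reader.
    SKIP = {"script", "style"}
    # Assumed: tags that start a new line of rendered text.
    BLOCK = {"p", "br", "div", "tr", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, html_to_text("<p>Buy <b>now</b>!</p><style>p{color:red}</style><p>Cheap &amp; easy</p>") yields "Buy now!" and "Cheap & easy" on separate lines; the stylesheet text never reaches the tokenizer.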
 
> B) run the rendered text through a grammar check, I assume that there is an
> open source analyzer available.

Not really.
 
> C) have the GA establish a Bayesian baseline of grammar scores indicative of
> SPAM/HAM.

Spelling might be more feasible, but I wouldn't hold my breath.
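A spelling-based signal is much cheaper than grammar checking: count how many tokens are not in a word list. The sketch below is a hypothetical illustration, not an existing SpamAssassin rule. The tiny KNOWN_WORDS set stands in for a real dictionary (an assumption; a practical check would load a full word list), and any threshold on the ratio would have to be tuned.

```python
import re

# Hypothetical stand-in for a real dictionary file.
KNOWN_WORDS = frozenset(
    {"buy", "cheap", "meds", "now", "the", "best", "price", "on", "today"}
)

WORD_RE = re.compile(r"[a-z]+")


def unknown_word_ratio(text: str, dictionary: frozenset = KNOWN_WORDS) -> float:
    """Return the fraction of alphabetic tokens not found in the dictionary."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0  # no words, nothing to judge
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)
```

Random junk such as "Buy cheap meds now xkqzv plorbt" scores 2/6, but a hash buster built from real words ("Buy cheap meds now the best price") scores 0.0. That is the evasion described later in this reply.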
 
> By tracking the overall grammar score in addition to the actual content, SA
> should be able to recognize random word strings as indicative of spam and
> apply additional penalty points.

You'd think, but it doesn't really work all that well.  Too many false
positives and spammers could easily shift to non-random hash busters.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
