RE: Bayesing Filtering needs to be rewritten

R Lists06 Wed, 11 Oct 2006 12:55:20 -0700

> One of the problems now with bayes is that image spam is causing bayes
> to be useless. We need a new plan to avoid bayes poisoning. Poisoning is
> caused when messages are learned where the text of the message is a
> nonspam type text and the spam is in the image.
> 
> Bayes needs to be smarter about what text it learns and know when to not
> learn the text. We need logic that says the text is a trick, only
> learn the headers.
> 
> In general a lot of text isn't that strong of an indicator of spam or
> nonspam. Things like URLs and email addresses and phone numbers are good
> indicators as well as the HTML tags. And the headers are of course the
> best part. I question if using the whole message is best. I think we
> should parse the message for what I'll call "fingerprint tokens" which
> are tokens that can be used to ID similar messages.
> 
> Thoughts on avoiding bayes poisoning and looking for fingerprint tokens?


The only thought that comes to mind would be code that says: IF the email has
an attachment of such-and-such a type, THEN do not autolearn it, and/or pass
it on to other conditionals.

???
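
To make the idea concrete, here is a rough sketch in Python using only the
standard library email package. It is not SpamAssassin code and does not
reflect how its autolearn works internally; the names should_autolearn,
build_example and SKIP_AUTOLEARN_MAINTYPES are made up for illustration, and
"any image part blocks autolearning" is just an assumed policy.

```python
from email.message import EmailMessage

# Assumed policy (hypothetical): any MIME part with one of these
# main types means the text should not be fed to the Bayes learner.
SKIP_AUTOLEARN_MAINTYPES = {"image"}


def should_autolearn(msg):
    """Return False if any MIME part of msg has a blocked main type."""
    for part in msg.walk():
        if part.get_content_maintype() in SKIP_AUTOLEARN_MAINTYPES:
            return False
    return True


def build_example(with_image):
    """Build a small test message, optionally with a GIF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "hello"
    msg.set_content("Perfectly innocent looking text.")
    if with_image:
        # The bytes are a placeholder, not a real image.
        msg.add_attachment(b"GIF89a", maintype="image",
                           subtype="gif", filename="offer.gif")
    return msg


if __name__ == "__main__":
    print(should_autolearn(build_example(with_image=False)))
    print(should_autolearn(build_example(with_image=True)))
```

A message that fails the check could still have its headers learned, or be
handed to other conditionals, instead of being skipped entirely.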

 - rh

--
Robert - Abba Communications
   Computer & Internet Services
 (509) 624-7159 - www.abbacomm.net
