On Fri, 13 Feb 2004, Justin Mason wrote:

> From: Justin Mason <[EMAIL PROTECTED]>
> Date: Fri, 13 Feb 2004 16:05:02 -0800
> Subject: Re: Some real anti-bayes stuffing
> To: Mark A. DeMichele <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> X-Spam-Status: No, hits=-12.9 required=5.0 tests=AWL,BAYES_00,HABEAS_SWE
>       autolearn=ham version=2.60
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Mark A. DeMichele writes:
> > This is exactly what I was talking about in a previous post.
> >
> > If spammers start doing this and they make sure this bogus section is
> > larger than the actual spam section, the bayes filter will probably mark
> > it as ham.  What's worse is if you then force the bayes filter to learn
> > this as spam, now you just increased the spam score for each of these
> > good words.  If this happens over and over again, I would imagine that
> > the bayes filter would malfunction.  At least that's my opinion, but
> > feel free to disagree.
>
> Good, will do.
>
> The idea of bayes is that you train it on
>
>   1. *YOUR* ham
>   2. *YOUR* spam
>
> Unless spammers figure out what *YOU* call ham, they can add random words,
> bits of Russian literature, snippets of Tom Sawyer until the cows come
> home.
>

        Exactly - this has been my experience as well.  These attempts to
circumvent/poison bayes have been totally ineffective - at least in our
case.

> In the worst case, they'll find one or two strong ham-sign words -- like
> 'Kits', or 'entries' (for my corpus).  Worst case?  I retrain on their
> mail, and those tokens become about even ham and spam counts, 0.5
> probability, and are *ignored* by the Bayes calculation in future.
>

        :)
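
        To make the "about even counts, 0.5 probability, ignored" point
concrete, here is a minimal Python sketch.  It is a simplified
Graham-style per-token estimate, not SpamAssassin's actual code; the
function names, the 0.1 strength cutoff, and the message counts are all
assumptions chosen for illustration.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Rough estimate of P(spam | token) from per-corpus hit rates.

    spam_hits / ham_hits: number of spam / ham messages containing the token.
    n_spam / n_ham: total number of spam / ham messages trained.
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never-seen token: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def is_significant(prob, min_strength=0.1):
    """Only tokens far enough from 0.5 take part in the final score.

    min_strength is a hypothetical cutoff, not a real SpamAssassin setting.
    """
    return abs(prob - 0.5) >= min_strength


# Hypothetical corpus: 1000 ham and 1000 spam already trained.
# A strong ham-sign token shows up in 40 ham messages and in no spam.
before = token_spam_probability(spam_hits=0, ham_hits=40, n_spam=1000, n_ham=1000)

# A spammer stuffs that token into 40 spams, and all 40 get trained as spam.
after = token_spam_probability(spam_hits=40, ham_hits=40, n_spam=1040, n_ham=1000)

print(f"before: p={before:.3f} significant={is_significant(before)}")
print(f"after:  p={after:.3f} significant={is_significant(after)}")
```

        Under these assumed numbers the token goes from a strong ham
indicator to roughly 0.49, which falls inside the ignored band.  The
stuffed word stops helping the spammer, and the rest of the ham
vocabulary is untouched.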

> *PLEASE* read up on how Bayes works.   READ John Graham-Cumming's
> presentation from the last Spam Conf, and NOTICE how it took him thousands
> of iterations of bayes-poisoning, sending a mail each time with a direct
> feedback loop, to get a single spam through.
>

        Well said.  It does help to understand exactly what bayes does.

-- 
Jon Trulson    mailto:[EMAIL PROTECTED]
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include <std/disclaimer.h>
"I am Nomad." -Nomad
