Re: Some real anti-bayes stuffing

Justin Mason 14 Feb 2004 00:05:21 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mark A. DeMichele writes:
> This is exactly what I was talking about in a previous post.
> 
> If spammers start doing this and they make sure this bogus section is
> larger than the actual spam section, the bayes filter will probably mark
> it as ham.  What's worse is if you then force the bayes filter to learn
> this as spam, now you just increased the spam score for each of these
> good words.  If this happens over and over again, I would imagine that
> the bayes filter would malfunction.  At least that's my opinion, but
> feel free to disagree.

Good, will do.

The idea of bayes is that you train it on

  1. *YOUR* ham
  2. *YOUR* spam

Unless spammers figure out what *YOU* call ham, they can add random words,
bits of Russian literature, snippets of Tom Sawyer until the cows come
home.

For spammers to effectively "poison" bayes, they need to figure out what
kind of text *YOU* have trained on.   If I don't receive copies of Tom
Sawyer by email normally, then sure, 19th-century US lit will become a
spam sign.

But I don't care because *I DON'T* receive copies of Tom Sawyer by email,
normally.  (I reserve that honour for snailmail, or occasionally by FTP.)
So it's not going to wind up misclassifying anything as a result.

In the worst case, they'll find one or two strong ham-sign words -- like
'Kits', or 'entries' (for my corpus).  Worst case?  I retrain on their
mail, and those tokens become about even ham and spam counts, 0.5
probability, and are *ignored* by the Bayes calculation in future.
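
To make that retraining arithmetic concrete, here is a rough sketch in
Python.  It is a simplified Graham-style calculation, not the code any
particular filter actually runs; the function names, the corpus sizes, the
token counts, the 0.01/0.99 clamp and the 0.1 "distance from 0.5" cutoff
are all assumptions picked for illustration.

```python
from math import prod


def token_probability(spam_count, ham_count, n_spam, n_ham):
    """Spam probability of one token, from how often it shows up in each corpus."""
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # never-seen token: no evidence either way
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that a single token can never be treated as absolute proof.
    return min(0.99, max(0.01, p))


def informative(probs, min_distance=0.1):
    """Keep only tokens whose probability is far enough from 0.5 to matter."""
    return [p for p in probs if abs(p - 0.5) >= min_distance]


def combine(probs):
    """Naive-Bayes combination over the informative tokens only."""
    kept = informative(probs)
    if not kept:
        return 0.5
    spam = prod(kept)
    ham = prod(1 - p for p in kept)
    return spam / (spam + ham)


# Hypothetical corpus: 1000 ham and 1000 spam messages.
# 'Kits' appears in 40 ham and no spam, so it starts as a strong ham sign.
before = token_probability(spam_count=0, ham_count=40, n_spam=1000, n_ham=1000)

# A spammer stuffs 'Kits' into 40 spams and I retrain on all of them.
after = token_probability(spam_count=40, ham_count=40, n_spam=1000, n_ham=1000)

print(before)              # 0.01 (clamped)
print(after)               # 0.5
print(informative([after]))  # [] -- the token no longer counts
```

With even counts the token lands on 0.5, falls inside the cutoff, and
drops out of combine() entirely, so the stuffing word neither helps nor
hurts the spammer's next mail.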

*PLEASE* read up on how Bayes works.   READ John Graham-Cumming's
presentation from the last Spam Conf, and NOTICE how it took him thousands
of iterations of bayes-poisoning, sending a mail each time with a direct
feedback loop, to get a single spam through.

The sky is NOT falling, guys!

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFALWYuQTcbUG5Y7woRAuzOAKCRRNOx7r2SD/PpyKRAIcthNsC9JgCg7drd
468mo+BQ7BGH/Ix5OfEXg/E=
=w6ao
-----END PGP SIGNATURE-----
