Re[2]: SA Problem: spam with random words to defeat Bayesian filtering ...

Robert Menschel 13 Feb 2004 05:37:24 -0000

Hello Mark,

Thursday, February 12, 2004, 8:37:12 AM, you wrote:


MAD> If spammers start putting a bunch of "good" words at the end of the
MAD> spam, which some of them seem to be doing, then when you "learn"
MAD> them, won't that screw things up a bit and defeat the whole process?

That certainly seems to be what the spammers are hoping for.

MAD> In this case the rules-based checks would still work, but the Bayes
MAD> checks may offset them.

MAD> Please tell me if I'm misunderstanding this.

1) As already pointed out, Bayes collects information from the headers
and the message body of the spam, as well as the random words. Those are
important fodder for Bayes.

2) The random words always contain plenty of words that do NOT appear in
normal emails. They are therefore not in conflict with ham, and become
good spam sign. As Bayes learns more and more of these truly random
words, they become better and better spam sign.

3) The few random words in this misguided attempt to confuse Bayes that
actually do occur in normal ham end up known to Bayes as occurring in
both ham and spam. Bayes therefore tends to ignore them when determining
that messages with all those other random words and spam tokens are spam.
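
To make points 2 and 3 concrete, here is a minimal sketch of a toy
token-based Bayes classifier. It is NOT SpamAssassin's actual Bayes code;
the function names, the 0.01/0.99 clamping, and the tiny training corpus
are all assumptions made up for illustration. It only shows the general
effect: tokens seen only in spam score high, tokens seen equally in ham
and spam sit at 0.5 and contribute nothing.

```python
from collections import Counter
from math import prod

def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def token_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Rough estimate of P(spam | token); assumes both corpora are non-empty."""
    spam_freq = spam_counts[token] / n_spam
    ham_freq = ham_counts[token] / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen before: no evidence either way
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))  # clamp so no single token is absolute

def spam_score(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token probabilities; a token at 0.5 cancels out."""
    probs = [token_probability(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in set(tokens)]
    spam_like = prod(probs)
    ham_like = prod(1 - p for p in probs)
    return spam_like / (spam_like + ham_like)

# Hypothetical training data: spam padded with random words, some of which
# ("meeting", "report") also show up in ordinary ham.
spam_msgs = [
    ["viagra", "cheap", "xylograph", "meeting", "quokka"],
    ["viagra", "pills", "zeugma", "meeting", "report"],
]
ham_msgs = [
    ["meeting", "report", "lunch"],
    ["meeting", "schedule", "report"],
]
spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)
n_spam, n_ham = len(spam_msgs), len(ham_msgs)

def prob(token):
    return token_probability(token, spam_counts, ham_counts, n_spam, n_ham)

new_spam = ["viagra", "quokka", "meeting", "lunch"]
score = spam_score(new_spam, spam_counts, ham_counts, n_spam, n_ham)
print(prob("xylograph"), prob("meeting"), score)
```

In this toy corpus the truly random word "xylograph" ends up as strong
spam sign, while "meeting", which appears equally in ham and spam, lands
at 0.5 and has no influence on the combined score.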

I've been feeding ALL such emails to Bayes for three or four months now,
and my experience is that Bayes is working beautifully.

Bob Menschel
