Hello Mark,

Thursday, February 12, 2004, 8:37:12 AM, you wrote:
MAD> If spammers start putting a bunch of "good" words at the end of the
MAD> spam, which some of them seem to be doing, then when you "learn"
MAD> them, won't that screw things up a bit and defeat the whole process?

That certainly seems to be what the spammers are hoping for.

MAD> In this case the rules based checks would still work, but the Bayes
MAD> checks may offset them.

MAD> Please tell me if I'm misunderstanding this.

1) As already pointed out, Bayes collects information from the headers and the message body of the spam, as well as the random words. Those are important fodder for Bayes.

2) The random words always contain plenty of words that do NOT appear in normal emails. They are therefore not in conflict with ham, and become good spam sign. As Bayes learns more and more of these truly random words, they become better and better spam sign.

3) Those few words which are randomly included in this misguided attempt to confuse Bayes, and which actually do occur in normal ham, are then known by Bayes to occur in both ham and spam. The effect is that Bayes will tend to ignore them when determining that messages with all those other random words and spam tokens are spam.

I've been feeding ALL such emails to Bayes for three or four months now, and my experience is that Bayes is working beautifully.

Bob Menschel
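To make points 2) and 3) concrete, here is a minimal sketch of a naive Bayes token scorer. It is a simplified illustration, not SpamAssassin's actual Bayes code: the function names, the example counts, the 0.01/0.99 clamp, and the `min_strength` cutoff are all assumptions chosen for the example. It shows that a word seen about equally often in ham and spam scores near 0.5 and is left out, while a random word seen only in learned spam scores near 1.

```python
from math import prod

def token_spamminess(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | token) from how often the token was learned in each corpus."""
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # never-seen token: no evidence either way
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))  # clamp so one token can never be absolute proof

def message_score(tokens, counts, n_spam, n_ham, min_strength=0.1):
    """Combine token probabilities, skipping tokens too close to neutral (0.5)."""
    probs = []
    for token in set(tokens):
        spam_count, ham_count = counts.get(token, (0, 0))
        p = token_spamminess(spam_count, ham_count, n_spam, n_ham)
        if abs(p - 0.5) >= min_strength:  # words common to ham and spam drop out here
            probs.append(p)
    if not probs:
        return 0.5
    spam_like = prod(probs)
    ham_like = prod(1 - p for p in probs)
    return spam_like / (spam_like + ham_like)

# Hypothetical learned counts: token -> (times seen in spam, times seen in ham),
# out of 1000 learned spam and 1000 learned ham messages.
counts = {
    "viagra": (400, 2),     # ordinary spam token
    "xylophage": (30, 0),   # random word that never shows up in ham
    "meeting": (300, 320),  # "good" word stuffed into spam, also common in ham
    "report": (250, 260),   # same
}

stuffed_spam = ["viagra", "xylophage", "meeting", "report"]
print(message_score(stuffed_spam, counts, 1000, 1000))
```

In this sketch the stuffed "good" words contribute nothing to the score, so the message is still judged on its spam tokens and its truly random words.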
