On Sun, 13 Sep 2009 14:19:35 +0100 Clunk Werclick <mailbacku...@googlemail.com> wrote:
> On Sun, 2009-09-13 at 14:06 +0100, RW wrote:
> > On Sun, 13 Sep 2009 06:56:27 +0100
> > Clunk Werclick <mailbacku...@googlemail.com> wrote:
> > > {trimmed down to the relevant point you make}
> > Adding irrelevant text to a spam may make it less likely to
> > be caught,
> Thank you. So if your bayes 'good' tokens that happen to catch on this
> 'irrelevant' text, the result of having the bayes is near pointless.
> For example, something like this:

In practice I find it doesn't make much difference unless the spammer
makes a significant effort to reduce the number of spammy tokens, both in
the headers and the body. That commonly leads them into hitting other
rules, and it limits the number of spams that can be sent from the same
IP address. The majority of the spams I get don't contain such text, and
most of those that do still hit BAYES_99. It's obviously not as powerful
a technique as you think.

It's also wrong to assume that when spam hits BAYES_50, Bayes hasn't done
anything useful. That fallacy comes from the arbitrary assignment of zero
to BAYES_50. If you add 2.599 to all the BAYES rules and then multiply all
the rule scores by 0.658, you get an equivalent scoreset (i.e. one that
produces the same classifications). With Bayes active, every message hits
exactly one BAYES rule, so the shift raises every total by 2.599 and
effectively moves the default 5.0 threshold to 7.599; scaling by 0.658
brings it back to about 5.0 (7.599 × 0.658 ≈ 5.0). In this scoreset zero
is assigned to BAYES_00 instead, and we then have:

    BAYES_00  0.00
    BAYES_50  1.71
    BAYES_99  4.01

In this scoreset BAYES_50 actually looks like a fairly strong result
(which it is).
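The equivalence can be checked with a short Python sketch. It assumes the default required_score of 5.0 and pre-transform scores of BAYES_00 = -2.599, BAYES_50 = 0.001 and BAYES_99 = 3.5; those values are back-calculated from the figures in this post rather than taken from a particular SpamAssassin release. OTHER_RULE_A and OTHER_RULE_B are hypothetical stand-ins for non-Bayes hits, and the scale factor is computed as 5.0 / 7.599 instead of the rounded 0.658.

```python
from typing import Dict

REQUIRED_SCORE = 5.0  # assumed: SpamAssassin's default required_score

# Assumed pre-transform scores, back-calculated from the figures in the post.
BAYES = {"BAYES_00": -2.599, "BAYES_50": 0.001, "BAYES_99": 3.5}

SHIFT = -BAYES["BAYES_00"]                         # 2.599, makes BAYES_00 zero
SCALE = REQUIRED_SCORE / (REQUIRED_SCORE + SHIFT)  # ~0.658, keeps threshold at 5.0


def transform(rule: str, score: float) -> float:
    """Shift only the BAYES_* rules, then scale every rule."""
    if rule.startswith("BAYES_"):
        score += SHIFT
    return score * SCALE


def is_spam(hits: Dict[str, float], transformed: bool = False) -> bool:
    """Classify a message from the rules it hit (rule name -> score)."""
    if transformed:
        total = sum(transform(rule, score) for rule, score in hits.items())
    else:
        total = sum(hits.values())
    return total >= REQUIRED_SCORE


if __name__ == "__main__":
    # Print the transformed BAYES scores.
    for rule, score in BAYES.items():
        print(f"{rule}  {transform(rule, score):.2f}")

    # Hypothetical messages, each hitting exactly one BAYES_* rule plus
    # made-up non-Bayes rules; both scoresets should agree on every one.
    messages = [
        {"BAYES_50": 0.001, "OTHER_RULE_A": 4.2},   # 4.201 -> ham
        {"BAYES_50": 0.001, "OTHER_RULE_A": 5.3},   # 5.301 -> spam
        {"BAYES_00": -2.599, "OTHER_RULE_B": 6.0},  # 3.401 -> ham
        {"BAYES_99": 3.5, "OTHER_RULE_B": 2.0},     # 5.5 -> spam
    ]
    for hits in messages:
        assert is_spam(hits) == is_spam(hits, transformed=True)
```

Running it prints 0.00, 1.71 and 4.01 for the three rules. Apart from floating-point rounding for a total of exactly 5.0, the agreement holds for any message that hits exactly one BAYES rule, because (total + 2.599) × 5.0/7.599 ≥ 5.0 exactly when total ≥ 5.0.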