On Sun, 13 Sep 2009 14:19:35 +0100
Clunk Werclick <mailbacku...@googlemail.com> wrote:

> On Sun, 2009-09-13 at 14:06 +0100, RW wrote:
> > On Sun, 13 Sep 2009 06:56:27 +0100
> > Clunk Werclick <mailbacku...@googlemail.com> wrote:
> > 
> {trimmed down to the relevant point you make}
> > Adding irrelevant text to a spam may make it less likely to
> > be caught, 
> Thank you. So if your bayes 'good' tokens that happen to catch on this
> 'irrelevant' text, the result of having the bayes is near pointless.
> For example, something like this:

In practice I find it doesn't make much difference unless the spammer
makes a significant effort to reduce the number of spammy tokens, both
in the headers and the body. And that commonly leads them into hitting
other rules, and constrains the number of spams that can be sent from
the same IP address. The majority of the spams I get don't have such
text, and most that do still hit BAYES_99. It's evidently not as
powerful a technique as you think.


It's also wrong to assume that when spam hits BAYES_50, BAYES hasn't
done anything useful. This is a fallacy that comes from the arbitrary
assignment of zero to BAYES_50. If you add 2.599 to all the BAYES rules
and then multiply all the rule scores by 0.658 you get an equivalent
scoreset (i.e. one that produces the same classifications) in which
zero is assigned to BAYES_00 instead. We then have:

 BAYES_00  0.00
 BAYES_50  1.71
 BAYES_99  4.01     

In this scoreset BAYES_50 actually looks like a fairly strong result
(which it is).
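
To make the arithmetic concrete, here is a rough Python sketch of the
rescaling. The original BAYES scores and the 5.0 threshold are my
assumptions (the scores are back-calculated from the table above, not
quoted from any particular release), and it assumes every message hits
exactly one BAYES rule. The helper names are made up for illustration.

```python
OFFSET = 2.599  # added to every BAYES_* score (figure from the text above)
SCALE = 0.658   # every rule score is then multiplied by this

# Assumption: original scores, back-calculated from the rescaled table.
ORIGINAL_BAYES = {"BAYES_00": -2.599, "BAYES_50": 0.001, "BAYES_99": 3.5}
THRESHOLD = 5.0  # assumption: a spam threshold of 5.0


def rescale_bayes(score):
    """Shift a BAYES score by OFFSET, then scale it."""
    return (score + OFFSET) * SCALE


def is_spam_original(bayes_rule, other_points):
    """Classify with the original scores; other_points is the sum of
    all non-BAYES rule scores hit by the message."""
    return ORIGINAL_BAYES[bayes_rule] + other_points >= THRESHOLD


def is_spam_rescaled(bayes_rule, other_points):
    """Classify with the rescaled scores. Every message total becomes
    SCALE * (total + OFFSET), so the threshold moves the same way."""
    total = rescale_bayes(ORIGINAL_BAYES[bayes_rule]) + SCALE * other_points
    return total >= (THRESHOLD + OFFSET) * SCALE


if __name__ == "__main__":
    for rule, score in ORIGINAL_BAYES.items():
        print(rule, round(rescale_bayes(score), 2))
    print("threshold", round((THRESHOLD + OFFSET) * SCALE, 2))
```

With these assumed numbers the rescaled threshold works out to
0.658 * 7.599, which is 5.0 to two decimal places, so the cut-off stays
where it was and only the zero point of the BAYES rules has moved.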
 

