On 16 Sep 2004 13:39:30 -0700, "Daniel Quinlan" <[EMAIL PROTECTED]>
said:
> Bart Schaefer <[EMAIL PROTECTED]> writes:
> 
> > Feeding the Bayes rules through the scoring algorithm seems to imply a
> > lack of trust in the accuracy of the classifier.
> 
> Mostly not.  It's needed to map from the 0 to 1.0 "probability" to the
> SpamAssassin threshold-based scoring method.  Even in more pure Bayesian
> systems, users still have to figure out where to put stuff into the spam
> bucket and it's often not at 0.50.  Our technique avoids the problem of
> people having two different calibrations.  Plus, there's the lack of
> trust thing, but that's a lesser factor.
> 
> I think we could use a better way to merge Bayesian results into the
> SpamAssassin score, though.

I thought so too... I added the following to my local.cf, based on the Bayes
scores our incoming spam actually gets. Spammers are really trying hard to
make their spams look hammy, but regular users are (hopefully) not trying to
make their hams look spammy. So I skewed the scores toward the spam side,
since my Bayes engine seems much more likely to give my ham a very low score
than to give my spam a very high score. Spammers can fairly easily get
their Bayes scores down to about 50% probability, but it's much more
difficult to get them down below 40% probability since they would have
to know your particular organization's 'hammy' tokens (which would not
remain hammy for long if you're training regularly).

score BAYES_00 -4.9
score BAYES_01 -2.1
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4
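
To make the effect of the table above concrete, here is a small Python sketch that maps a Bayes probability to the single BAYES_xx rule it would hit and looks up that rule's score from these local.cf lines. The bucket lower bounds are an assumption (roughly the ranges the stock Bayes rules used); check the rule definitions shipped with your SpamAssassin version before relying on them. The helper names (parse_scores, bayes_rule, bayes_score) are hypothetical and are not SpamAssassin APIs.

```python
from bisect import bisect_right

# The score lines exactly as they appear in the local.cf above.
LOCAL_CF = """\
score BAYES_00 -4.9
score BAYES_01 -2.1
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4
"""

# ASSUMPTION: approximate lower bound of each BAYES_xx bucket. The real
# ranges live in the Bayes rule definitions shipped with SpamAssassin.
BUCKETS = [
    (0.00, "BAYES_00"), (0.01, "BAYES_01"), (0.10, "BAYES_10"),
    (0.20, "BAYES_20"), (0.30, "BAYES_30"), (0.40, "BAYES_40"),
    (0.44, "BAYES_44"), (0.49, "BAYES_50"), (0.51, "BAYES_56"),
    (0.56, "BAYES_60"), (0.70, "BAYES_70"), (0.80, "BAYES_80"),
    (0.90, "BAYES_90"), (0.99, "BAYES_99"),
]
LOWER_BOUNDS = [lo for lo, _ in BUCKETS]


def parse_scores(cf_text):
    """Collect {rule: score} from 'score RULE VALUE' lines, skipping others."""
    scores = {}
    for line in cf_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "score":
            scores[parts[1]] = float(parts[2])
    return scores


SCORES = parse_scores(LOCAL_CF)


def bayes_rule(prob):
    """Return the one BAYES_xx rule whose (assumed) bucket contains prob."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("Bayes probability must be in [0, 1]")
    # Index of the largest lower bound that is <= prob.
    return BUCKETS[bisect_right(LOWER_BOUNDS, prob) - 1][1]


def bayes_score(prob):
    """Score the Bayes rule alone adds to the message total."""
    return SCORES[bayes_rule(prob)]


if __name__ == "__main__":
    # Pair each ham-side probability with its mirror image on the spam side.
    for p in (0.005, 0.05, 0.15, 0.25, 0.35):
        ham, spam = bayes_score(p), bayes_score(1 - p)
        print(f"p={p:.3f} {ham:+.1f}   p={1 - p:.3f} {spam:+.1f}   "
              f"net {ham + spam:+.1f}")
```

Running it shows every mirrored pair netting out positive, with the score crossing zero between BAYES_30 and BAYES_40 rather than at 0.50; that is the skew toward the spam side described above.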

-- 
  
  [EMAIL PROTECTED]
