Andrew Schulman wrote:
> I'm running spamc/spamd 3.0.2 in Debian.  I have Bayesian tests turned on,
> and network tests off.

I am running a similar system, but with network tests turned on.  Network
tests such as SURBL[1] are a huge factor in improving spam classification
accuracy for me.

> almost all of the spam is tagged as BAYES_95 or BAYES_99.  My score
> threshold is 5, the BAYES_99 test alone (using its default value) is
> worth 4.07, and a few other tests are usually positive as
> well.  Yet, the total score is around 2.5.

Of course, as you are aware, each rule has four scores:

           The first score is used when both Bayes and network tests
           are disabled (score set 0). The second score is used when
           Bayes is disabled, but network tests are enabled (score set
           1). The third score is used when Bayes is enabled and
           network tests are disabled (score set 2). The fourth score
           is used when Bayes is enabled and network tests are enabled
           (score set 3).
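Put another way, the four columns are picked by two independent on/off switches. As a rough sketch (the helper name `score_set` is hypothetical, not part of SpamAssassin), counting Bayes as 2 and network tests as 1 reproduces the numbering quoted above:

```shell
#!/bin/sh
# Hypothetical helper, not part of SpamAssassin: map the two switches
# described above to a score set number.
#   $1 = 1 if Bayes is enabled, 0 otherwise
#   $2 = 1 if network tests are enabled, 0 otherwise
score_set() {
    # Bayes contributes 2 and network tests contribute 1, giving:
    # 0 = neither, 1 = network only, 2 = Bayes only, 3 = both.
    echo $(( $1 * 2 + $2 ))
}

score_set 1 0   # Bayes on, network tests off
score_set 1 1   # Bayes on, network tests on
```

The two calls print 2 and 3, the sets a Bayes-enabled setup uses with network tests off and on respectively.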

The default for BAYES_99 in SA-3.0.2 is:

  score BAYES_99 0 0 4.070 1.886
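To see which number a given score set picks out of a line like this one, here is a hypothetical shell helper (`score_for_set` is my own name, not a SpamAssassin tool). It assumes the line has the form `score RULE s0 s1 s2 s3`, and also handles the single-value form, where one score applies to every set:

```shell
#!/bin/sh
# Hypothetical helper: print the score that a "score" line assigns for
# a given score set (0-3).
#   $1 = the full line, e.g. "score BAYES_99 0 0 4.070 1.886"
#   $2 = the score set number
score_for_set() {
    # Field 1 is "score" and field 2 is the rule name, so set N is
    # field N+3. A line with only one score (3 fields) uses it for
    # every set.
    echo "$1" | awk -v set="$2" '{ print (NF == 3 ? $3 : $(set + 3)) }'
}

line='score BAYES_99 0 0 4.070 1.886'
score_for_set "$line" 2   # Bayes on, network tests off
score_for_set "$line" 3   # Bayes on, network tests on
```

For BAYES_99 that is 4.070 with network tests off versus 1.886 with them on, so the same hit is worth roughly 2.2 points less once network tests are enabled.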

I got confused by this exact thing while debugging a problem of my own a
while ago.  I thought I was getting scores from one column but was really
getting them from another.

What is the output of this when you feed it one of your messages on
standard input?

  spamassassin -tD 2>&1 | pager

What value does it show for BAYES_99 in the content analysis section?
If it says something other than 4.07, that confirms you are not getting
values from column three (score set 2: Bayes on, network tests off).  It
sounds instead like you are running with network tests enabled, which
selects column four (score set 3), where BAYES_99 is worth only 1.886.
Are network tests enabled in the debugging output?
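If you want to check this mechanically, the sketch below reads the test-mode report on standard input and says which BAYES_99 score it seems to be using. It is only a sketch under assumptions: the function name is hypothetical, and it assumes each content analysis line starts with the points followed by the rule name, with points rounded to one decimal place (so 4.070 would show as 4.1 and 1.886 as 1.9). Adjust the patterns if your report looks different.

```shell
#!/bin/sh
# Hypothetical helper: read "spamassassin -t" output on stdin and guess
# which BAYES_99 column the points came from. ASSUMES report lines look
# like "<points> <RULE_NAME> <description>", with points rounded to one
# decimal place (4.070 -> 4.1, 1.886 -> 1.9).
which_bayes99_column() {
    awk '
    # Only match lines whose first field looks like a one-decimal score,
    # so stray debug lines mentioning BAYES_99 are ignored.
    $2 == "BAYES_99" && $1 ~ /^-?[0-9]+\.[0-9]$/ {
        if ($1 == "4.1")      print "score set 2 (network tests off)"
        else if ($1 == "1.9") print "score set 3 (network tests on)"
        else                  print "unexpected value: " $1
        found = 1
        exit
    }
    END { if (!found) print "BAYES_99 did not hit" }'
}
```

For example, `spamassassin -t < message | which_bayes99_column`, where `message` stands for one of your saved spam messages.  The -D debugging output is not needed for this check.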

> I understand that the individual test scores are fed through a neural
> network to derive the final score.  So it seems that this network has
> started to behave badly.  

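Because you are getting the BAYES_99 tag, I am sure the Bayes engine is
working properly.  What you are seeing is a scoring difference instead:
the total is simply the sum of the scores of the rules that hit, taken
from whichever score set is active.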

> Can anyone shed any light on this?  Is it a well-known problem?  What's the
> preferred way to address it?  Remove all of SA's learned information and
> retrain the network?

Don't retrain!  Your evidence convinces me that you are actually running
with network tests enabled.  Compare the result with the following, where
-L (--local) restricts SpamAssassin to local tests only.  Does this give
you the results you were looking for?

  spamassassin -L -tD 2>&1 | pager

Bob

[1] http://www.surbl.org/
