Serious bayesian filter problems

Johann Spies 10 May 2004 08:39:01 -0000

It first happened on 26-28 April: the bayesian filter began to
give a lot of false positives.


I replaced the bayesian database on that server with that of a
second one.  Apart from auto-learning, I run sa-learn on both servers
at least once a day, feeding them the same input.

These mail servers accept about 100 000 emails per day and together
stop about 25 000 unwanted messages (most of them spam) at SMTP level.

Yesterday the same thing happened on the second server: the bayesian
filter/auto-whitelisting combination started to give false positives.
The same message tested with spamc scored 9.3 on one server and
2.0 on the other (which is about what it should be). I then used
sa-learn to learn the message as ham on the first one and tested it
again: 9.4!  The threshold is 8.0.

As a result of this behaviour I even received a warning from the
spamassassin mailing list server that messages sent to me had bounced.

It might be both auto-whitelisting and bayesian corruption. 

I cannot afford unreliable software for this important job.  Am I
the only one who experiences this type of behaviour?

How can I prevent this?  I cannot watch spamassassin 24 hours a day
to jump in when something goes wrong.

Regards
Johann

-- 
Johann Spies          Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

     "My son, do not despise the LORD's discipline and do
      not resent his rebuke, because the LORD disciplines
      those he loves, as a father the son he delights in."
                                       Proverbs 3:11,12
