jo3 wrote:
> Hi,
> 
> This is an observation, please take it in the spirit in which it is
> intended, it is not meant to be flame bait.
> 
> After using spamassassin for six solid months, it seems to me that the
> bayes process (sa-learn [--spam | --ham]) has only very limited success
> in learning about new spam.  Regardless of how many spams and hams are
> submitted, the effectiveness never goes above the default level which,
> in our case here, is somewhere around 2 out of 3 spams correctly
> identified.  By the same token, after adding the "third party" rule,
> airmax.cf, the effectiveness went up to 99 out of 100 spams correctly
> identified.


Realistically, I don't know why your hit rates are so low. The untrained default
shouldn't leave you stuck catching only 2 out of every 3 spams.

You could have some configuration problems, but I can't tell, as you haven't told
us anything about your setup, only the symptoms.

Can you answer a few questions that might help us diagnose some of your 
problems:

What version of SA are you running?

Can you post an X-Spam-Status header for one of the false negatives?

Is any of your spam hitting ALL_TRUSTED?

What BAYES rules are these messages hitting before and after training?

Do you use any network checks (URIBLs, RBLs, DCC, Razor, Pyzor, SPF)?
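One more thing worth checking before anything else: whether your Bayes database has even activated. By default Bayes stays dormant until sa-learn has seen 200 spam and 200 ham (the bayes_min_spam_num / bayes_min_ham_num settings). A minimal sketch of that check; the `magic` text below is made-up sample data, and on a live system you'd pipe the real `sa-learn --dump magic` output instead:

```shell
# Hypothetical sample of `sa-learn --dump magic` output; replace this
# here-string with the real command on a live system.
magic='0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0         80          0  non-token data: nham'

# Pull the learned-message counts out of the magic lines.
nspam=$(printf '%s\n' "$magic" | awk '/nspam/ {print $3}')
nham=$(printf '%s\n'  "$magic" | awk '/nham/  {print $3}')

# Bayes is inactive until both counts reach 200 (the default
# bayes_min_spam_num / bayes_min_ham_num thresholds).
if [ "$nspam" -lt 200 ] || [ "$nham" -lt 200 ]; then
    echo "bayes NOT active: nspam=$nspam nham=$nham (need 200 of each)"
else
    echo "bayes active: nspam=$nspam nham=$nham"
fi
```

If that shows Bayes never activated, no amount of rule tuning elsewhere will make the BAYES_* rules fire.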


> 
> So far, we have not had a single ham misidentified as spam with over one
> million messages examined.
> 
> Throughout the documentation, there seems to be a bias toward the bayes
> filter rather than the rule system.  Does anyone on the list have some
> thoughts which would help to explain my observation as to why a single
> rule would appear so successful while a million spams and hams would
> have so little effect?
> 

Correction: airmax.cf is not one single rule; it's one single FILE containing
211 rules. That's a significant difference, given that stock SpamAssassin
3.1.0 ships with about 723 rules.

Airmax alone has increased the number of rules on your system by about 29%.
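You can verify both numbers yourself. A sketch of the counting, using a tiny stand-in file (`sample.cf` and its LOCAL_DEMO_* rules are invented for illustration; point the grep at airmax.cf or /usr/share/spamassassin/*.cf on a real install):

```shell
# Hypothetical sample ruleset standing in for airmax.cf.
cat > sample.cf <<'EOF'
body   LOCAL_DEMO_ONE   /viagra/i
score  LOCAL_DEMO_ONE   2.0
header LOCAL_DEMO_TWO   Subject =~ /lottery/i
score  LOCAL_DEMO_TWO   1.5
uri    LOCAL_DEMO_THREE /example\.invalid/
EOF

# Only lines that define a test count as rules; score lines do not.
rules=$(grep -Ec '^(body|header|rawbody|uri|full|meta)[[:space:]]' sample.cf)
echo "$rules rules"      # prints: 3 rules

# 211 new rules on top of ~723 stock rules:
awk 'BEGIN { printf "%.1f%%\n", 211 / 723 * 100 }'   # prints: 29.2%
```

The same grep against the real airmax.cf and the stock *.cf files gives you the 211 and ~723 figures.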
