jo3 wrote:
> Hi,
>
> This is an observation; please take it in the spirit in which it is intended. It is not meant to be flame bait.
>
> After using SpamAssassin for six solid months, it seems to me that the bayes process (sa-learn [--spam | --ham]) has only very limited success in learning about new spam. Regardless of how many spams and hams are submitted, the effectiveness never goes above the default level, which, in our case, is somewhere around 2 out of 3 spams correctly identified. By the same token, after adding the "third party" rule, airmax.cf, the effectiveness went up to 99 out of 100 spams correctly identified.
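Before anything else, it's worth verifying that your training is actually being recorded. A minimal sanity check, assuming directories of individual hand-sorted messages (the corpus paths are hypothetical; substitute your own, and run sa-learn as the same user whose Bayes database the scanner reads):

    # train from hand-sorted corpora (paths are placeholder examples)
    sa-learn --spam /path/to/spam-corpus
    sa-learn --ham /path/to/ham-corpus

    # inspect the database: the nspam and nham counts should climb after
    # each run, and Bayes stays inactive until both reach 200 (the 3.x
    # defaults for bayes_min_spam_num / bayes_min_ham_num, if I remember
    # right)
    sa-learn --dump magic

If those counts never move, sa-learn is writing to a different database than the one your scanner reads, and no amount of training will change your hit rate.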
Realistically, I don't know why your hit rates are so low. They shouldn't be so bad that you're only catching 2 out of every 3. You could have some configuration problems, but I can't tell, as you've not told us anything about your system, just the problems you have. Can you answer a few questions that might help us diagnose the problem?

- What version of SA are you running?
- Can you post an X-Spam-Status header for one of the false negatives? (There's an example of what one looks like at the end of this message.)
- Is any of your spam hitting ALL_TRUSTED?
- What BAYES rules are these messages hitting before and after training?
- Do you use any network checks (URIBLs, RBLs, DCC, Razor, Pyzor, SPF)?

> So far, we have not had a single ham misidentified as spam, with over one million messages examined.
>
> Throughout the documentation, there seems to be a bias toward the bayes filter rather than the rule system. Does anyone on the list have some thoughts which would help to explain my observation as to why a single rule would appear so successful while a million spams and hams would have so little effect?

Correction: airmax.cf is not one single rule, it's one single FILE containing 211 rules. That's a significant difference, given that the stock SpamAssassin 3.1.0 ships with about 723 rules. Airmax has increased the number of rules in your system by roughly 29% (211/723).
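For reference, here is roughly what an X-Spam-Status header from a 3.1.x false negative looks like; the score, rule names, and version shown are made-up illustrations, not output from any real message:

    X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_50,
            HTML_MESSAGE,MIME_HTML_ONLY autolearn=no version=3.1.0

The tests= list is the useful part: if BAYES_50 or lower keeps showing up on spam even after heavy training, that again points at training not reaching the database your scanner uses.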
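And on the ALL_TRUSTED question: if that rule fires on mail from the outside world, SpamAssassin thinks every relay in the path is yours, which suppresses several checks and drags scores down. The usual fix is to declare your relays explicitly in local.cf; a sketch, with placeholder addresses:

    # local.cf -- declare only relays you actually control; these
    # networks and the border-MX address are placeholders
    trusted_networks 192.168/16 10/8
    trusted_networks 192.0.2.25

Remember to restart spamd after changing it.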