Which brings up another point which has been mentioned on the list before -- the BAYES_99 score is too low for well-trained systems.
I have never seen a BAYES_99 hit on any non-spam.
Yeah, it's kind of suspect.. take a look at the STATISTICS.txt data for set3 and set2.
Notice that in set3 the nonspam hit rate is quite low, but it's 10x higher than in set2 as a percentage of the total nonspam corpus.
Quite frankly, I suspect corpus pollution. It really only takes 1 high scoring spam in the nonspam corpus to really screw up the message scores.
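To put the pollution worry in numbers, here is a minimal sketch of how few misfiled messages it takes to produce a nonspam hit rate of a few thousandths of a percent, the order of magnitude reported in the STATISTICS files. The corpus size is purely hypothetical (I don't know the real size of the nonspam corpora), and ham_hit_percent is just a helper name made up for the illustration.

```python
# HYPOTHETICAL corpus size, chosen only for illustration.
# The real size of the nonspam corpora is not known here.
HYPOTHETICAL_HAM_CORPUS = 100_000


def ham_hit_percent(hits, corpus_size):
    """Return the hit rate as a percentage of the nonspam corpus."""
    return 100.0 * hits / corpus_size


if __name__ == "__main__":
    # How a handful of misfiled spams shows up as a nonspam hit rate.
    for misfiled in (1, 3, 30):
        rate = ham_hit_percent(misfiled, HYPOTHETICAL_HAM_CORPUS)
        print(f"{misfiled:>2} misfiled -> {rate:.4f}% of nonspam")
```

Under that assumed size, every hit rate in the thousandths of a percent corresponds to single-digit message counts, so a handful of misfiled spams is enough to account for it.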
Things in general I find suspect about the STATISTICS-set*.txt files for 3.x:
1) DRUGS_PAIN_OBFU actually hit some nonspam? I find that odd, but it could be a typo.
2) DRUGS_SMEAR1 hit some nonspam? I find that damn near impossible. I don't think any nonspam email other than one quoting spam will ever hit that rule. It seems there's one drug spam, or drug spam quote in somebody's corpus, and it was run in all 4 sets. (If anyone can show me the nonspam matching that rule and it's not spam or a spam quote or discussion of SA's rules, I'll send em $20. Really.)
3) Hugely better Bayes performance in set2 compared to set3: roughly a factor of 10 difference in FP rate for BAYES_90 and higher. Admittedly overall hits are up, but not that much.
# grep BAYES_9 STATISTICS-set2.txt
 35.784  73.4212  0.0034  1.000  0.98  4.07  BAYES_99
  1.483   3.0402  0.0030  0.999  0.87  3.61  BAYES_90
  1.173   2.4030  0.0030  0.999  0.85  3.51  BAYES_95

# grep BAYES_9 STATISTICS-set3.txt
 43.515  89.3888  0.0335  1.000  0.83  1.89  BAYES_99
  0.805   1.6326  0.0202  0.988  0.70  2.06  BAYES_95
  0.913   1.8399  0.0343  0.982  0.64  2.09  BAYES_90
4) NIGERIAN_BODY3? could be a finance newsletter, but very unlikely.
5) HARDCORE_PORN? hmmm.. possible.. Unlikely, but "extreme hardcore gaming" would match it.
6) PERCENT_RANDOM? Very unlikely. What would have %rnd_x in it?
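For anyone who wants to check the "factor of 10" in item 3 rather than eyeball it, here is a rough sketch that compares the nonspam hit rates of the BAYES_9x rules between the two sets. The rows are copied from the grep output in item 3. The column order (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) is my assumption about the STATISTICS layout, and the function names are made up for the example.

```python
# Rows copied from the grep output quoted in item 3.
# ASSUMED column order: OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME.
SET2 = """\
35.784 73.4212 0.0034 1.000 0.98 4.07 BAYES_99
1.483 3.0402 0.0030 0.999 0.87 3.61 BAYES_90
1.173 2.4030 0.0030 0.999 0.85 3.51 BAYES_95
"""

SET3 = """\
43.515 89.3888 0.0335 1.000 0.83 1.89 BAYES_99
0.805 1.6326 0.0202 0.988 0.70 2.06 BAYES_95
0.913 1.8399 0.0343 0.982 0.64 2.09 BAYES_90
"""


def ham_rates(text):
    """Map rule name -> HAM% (third column) for each 7-field row."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        rates[fields[6]] = float(fields[2])
    return rates


def ham_ratio(set2_text, set3_text):
    """Return set3 HAM% divided by set2 HAM% for rules present in both."""
    s2 = ham_rates(set2_text)
    s3 = ham_rates(set3_text)
    return {name: s3[name] / s2[name] for name in sorted(s2) if name in s3}


if __name__ == "__main__":
    for name, ratio in ham_ratio(SET2, SET3).items():
        print(f"{name}: set3 HAM% is {ratio:.1f}x set2")
```

On the rows above this gives about 11.4x for BAYES_90, 6.7x for BAYES_95 and 9.9x for BAYES_99, so "factor of 10" is a fair round number for the group.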