-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter wrote:
[snip]
> Bayes is good at catching words which are spam/ham for you, make sure to
> learn those mails.  SA will work better for people who tune it to the
> mail they receive though -- add your own rules for words and phrases
> you consider spam, generate your own scoreset from your own corpus, etc.
>

As if to reinforce Theo's comments...

By coincidence, last night I ran my monthly sa-stats for all the mail
here since May. There are only 2 accounts on this system (me and the
wife), but below is an indicator of just how right Theo is...

Figures from 8231 spam and 29181 ham messages:

Bayes:
BAYES_99 gives me 4.0 (not default) and BAYES_00 gives me -2.599.
I *love* bayes.

RULE NAME                COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
BAYES_99                  7636      7.17    20.41    92.77    0.40
BAYES_00                 23435     26.79    62.64     0.41   80.31

Local rules for spammy words (not huge hitters, but relevant):

RULE NAME                COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
LOCAL_419_ACCOUNT         1094      1.03     2.92    13.29    0.09
LOCAL_NEXT_OF_KIN          856      0.80     2.29    10.40    0.01
LOCAL_419_BENEFICIARY      734      0.69     1.96     8.92    0.00
LOCAL_LOT_APPROVED         672      0.63     1.80     8.16    0.00

Tuning your scoreset from real mail:

HTML_MESSAGE scores at 0.001, which is good because it hits almost 10%
of my ham...

HTML_MESSAGE              3094      2.91     8.27    37.59    9.47

FORGED_RCVD_HELO scores at 0.135 in the current mode. Again, lucky it's
low:

FORGED_RCVD_HELO          1829      1.72     4.89    22.22    8.62

Next time you see a thread about the spamcop.net blacklist hitting too
much ham (scoring 1.332/1.558):

RCVD_IN_BL_SPAMCOP_NET    4145      3.89    11.08    50.36    1.05

And a justification (perhaps) for having a "proper" name set up in your
mail client (many of the ham hits are from fedora-users and
spamassassin-users):

NO_REAL_NAME              1251      1.43     3.34    15.89    4.29

Anyway, just some data to back up the rhetoric ;-)

Cheers!

C.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDiafhMDDagS2VwJ4RAkHlAKDjqaQb3LwXzxTm9UnmkxkhIay6SACg8ZAZ
5IKFsqgjdOTgHUWUL/I1/OU=
=vE2o
-----END PGP SIGNATURE-----
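
For anyone wanting to sanity-check the percentage columns above, here is a
rough Python sketch of the arithmetic. It assumes (this is an inference from
the numbers, not something stated in the message) that each row's COUNT is
the number of hits within one class, spam or ham, and that %OFMAIL is that
same count divided by all mail. The helper names are made up for
illustration and have nothing to do with sa-stats itself.

```python
# Totals quoted in the message.
TOTAL_SPAM = 8231
TOTAL_HAM = 29181
TOTAL_MAIL = TOTAL_SPAM + TOTAL_HAM  # 37412


def pct(hits: int, total: int) -> float:
    """Return hits as a percentage of total, rounded to two decimals."""
    return round(100.0 * hits / total, 2)


# (rule name, COUNT from the table, class the COUNT is ASSUMED to come from)
ROWS = [
    ("BAYES_99", 7636, "spam"),
    ("BAYES_00", 23435, "ham"),
    ("RCVD_IN_BL_SPAMCOP_NET", 4145, "spam"),
    ("NO_REAL_NAME", 1251, "ham"),
]


def recompute(rows):
    """Map rule name -> (%OFMAIL, % of its own class) under the assumption above."""
    result = {}
    for name, count, cls in rows:
        class_total = TOTAL_SPAM if cls == "spam" else TOTAL_HAM
        result[name] = (pct(count, TOTAL_MAIL), pct(count, class_total))
    return result


if __name__ == "__main__":
    for name, (of_mail, of_class) in recompute(ROWS).items():
        print(f"{name:<24} %OFMAIL={of_mail:6.2f}  %OFCLASS={of_class:6.2f}")
```

Under that assumption the recomputed values line up with the table: for
BAYES_99, 7636 / 37412 gives the 20.41 in %OFMAIL and 7636 / 8231 gives the
92.77 in %OFSPAM.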