-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter wrote:
[snip]
> Bayes is good at catching words which are spam/ham for you, make sure to
> learn those mails.  SA will work better for people who tune it to the
> mail they receive though -- add your own rules for words and phrases
> you consider spam, generate your own scoreset from your own corpus, etc.
> 

As if to reinforce Theo's comments...
By coincidence, last night I ran my monthly sa-stats over all the mail
received here since May. There are only 2 accounts on this system (mine
and my wife's), but the figures below show just how right Theo is.

Figures from 8231 spam and 29181 ham messages:

Bayes:
I score BAYES_99 at 4.0 (not the default) and BAYES_00 at -2.599. I
*love* bayes.
RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
BAYES_99                         7636     7.17   20.41   92.77    0.40
BAYES_00                        23435    26.79   62.64    0.41   80.31
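
For anyone wanting to sanity-check the tables: in these figures the
%OFMAIL column appears to be simply COUNT divided by the total number of
messages (spam plus ham). Below is a minimal Python sketch of that
arithmetic. The counts and totals are copied from this post; the names
(COUNTS, pct_of_mail) are made up for illustration and are not part of
sa-stats.

```python
# Totals quoted in the post.
SPAM_TOTAL = 8231
HAM_TOTAL = 29181

# COUNT values copied from the tables in the post.
COUNTS = {
    "BAYES_99": 7636,
    "BAYES_00": 23435,
    "LOCAL_419_ACCOUNT": 1094,
    "RCVD_IN_BL_SPAMCOP_NET": 4145,
}


def pct_of_mail(count, total=SPAM_TOTAL + HAM_TOTAL):
    """Return count as a percentage of all messages, rounded to 2 places."""
    return round(100.0 * count / total, 2)


if __name__ == "__main__":
    for rule, count in COUNTS.items():
        print(f"{rule:<30} {count:>6} {pct_of_mail(count):>7.2f}")
```

For the four rules listed this gives 20.41, 62.64, 2.92 and 11.08, which
matches the %OFMAIL column above.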

Local rules for spammy words (not huge hitters, but relevant):
RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
LOCAL_419_ACCOUNT                1094     1.03    2.92   13.29    0.09
LOCAL_NEXT_OF_KIN                 856     0.80    2.29   10.40    0.01
LOCAL_419_BENEFICIARY             734     0.69    1.96    8.92    0.00
LOCAL_LOT_APPROVED                672     0.63    1.80    8.16    0.00
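
To illustrate the idea behind local phrase rules like these, here is a
rough Python sketch: each rule is a case-insensitive pattern tested
against the message body, and each hit adds that rule's score. This is
not SpamAssassin rule syntax, and the patterns and scores are
hypothetical, since the actual rule definitions are not shown in this
post.

```python
import re

# Hypothetical patterns and scores, for illustration only.
# The real LOCAL_* rule definitions are not shown in the post.
LOCAL_RULES = {
    "LOCAL_NEXT_OF_KIN": (re.compile(r"\bnext\s+of\s+kin\b", re.I), 1.0),
    "LOCAL_419_BENEFICIARY": (re.compile(r"\bbeneficiary\b", re.I), 1.0),
}


def score_body(body):
    """Return (names of rules that hit, summed score) for a message body."""
    hits = [name for name, (pattern, _) in LOCAL_RULES.items()
            if pattern.search(body)]
    total = sum(LOCAL_RULES[name][1] for name in hits)
    return hits, total
```

A body containing "NEXT OF KIN" would hit only the first rule here, and
ordinary list mail would normally hit neither, which is the point of
choosing phrases that rarely appear in your own ham.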

Tuning your scoreset from real mail:
HTML_MESSAGE scores at 0.001, which is good because it hits nearly 10% of my ham:
HTML_MESSAGE                     3094     2.91    8.27   37.59    9.47

FORGED_RCVD_HELO scores at 0.135 in the current mode. Again, lucky it's low:
FORGED_RCVD_HELO                 1829     1.72    4.89   22.22    8.62

Next time you see a thread about the spamcop.net blacklist hitting too
much ham, consider this (it scores 1.332/1.558):
RCVD_IN_BL_SPAMCOP_NET           4145     3.89   11.08   50.36    1.05

And a justification (perhaps) for having a "proper" name set up in your
mail client (many of the ham hits are from fedora-users and
spamassassin-users):
NO_REAL_NAME                     1251     1.43    3.34   15.89    4.29
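
To put the %OFHAM figures in terms of actual messages, the percentage
can be multiplied by the ham total. A small Python sketch, assuming
%OFHAM means the share of the 29181 ham messages that hit the rule; the
names used (PCT_OF_HAM, estimated_ham_hits) are hypothetical, and the
results are estimates because the published percentages are rounded.

```python
# Ham total quoted in the post.
HAM_TOTAL = 29181

# %OFHAM values copied from the rows in the post.
PCT_OF_HAM = {
    "HTML_MESSAGE": 9.47,
    "FORGED_RCVD_HELO": 8.62,
    "RCVD_IN_BL_SPAMCOP_NET": 1.05,
    "NO_REAL_NAME": 4.29,
}


def estimated_ham_hits(pct_of_ham, ham_total=HAM_TOTAL):
    """Estimate how many ham messages hit a rule from its %OFHAM value."""
    return round(ham_total * pct_of_ham / 100.0)


if __name__ == "__main__":
    for rule, pct in PCT_OF_HAM.items():
        print(f"{rule:<30} ~{estimated_ham_hits(pct)} ham messages")
```

On that assumption, RCVD_IN_BL_SPAMCOP_NET works out to roughly 306 ham
messages, against roughly 2763 for HTML_MESSAGE.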

Anyway, just some data to back up the rhetoric ;-)

Cheers!
C.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDiafhMDDagS2VwJ4RAkHlAKDjqaQb3LwXzxTm9UnmkxkhIay6SACg8ZAZ
5IKFsqgjdOTgHUWUL/I1/OU=
=vE2o
-----END PGP SIGNATURE-----
