Hi all,
Bayes seems to be missing quite a lot of spam. I'm getting these results quite often:
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM
----------------------------------------------------------------------
1 HTML_MESSAGE 2127 68.95 75.29 62.00
2 URIBL_BLACK 1822 39.09 64.50 11.22
3 URIBL_SC_SURBL 1664 31.07 58.90 0.54
4 URIBL_OB_SURBL 1654 31.61 58.55 2.06
5 BAYES_00 1471 65.28 52.07 79.77
6 URIBL_SBL 1360 29.81 48.14 9.70
7 URIBL_WS_SURBL 922 17.37 32.64 0.62
8 AWL 911 42.81 32.25 54.39
9 URIBL_AB_SURBL 746 13.89 26.41 0.16
10 BAYES_99 707 13.09 25.03 0.00
----------------------------------------------------------------------
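As a rough cross-check of the report above, the COUNT and %OFSPAM columns can be used to back out how many spam messages the report covers. This is only a sketch: it assumes COUNT is the number of spam messages the rule fired on (the report does not say so explicitly), and the function name `implied_spam_total` is made up for illustration.

```python
# Rows copied from the report above: (rule, COUNT, %OFSPAM).
rows = [
    ("HTML_MESSAGE", 2127, 75.29),
    ("BAYES_00", 1471, 52.07),
    ("BAYES_99", 707, 25.03),
]


def implied_spam_total(count, pct_of_spam):
    """Estimate the total spam count, ASSUMING COUNT counts spam hits only."""
    return round(count / (pct_of_spam / 100.0))


# If the assumption holds, every row should imply roughly the same total.
totals = {rule: implied_spam_total(count, pct) for rule, count, pct in rows}

for rule, total in totals.items():
    print(f"{rule}: implied spam total ~ {total}")
```

If the rows agree with each other, the assumption is at least consistent, and the BAYES_00 row then means that a little over half of the mail classed as spam was scored by Bayes as almost certainly ham.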
To me, it looks like BAYES_00 is hitting far too much spam.
I have fed a large amount of mail into Bayes:
[EMAIL PROTECTED] ~]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 6468 0 non-token data: nspam
0.000 0 6471 0 non-token data: nham
0.000 0 160969 0 non-token data: ntokens
0.000 0 1150774613 0 non-token data: oldest atime
0.000 0 1153439019 0 non-token data: newest atime
0.000 0 1153436831 0 non-token data: last journal sync atime
0.000 0 1153426735 0 non-token data: last expiry atime
0.000 0 1382400 0 non-token data: last expire atime delta
0.000 0 97882 0 non-token data: last expire reduction count
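For anyone reading the dump, the atime lines are easier to judge as dates. The sketch below assumes the atime values are Unix epoch seconds and that the expire delta is also in seconds; the helper name `to_utc` is made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the "sa-learn --dump magic" output above.
# ASSUMPTION: atimes are Unix epoch seconds; the delta is in seconds.
magic = {
    "oldest atime": 1150774613,
    "newest atime": 1153439019,
    "last journal sync atime": 1153436831,
    "last expiry atime": 1153426735,
}
last_expire_atime_delta = 1382400


def to_utc(epoch_seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


for name, ts in magic.items():
    print(f"{name}: {to_utc(ts).isoformat()}")

# Express the expiry delta in days rather than seconds.
expire_delta_days = last_expire_atime_delta / 86400
print(f"last expire atime delta: {expire_delta_days} days")
```

Read this way, the token database spans about a month of access times, and the last expiry run used a 16-day cutoff.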
And I'm quite certain that it was fed correctly.
All of the misses I have checked have hit BAYES_00.
Any ideas why this is happening? I have toyed with the idea of lowering the BAYES_00 score. Would anyone care to enlighten me on whether this would be a bad idea, and why?
Regards,
Leigh
Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
email [EMAIL PROTECTED]
web www.pacificwireless.com.au
