* Marc Perkel wrote (12/07/06 18:30):
Catchy subject line eh?
OK - so what I mean by this is that I now use SA for about 5% of all
incoming email. The rest of the spam is rejected before I get to SA through
a fairly large number of tricks that allow me to determine with near
100% accuracy things that are spam. It is done mostly through behavior
and karma related lists. Being host blacklisted or URI blacklisted.
I don't know if it's relevant to Marc's point, but it seems to me that
if SA was reduced to network checks only it would still be a very good
blocker of spam. And perhaps what Marc is doing is, more or less, moving
SA's network checks into the MTA and using them to reject rather than
just score.
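To make "moving the network checks into the MTA" concrete, here is a minimal Python sketch of the DNSBL lookup an MTA performs before deciding to reject. It is only an illustration, not Marc's setup: the function names and the injectable `resolve` parameter are my own, and the zone is simply the one mentioned in this thread. The convention is to reverse the IPv4 octets, append the list's zone, and treat any successful A-record answer as "listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the DNSBL answers for this IP, False if the name does not resolve.

    `resolve` is a hypothetical hook so the lookup can be swapped out in tests;
    by default it performs a real DNS query.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name: the IP is not on the list
    return True


if __name__ == "__main__":
    # 192.0.2.99 is a documentation address, used here only to show the name format.
    print(dnsbl_query_name("192.0.2.99", "sbl-xbl.spamhaus.org"))
```

An MTA doing this at SMTP time would reject the connection when `is_listed` returns True, where SA would only add the rule's score.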
I suppose something similar would be to score all the URIBL rules and
RCVD_IN rules high, and abandon the traditional regex rules.
Network checks are easily the most frequently hit spam rules in SA anyway. Here's a
bit of sa-stats for spam on a machine I look after (the MTA blocks based
on sbl-xbl.spamhaus.org before anything gets to SA, so that's not
represented here):
1 BAYES_99
2 URIBL_BLACK
3 URIBL_SBL
4 URIBL_JP_SURBL
5 URIBL_OB_SURBL
6 RCVD_IN_SORBS_DUL
7 RCVD_IN_NJABL_DUL
8 HTML_MESSAGE
9 FORGED_RCVD_HELO
10 URIBL_SC_SURBL
11 URIBL_WS_SURBL
12 SARE_MLB_Stock6
13 URIBL_AB_SURBL
14 SARE_MLB_Stock1
15 STOCK_NAME_FVGT1
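For anyone who wants to check the proportion, a rough Python sketch that splits the list above into network and non-network rules. The prefix test is my own heuristic (URIBL_ and RCVD_IN_ names are DNS-based lookups), and it counts rules in the ranking, not hits, since the hit counts are not shown here.

```python
# The 15 rule names from the ranking above, in order.
TOP_RULES = [
    "BAYES_99", "URIBL_BLACK", "URIBL_SBL", "URIBL_JP_SURBL",
    "URIBL_OB_SURBL", "RCVD_IN_SORBS_DUL", "RCVD_IN_NJABL_DUL",
    "HTML_MESSAGE", "FORGED_RCVD_HELO", "URIBL_SC_SURBL",
    "URIBL_WS_SURBL", "SARE_MLB_Stock6", "URIBL_AB_SURBL",
    "SARE_MLB_Stock1", "STOCK_NAME_FVGT1",
]

# Assumption: a rule is a "network check" if its name carries one of these prefixes.
NETWORK_PREFIXES = ("URIBL_", "RCVD_IN_")


def is_network_rule(name: str) -> bool:
    """Classify a rule by name prefix only; this is a heuristic, not SA's own metadata."""
    return name.startswith(NETWORK_PREFIXES)


network = [rule for rule in TOP_RULES if is_network_rule(rule)]
other = [rule for rule in TOP_RULES if not is_network_rule(rule)]

print(f"{len(network)} of {len(TOP_RULES)} top rules are network checks")
# prints: 9 of 15 top rules are network checks
```

The six left over are Bayes, HTML_MESSAGE, FORGED_RCVD_HELO and the three stock-spam rules.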
Of course that 5% is very important because that is where I get the
data for the other tests that allow me to bypass filtering.
Even this isn't necessarily so. Data for network tests can be collected
automatically, by trapping spammers who trawl the web/usenet for
addresses, those who scan for open port 25s, or those who try high MX's.
So at least some useful data can be collected without SA, or even human
intervention.
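As a sketch of what automatic collection could look like, the Python below turns deliveries to trap addresses into a local list of sender IPs. Everything in it is hypothetical: the trap address, the `(sender_ip, recipient)` record shape, and the `min_hits` threshold are assumptions for illustration, and a real setup would read these records from the MTA's logs.

```python
from collections import Counter

# Hypothetical trap address, seeded on web pages or usenet and never used for real mail.
TRAP_ADDRESSES = {"trap@example.com"}


def collect_trap_hits(deliveries, trap_addresses, min_hits=1):
    """Return the set of sender IPs that mailed a trap address at least `min_hits` times.

    `deliveries` is an iterable of (sender_ip, recipient) pairs.
    """
    hits = Counter(
        sender_ip
        for sender_ip, recipient in deliveries
        if recipient.lower() in trap_addresses
    )
    return {ip for ip, count in hits.items() if count >= min_hits}


if __name__ == "__main__":
    sample = [
        ("192.0.2.10", "trap@example.com"),
        ("192.0.2.10", "TRAP@example.com"),
        ("198.51.100.7", "user@example.com"),
    ]
    print(collect_trap_hits(sample, TRAP_ADDRESSES, min_hits=2))
```

The same shape would work for connections to a high MX: record the connecting IP instead of matching on the recipient.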
But - I want you all to start thinking of a new way to look at spam
filtering.
I'm not sure this is a "new way to look at spam filtering", but I agree
that content testing against regular expressions is increasingly looking
like a crude and easily-outwitted technique compared to DNS tests. Bayes
is still good, though.