* Marc Perkel wrote (12/07/06 18:30):
Catchy subject line eh?

OK - so what I mean by this is that I now use SA for about 5% of all incoming email. The rest of the spam is rejected before I get to SA through a fairly large number of tricks that allow me to determine with near 100% accuracy things that are spam. It is done mostly through behavior and karma related lists: being host blacklisted or URI blacklisted.

I don't know if it's relevant to Marc's point, but it seems to me that if SA was reduced to network checks only it would still be a very good blocker of spam. And perhaps what Marc is doing is, more or less, moving SA's network checks into the MTA and using them to reject rather than just score.
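
To make "network check" concrete, here is a minimal Python sketch of a DNSBL lookup as an MTA or SA would do it: reverse the IPv4 octets, append the list's zone, and treat any answer as "listed". The function names and the injected `resolve` callable are made up for this example and are not taken from SA or any MTA.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve) -> bool:
    """Return True if the query name resolves, i.e. the IP is listed.

    `resolve` is a hypothetical callable (name -> address) that raises
    OSError when the name does not exist; socket.gethostbyname behaves
    this way, since socket.gaierror is a subclass of OSError.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

Passing `socket.gethostbyname` as `resolve` would do a real lookup; injecting it keeps the sketch testable without network access. An MTA doing this at SMTP time can reject outright, where SA would only add to the score.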

I suppose something similar would be to score all the URIBL rules and RCVD_IN rules high, and abandon the traditional regex rules.
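
A rough sketch of that idea, assuming one simply generates `score` lines for a local SA config: rules whose names start with `URIBL_` or `RCVD_IN_` get a high score, and everything else is set to 0, which disables the rule. The helper name and the 5.0 default (chosen to match SA's default required score) are assumptions for this example, not a tested recommendation.

```python
NETWORK_PREFIXES = ("URIBL_", "RCVD_IN_")


def network_only_scores(rule_names, score=5.0):
    """Return SA `score` config lines favouring network rules only.

    Rules matching NETWORK_PREFIXES get `score`; all others get 0,
    which SpamAssassin treats as "do not run this rule".
    """
    lines = []
    for name in rule_names:
        value = score if name.startswith(NETWORK_PREFIXES) else 0
        lines.append(f"score {name} {value}")
    return lines


if __name__ == "__main__":
    for line in network_only_scores(
        ["URIBL_BLACK", "RCVD_IN_SORBS_DUL", "HTML_MESSAGE"]
    ):
        print(line)
```

In practice one would probably want to keep BAYES_99 scored as well rather than zero it with the regex rules.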

Network checks are easily the most frequently hit spam rules in SA anyway. Here's a bit of sa-stats for spam on a machine I look after (the MTA blocks based on sbl-xbl.spamhaus.org before anything gets to SA, so that's not represented here):

   1    BAYES_99
   2    URIBL_BLACK
   3    URIBL_SBL
   4    URIBL_JP_SURBL
   5    URIBL_OB_SURBL
   6    RCVD_IN_SORBS_DUL
   7    RCVD_IN_NJABL_DUL
   8    HTML_MESSAGE
   9    FORGED_RCVD_HELO
  10    URIBL_SC_SURBL
  11    URIBL_WS_SURBL
  12    SARE_MLB_Stock6
  13    URIBL_AB_SURBL
  14    SARE_MLB_Stock1
  15    STOCK_NAME_FVGT1


Of course that 5% is very important because that is where I get the data for the other tests that allow me to bypass filtering.

Even this isn't necessarily so. Data for network tests can be collected automatically, by trapping spammers who trawl the web/usenet for addresses, those who scan for open port 25s, or those who try high MX's. So at least some useful data can be collected without SA, or even human intervention.
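
As a sketch of how trap-driven collection could work without SA: given a log of (sending IP, recipient) pairs, count deliveries to addresses that were only ever published as bait and list the IPs that hit them repeatedly. The function names, the log shape, and the `min_hits` threshold are all assumptions for this example.

```python
from collections import Counter


def trap_hits(deliveries, trap_addresses):
    """Count, per sending IP, the messages addressed to a trap address.

    `deliveries` is an iterable of (ip, recipient) pairs.
    """
    traps = {addr.lower() for addr in trap_addresses}
    hits = Counter()
    for ip, recipient in deliveries:
        if recipient.lower() in traps:
            hits[ip] += 1
    return hits


def listing_candidates(hits, min_hits=2):
    """Return the sorted IPs that hit trap addresses at least `min_hits` times."""
    return sorted(ip for ip, count in hits.items() if count >= min_hits)
```

Requiring more than one hit is a crude guard against listing a host because of a single mistyped address.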

But - I want you all to start thinking of a new way to look at spam filtering.

I'm not sure this is a "new way to look at spam filtering", but I agree that content testing against regular expressions increasingly looks like a crude and easily outwitted technique compared to DNS tests. Bayes is still good, though.
