Justin Mason wrote:
Hey --
just to turn the tables for a bit ;), I've recently been considering a
problem and a possible solution, and could do with SpamAssassin users'
advice.
These days, I've been forced to use SBL/XBL as an upfront anti-spam check,
rejecting spam at RCPT TO: time during the SMTP transaction. (Previously
I'd been running it from SpamAssassin in the usual manner.) That's great,
and it works well, rejecting a *lot* of spam and saving a lot of CPU time
by not running SpamAssassin. ;)
However: it's important for SpamAssassin developers and mass-checkers to
get a "representative" feed of spam -- with all kinds of spam included --
so that the rules are measured against something close to reality. This,
unfortunately, implies that discarding mails that hit SBL/XBL is a bad
thing, since those mails won't get into the mass-checked corpora -- and
what will be mass-checked from that point on is just the 25% of spam that
evades those rules.
Bug 5096 suggests that we replace some of the mass-check corpora with
pure-spamtrap feeds to fix this. Bit of a heavy fix :(
There's another way, though. If it were possible to change the SMTP
transaction flowchart to include this:
- is IP listed in SBL/XBL?
- if not listed, deliver as normal;
- else if listed, continue SMTP transaction as if normal delivery is
underway, but deliver to a spamtrap mbox file or maildir.
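The flowchart above can be sketched roughly like this. It is a minimal illustration, not how any particular MTA implements it. Assumptions: the zone name "sbl-xbl.spamhaus.org" is taken from the list named in this thread; only IPv4 clients are handled; any resolution failure (e.g. NXDOMAIN) counts as "not listed"; and the destination labels "inbox" and "spamtrap" are hypothetical placeholders for the real mailbox and the spamtrap mbox/maildir. The resolver is passed in so the logic can be exercised without network access.

```python
import ipaddress
import socket

# Assumed DNSBL zone, based on the SBL/XBL list discussed in this thread.
DNSBL_ZONE = "sbl-xbl.spamhaus.org"


def dnsbl_query_name(client_ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNSBL lookup name: IPv4 octets reversed, zone appended."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip: str, resolve=socket.gethostbyname,
              zone: str = DNSBL_ZONE) -> bool:
    """True if the DNSBL returns a 127.x.x.x answer for the client IP."""
    try:
        answer = resolve(dnsbl_query_name(client_ip, zone))
    except OSError:
        # socket.gaierror (NXDOMAIN etc.) is an OSError: treat as not listed.
        return False
    return answer.startswith("127.")


def rcpt_decision(client_ip: str, resolve=socket.gethostbyname):
    """Return (smtp_reply, destination) at RCPT TO: time.

    Both branches give the client the same positive reply, so the SMTP
    transaction looks normal; only the hypothetical destination differs.
    """
    if is_listed(client_ip, resolve):
        return ("250 OK", "spamtrap")  # listed: keep it for the corpus
    return ("250 OK", "inbox")         # not listed: deliver as normal
```

The important design point is that both branches return the same 250 reply, whereas the current upfront setup would reject a listed client with a 5xx. The sender cannot tell that its mail went to the spamtrap, and the listed mail still ends up somewhere mass-check can see it.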
CAVEAT: just because the client is listed on sbl-xbl does not mean the
message is spam. In particular:
- a legit user may be sending through a listed server.
- a spammer may "corpus-corrupt" you by sending ham messages (slightly
modified copies of mailing-list posts).
You can of course consider the first point not a critical issue
(statistically speaking, at least). But if spammers know what you're
doing, the second point may become an issue (the same is true of
spamtraps; I don't know why spammers don't do it...).