mouss writes:
> Justin Mason wrote:
> > Hey --
> >
> > just to turn the tables for a bit ;), I've recently been considering a
> > problem and a possible solution, and could do with SpamAssassin users'
> > advice.
> >
> > These days, I've been forced to use SBL/XBL as an upfront anti-spam check,
> > rejecting spam at RCPT TO: time during the SMTP transaction. (Previously
> > I'd been running it from SpamAssassin in the usual manner.) That's great,
> > and it works well, rejecting a *lot* of spam and saving a lot of CPU time
> > by not running SpamAssassin. ;)
> >
> > However: it's important for SpamAssassin developers and mass-checkers to
> > get a "representative" feed of spam -- with all kinds of spam included --
> > so that the rules are measured against something close to reality. This,
> > unfortunately, implies that discarding mails that hit SBL/XBL is a bad
> > thing, since those mails won't get into the mass-checked corpora -- and
> > what will be mass-checked from that point on is just the 25% of spam that
> > evades those rules.
> >
> > Bug 5096 suggests that we replace some of the mass-check corpora with
> > pure-spamtrap feeds to fix this. Bit of a heavy fix :(
> >
> > There's another way, though. If it were possible to change the SMTP
> > transaction flowchart to include this:
> >
> > - is IP listed in SBL/XBL?
> > - if not listed, deliver as normal;
> > - else if listed, continue SMTP transaction as if normal delivery is
> >   underway, but deliver to a spamtrap mbox file or maildir.
> >
>
> CAVEAT: just because the client is listed on sbl-xbl does not mean the
> message is spam. In particular:
> - a legit user may be sending through a listed server.
> - a spammer may "corpus-corrupt" you by sending ham messages (slightly
>   modified copies from mailing lists)
>
> you can of course consider that the first is not a critical issue
> (statistically talking at least). but if spammers know what you're
> doing, the second point may become an issue (this is true with
> spamtraps, I don't know why spammers don't do it...).

yeah. generally we've been able to detect bad stuff creeping in due to odd
rules firing in mass-checks, so I'm not too worried about that.

--j.
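
For illustration, the flowchart quoted above could be sketched roughly as
follows. This is only a minimal Python sketch under stated assumptions, not
how any particular MTA does it: the zone name "sbl-xbl.spamhaus.org" is an
assumed default, the function names and the "inbox"/"spamtrap" labels are
hypothetical, and it handles IPv4 only. A real setup would hook this into
the MTA's own policy mechanism.

```python
import ipaddress
import socket

# Assumed default zone for the SBL/XBL list discussed in the thread;
# substitute whatever DNSBL zone is actually in use.
DNSBL_ZONE = "sbl-xbl.spamhaus.org"


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone=DNSBL_ZONE, resolver=socket.gethostbyname):
    """Return True if the query name resolves, i.e. the IP is listed.

    Any lookup failure (NXDOMAIN, timeout, ...) is treated as "not
    listed", so a DNS outage fails open to normal delivery.
    """
    try:
        resolver(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def choose_destination(ip, normal="inbox", trap="spamtrap", lookup=is_listed):
    """Pick a delivery target without rejecting during the SMTP transaction.

    Listed clients are still accepted, but their mail is routed to the
    spamtrap mbox/maildir (hypothetical label) instead of the inbox.
    """
    return trap if lookup(ip) else normal
```

The point of the design is that the sending client sees an ordinary
successful delivery in both branches; only the destination differs. Given
the caveat above about legit users behind listed servers, the trap folder
would still want occasional review before being fed into a mass-check
corpus.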