News from the front lines:
In a world that has mostly benign spam where spammers with real
return addresses send messages to valid recipients, qmail-scanner
has its place. You can easily tag spam with qmail-scanner so that
the POP/IMAP clients can deal with the messages appropriately.
For a small site with a few users ("few" < 1000), using
.qmail-(USER|default) or user-based implementation rules is fine.
For an ISP with thousands of users, it's not good enough anymore.
Even qmail-scanner-queue doesn't help protect servers from the
constant deluge of malignant messages.
I've been finding that at a small ISP (20k users), the final delivery
is far too late in the process to deal with spam. Address harvesters
(sending to 99% invalid addresses to find the 1% that don't bounce) and
spam blasters (sending spam to >3 invalid recipients per message) tax
the server processing hard enough to cause problems, particularly from
bounce messages to forged senders. As the spammers get more
persistent or desperate, they've been less gracious about how they
spam. In one case recently, I had a DDoS from 40 sites sending similar
spam all at once to/through our server to thousands of bad addresses.
Our servers spent a whole weekend trying to deliver the bounce messages
until I could clean/drain the queues of 85000 bounce messages. There's
not much that qmail-scanner can do itself to protect the server.
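
To make the two patterns above concrete, here is a minimal sketch of how a sender might be classified from recipient counts. It is not the code running on the server. The thresholds come from the figures quoted above (99% invalid, more than 3 invalid recipients per message); the names `SenderStats`, `record_message`, `classify` and the `MIN_RECIPIENTS` floor are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative thresholds based on the figures quoted in the text.
HARVESTER_INVALID_RATIO = 0.99  # about 99% of recipients do not exist
BLASTER_INVALID_PER_MSG = 3     # more than 3 invalid recipients in one message
MIN_RECIPIENTS = 100            # hypothetical floor so a few typos don't count


@dataclass
class SenderStats:
    """Running counters for one sending IP address."""
    valid: int = 0
    invalid: int = 0
    blaster_messages: int = 0


def record_message(stats, ip, recipients, valid_users):
    """Update the per-IP counters for one SMTP transaction."""
    entry = stats[ip]
    invalid = sum(1 for rcpt in recipients if rcpt.lower() not in valid_users)
    entry.invalid += invalid
    entry.valid += len(recipients) - invalid
    if invalid > BLASTER_INVALID_PER_MSG:
        entry.blaster_messages += 1


def classify(entry):
    """Return 'harvester', 'blaster', or 'ok' for one sender."""
    total = entry.valid + entry.invalid
    if total >= MIN_RECIPIENTS and entry.invalid / total >= HARVESTER_INVALID_RATIO:
        return "harvester"
    if entry.blaster_messages > 0:
        return "blaster"
    return "ok"


# Example: counters are kept in a dict keyed by the connecting IP address.
stats = defaultdict(SenderStats)
valid_users = {"alice@example.com", "bob@example.com"}
record_message(stats, "192.0.2.10", ["alice@example.com"], valid_users)
print(classify(stats["192.0.2.10"]))
```

In practice the set of valid users would come from a lookup table rather than an in-memory set, and the counters would need to expire so that a sender can drop off the list again.
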
I am using two tools for the benefit of my users:
Vexira virus scanner (www.centralcommand.com)
If the spammers weren't too persistent, I'd be able to just use
qmail-scanner-queue.pl and be mostly done. This worked for a
couple months before our ISP became a heavily hit target (60%
spam, 25% malignant spam).
My implementation now includes:
a qmail-smtpd that rejects mail based on environment variables
set from tcp.smtp.
a qmail.c hacked to provide better SMTP error codes based on qq exit
a rewritten qmail-scanner-queue that is highly optimized at letting
SA/spamc and Vexira do their job with minimal system resources
a qmail-send that injects bounce messages to the sender only when
it's a non-malignant message (one-to-one communication to a valid
a qmail-send that puts messages into a holding queue rather
than fully processing them right away. An asynchronous program
comes by and processes each message in the holding queue linearly
to prevent load swings from simultaneous qmail-send/vdelivermail
a procmail-like perl program responsible for final delivery that
queries a mysql database for a user's spam preference and uses
those preferences to tag/delete/pass messages based on SA scores
and user-defined keywords. A coworker made a web user interface.
added functionality to auto-add and auto-remove statistically
defined address harvesters and spam blasters to my tcp.smtp
block lists (with appropriate 400 or 500 messages based on
a program to create a cdb database of valid users to help the
filtering programs detect how many valid vs invalid users an IP
address or netblock is attempting to send to.
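
As an illustration of the final-delivery step in the list above, the tag/delete/pass decision could look roughly like the sketch below. The real program is written in perl and reads each user's preferences from MySQL; here the preferences are a plain object, and the field names, default scores and the `[SPAM]` marker are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SpamPrefs:
    """Per-user settings; field names are hypothetical, not the real schema."""
    tag_score: float = 5.0       # SA score at which a message is tagged
    delete_score: float = 15.0   # SA score at which a message is dropped
    keywords: list = field(default_factory=list)  # user-defined keywords


def decide(prefs, sa_score, subject):
    """Return 'delete', 'tag', or 'pass' for one message."""
    if sa_score >= prefs.delete_score:
        return "delete"
    lowered = subject.lower()
    keyword_hit = any(word.lower() in lowered for word in prefs.keywords)
    if sa_score >= prefs.tag_score or keyword_hit:
        return "tag"
    return "pass"


def tag_subject(subject, marker="[SPAM]"):
    """Prefix the subject once so POP/IMAP clients can filter on it."""
    return subject if subject.startswith(marker) else f"{marker} {subject}"


# Example: a user who also wants any subject containing "mortgage" tagged.
prefs = SpamPrefs(keywords=["mortgage"])
action = decide(prefs, 2.1, "Lower your mortgage today")
print(action, "->", tag_subject("Lower your mortgage today"))
```

Keeping the decision in a small function separate from the database lookup means the per-message cost is one preference query plus a few comparisons.
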
This is my third round of an on-and-off, 6-month-long fight. It's
not about filtering spam anymore, it's about protecting our mail
servers. As I leave, I have a big "I told you so" about how our CEO
should have just subscribed to BigFish/Frontbridge and paid the extra
money instead of going it alone. It would have saved money and reduced
downtime if spam processing weren't our problem.
The system is complex (some new perl/SpeedyCGI programs plus several
patches to qmail and one to vdelivermail), but it actively provides
negative feedback to spammers and harvesters with (hopefully) little
to no administration from a mail administrator. The good news is
that I'm about to finish up, and I don't have any IP restrictions with
the ISP, so I believe I'll be able to share most of my work. I hope
to be posting some patches and programs soon.
Another approach could have been to just integrate everything into
SpamAssassin, but it's getting too huge already. Each of thousands
of 4K messages doesn't need to go through a program that sucks 16MB
RSS memory. A large program isn't the most efficient place to
Winning another battle in the losing war against spam.