[vchkpw] Re: Spam Assassin implementation

Eric Ziegast Thu, 20 Mar 2003 09:18:02 -0800

News from the front lines:

  In a world that has mostly benign spam where spammers with real
  return addresses send messages to valid recipients, qmail-scanner
  has its place.  You can easily tag spam with qmail-scanner so that
  the POP/IMAP clients can deal with the messages appropriately.


For a small site with a few users ("few" < 1000), using
.qmail-(USER|default) or user-based implementation rules is fine.
For an ISP with thousands of users, it's not good enough anymore.
Even qmail-scanner-queue doesn't help protect servers from the
constant deluge of malignant messages.

I've been finding that at a small ISP (20k users), the final delivery
is far too late in the process to deal with spam.  Address harvesters
(sending to 99% invalid addresses to find the 1% that don't bounce) and
spam blasters (sending spam to >3 invalid recipients per message) tax
the server hard enough to cause problems, particularly from bounce
messages sent to forged senders.  As the spammers get more
persistent or desperate, they've been less gracious about how they
spam.  In one case recently, I had a DDoS from 40 sites sending similar
spam all at once to/through our server to thousands of bad addresses.
Our servers spent a whole weekend trying to deliver the bounce messages
until I could clean/drain the queues of 85000 bounce messages.  There's
not much that qmail-scanner can do itself to protect the server.
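
As an illustration of the "spam blaster" pattern described above, here is
a minimal Python sketch of a per-message check.  It is not taken from the
programs discussed in this post; the function names and the set of valid
users are assumptions, and the limit of 3 simply mirrors the ">3 invalid
recipients" figure mentioned above.

```python
def invalid_recipients(recipients, valid_users):
    """Return the recipients of one message that are not known local users.

    valid_users is assumed to be a set of lowercased addresses.
    """
    return [r for r in recipients if r.lower() not in valid_users]


def is_spam_blast(recipients, valid_users, limit=3):
    """True when a single message names more than `limit` invalid recipients."""
    return len(invalid_recipients(recipients, valid_users)) > limit
```

A message addressed to one valid user and four unknown ones would be
flagged, while ordinary one-to-one mail to a valid user would not.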

I am using two tools for the benefit of my users:
  SpamAssassin (www.spamassassin.org)
  Vexira virus scanner (www.centralcommand.com)

If the spammers weren't so persistent, I'd be able to just use
qmail-scanner-queue.pl and be mostly done.  This worked for a
couple months before our ISP became a heavily hit target (60%
spam, 25% malignant spam).

My implementation now includes:

  a qmail-smtpd that rejects mail based on environment variables
  set from tcp.smtp.

  a qmail.c hacked to provide better SMTP error codes based on qq exit
  codes.

  a rewritten qmail-scanner-queue that is highly optimized at letting
  SA/spamc and Vexira do their job with minimal system resources

  a qmail-send that injects bounce messages to the sender only when 
  it's a non-malignant message (one-to-one communication to a valid
  recipient)

  a qmail-send that puts messages into a holding queue rather
  than fully processing them right away.  An asynchronous program
  comes by and processes each message in the holding queue linearly
  to prevent load swings from simultaneous qmail-send/vdelivermail
  instances.
  
  a procmail-like perl program responsible for final delivery that
  queries a mysql database for a user's spam preference and uses
  those preferences to tag/delete/pass messages based on SA scores
  and user-defined keywords.  A coworker made a web user interface.

  added functionality to auto-add and auto-remove statistically
  defined address harvesters and spam blasters to my tcp.smtp
  block lists (with appropriate 4xx or 5xx SMTP responses based on
  severity)

  a program to create a cdb database of valid users to help the
  filtering programs detect how many valid vs invalid users an IP
  address or netblock is attempting to send to.
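
The last two items above (per-IP statistics against a list of valid
users, feeding tcp.smtp block lists) could be sketched roughly as
follows.  This is a Python illustration under stated assumptions, not the
actual programs: the thresholds, the BLOCKLEVEL variable name and its
"temp"/"perm" values are all hypothetical, and a plain set stands in for
the cdb lookup.  The output lines follow the tcprules
"address:allow,VAR="value"" form, leaving the rejection itself to a
qmail-smtpd that reads the variable.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical thresholds; the post does not give the real ones.
MIN_ATTEMPTS = 50        # ignore IPs with too few recipient attempts
PERM_INVALID_RATIO = 0.90  # mostly invalid recipients: permanent block
TEMP_INVALID_RATIO = 0.50  # many invalid recipients: temporary block


@dataclass
class Counts:
    valid: int = 0
    invalid: int = 0


def tally(attempts, valid_users):
    """Count valid and invalid recipient attempts per client IP.

    attempts: iterable of (ip, recipient) pairs
    valid_users: set of lowercased addresses (stand-in for the cdb file)
    """
    stats = defaultdict(Counts)
    for ip, rcpt in attempts:
        if rcpt.lower() in valid_users:
            stats[ip].valid += 1
        else:
            stats[ip].invalid += 1
    return stats


def classify(counts):
    """Return "perm", "temp", or None for one IP's counters."""
    total = counts.valid + counts.invalid
    if total < MIN_ATTEMPTS:
        return None
    ratio = counts.invalid / total
    if ratio >= PERM_INVALID_RATIO:
        return "perm"
    if ratio >= TEMP_INVALID_RATIO:
        return "temp"
    return None


def tcp_smtp_rules(stats):
    """Build tcp.smtp lines that set the (hypothetical) BLOCKLEVEL variable."""
    rules = []
    for ip in sorted(stats):
        verdict = classify(stats[ip])
        if verdict is not None:
            rules.append(f'{ip}:allow,BLOCKLEVEL="{verdict}"')
    return rules
```

Using "allow" plus a variable rather than "deny" lets the SMTP daemon
answer with a temporary or permanent error of its choosing instead of
having the connection dropped.  Removing an IP again is just a matter of
regenerating the rules from fresher counters.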

This is the third round of an on-and-off 6-month-long fight.  It's
not about filtering spam anymore, it's about protecting our mail
servers.  As I leave, I have a big "I told you so" about how our CEO
should have just subscribed to BigFish/Frontbridge and paid the extra
money instead of going it alone.  It would have saved money and reduced
downtime if SPAM processing weren't our problem.

The system is complex (some new perl/SpeedyCGI programs plus several
patches to qmail and one to vdelivermail), but it actively provides
negative feedback to spammers and harvesters with (hopefully) little
to no administration from a mail administrator.  The good news is
that I'm about to finish up, and I don't have any IP restrictions with
the ISP, so I believe I'll be able to share most of my work.  I hope
to be posting some patches and programs soon.

Another approach could have been to just integrate everything into
SpamAssassin, but it's getting too huge already.  Each of thousands
of 4K messages doesn't need to go through a program that sucks 16MB
RSS memory.  A large program isn't the most efficient place to
block/route messages.


--
Eric Ziegast
internet!vix.com!ziegast
Winning another battle in the losing war against spam.
