On Sat, 25 Dec 2004 13:10:04 -0900, John Andersen <[EMAIL PROTECTED]> wrote:
> From your web page:
> 
> "Bodytest" support - allows you to run filters like spamassassin and clamscan
> on the body of a mail message before replying to the final "." of the SMTP
> DATA command. (See the edinplace(1) man page and the bodytest description in
> the avenger(1) man page.)
> 
> This would imply that you hold the connection open from the sender till
> SA has had a look at the mail, (which may entail several network based hits
> in the process of checking surbl etc).  Does this not entail some rather
> large number of open connections on the mail server, some of which might
> time out when SA is working hard?

Yes, it does mean that there is a potential delay here.  I think the
biggest danger is a duplicate mail message: if you get unlucky and the
client times out after the mail has already gone through, the client
will retry and deliver it a second time.  However, in practice I'm
running the software on several
production mail servers, one of which has hundreds of users (the
others are smaller), and I have not noticed this problem.  Usually
clients have a timeout of at least several minutes during the DATA
portion of the SMTP session, while spamassassin seems to take only
seconds or tens of seconds in the worst case.

Note that external network queries are fairly common during SMTP
transactions.  For example, almost all MTAs do reverse DNS lookups and
RFC 1413 ident lookups (the latter of which can be very slow for
clients behind firewalls that block TCP port 113).  Nowadays, servers
including Mail Avenger increasingly support SPF, which requires more
DNS lookups.  Mail Avenger also does its own RBL lookups if you ask it
to, which will prime your nameserver's cache before invoking
spamassassin.  (Mail Avenger does the lookups concurrently and before
the DATA command, so the latency is less and there is no chance of a
duplicate message.)  This doesn't apply to SURBL lookups, of course.
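
To illustrate what "concurrently" buys you, here is a minimal Python
sketch of parallel RBL lookups.  It is not Mail Avenger's code: the
function names and the zone names in the usage note are hypothetical,
and the resolver is passed in as a parameter so the sketch can be
tried without touching the network.  The only convention it relies on
is the usual DNSBL one of reversing the IPv4 octets and appending the
list's zone.

```python
from concurrent.futures import ThreadPoolExecutor
import socket


def rbl_query_name(ip, zone):
    """Build a DNSBL query name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(name, resolve=socket.gethostbyname):
    """Treat a name that resolves as 'listed' and a failed lookup as 'not listed'.

    Simplification: a temporary DNS failure is also reported as not listed.
    """
    try:
        resolve(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_rbls(ip, zones, resolve=socket.gethostbyname):
    """Query every zone at once, so total latency is about the slowest lookup."""
    names = [rbl_query_name(ip, zone) for zone in zones]
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        results = list(pool.map(lambda n: is_listed(n, resolve), names))
    return dict(zip(zones, results))
```

Calling check_rbls("192.0.2.1", ["rbl-one.example", "rbl-two.example"])
returns a dict mapping each zone to True or False.  Run sequentially,
the same lookups would take the sum of the individual latencies rather
than roughly the longest one.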

Further mitigating the problem, you can configure Mail Avenger both to
limit the number of concurrent connections and to limit the number of
connections from any given IP address.  Some MTA clients like certain
versions of qmail have a habit of opening 20 TCP connections to the
same mail server concurrently.  Mail Avenger can, if you so configure
it, accept 5 connections from a client, then tweak the kernel's
firewall rules to drop further SYN packets from that particular client
until one of the 5 existing connections closes.  Thus, instead of
having 20 connections stuck waiting for spamassassin on an overloaded
server, you'll have most of the connections waiting for the TCP
connection to complete (for which you usually have about 2 minutes),
and again no risk of duplicate messages.
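
The bookkeeping behind the per-client limit can be sketched roughly
as follows.  This is only an illustration in Python, not how Mail
Avenger implements it: the class and method names are made up, and
where the sketch just returns False, Mail Avenger would instead
adjust the kernel's firewall rules to drop further SYN packets from
that client.

```python
from collections import Counter


class ClientLimiter:
    """Track open connections per client IP and refuse any beyond a cap."""

    def __init__(self, per_client_max=5):
        self.per_client_max = per_client_max
        self.open_by_ip = Counter()

    def try_accept(self, ip):
        """Return True and count the connection if the client is under its cap."""
        if self.open_by_ip[ip] >= self.per_client_max:
            return False  # the point where further SYNs would be dropped
        self.open_by_ip[ip] += 1
        return True

    def close(self, ip):
        """Release one slot for the client when a connection closes."""
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1
        if self.open_by_ip[ip] == 0:
            self.open_by_ip.pop(ip, None)  # forget clients with nothing open
```

With a cap of 5, a client that tries to open 20 connections at once
gets 5 accepted and 15 refused, and a sixth is accepted only after
one of the first five closes.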

So all in all I'd say you've identified a potential concern, and it is
something I worried about initially, but in practice it really doesn't
seem to be a problem.

> Also does avenger sit ahead of sendmail or is it called by sendmail?
> (Who is listening on 25? Avenger or sendmail/qmail?)

Mail Avenger sits ahead of sendmail and listens on port 25 itself.  It
needs to in order to
coordinate client connections with firewall rules, as well as to do
things like infer the client's operating system from its TCP SYN
fingerprints and determine the network route to the client while it is
connected.  We've seen some evidence that a few spam sources are
correlated with BGP route flaps, meaning that some spammers may be
temporarily stealing IP address space to send their spam.  Thus, it's
important to record the network path at the time of the TCP
connection.

Mail Avenger passes mail messages off to an arbitrary program you can
configure.  The default is "sendmail -oi -os -oee -f SENDER --
RECIPIENT1 RECIPIENT2 ...", which works with both sendmail and qmail. 
I would imagine something similar should also work with postfix, exim,
and other mailers, though I haven't yet tried it myself.
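
For anyone wiring up a different mailer, the default command line
above can be assembled along these lines.  This is a small Python
sketch rather than Mail Avenger's own code, and the function name and
its program parameter are hypothetical; only the flags and their
order come from the default shown above.

```python
def delivery_argv(sender, recipients, program="sendmail"):
    """Build the argument list for handing a message to the local mailer.

    The "--" ends option parsing, so a recipient that happens to start
    with "-" cannot be mistaken for a flag.
    """
    if not recipients:
        raise ValueError("at least one recipient is required")
    return [program, "-oi", "-os", "-oee", "-f", sender, "--", *recipients]
```

Keeping the arguments as a list, instead of pasting them into one
string, means they can be passed to something like subprocess.run
without going through a shell, so addresses never need quoting.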

David
