On Sat, 25 Dec 2004 13:10:04 -0900, John Andersen <[EMAIL PROTECTED]> wrote:
> From your web page:
>
> "Bodytest" support - allows you to run filters like spamassassin and clamscan
> on the body of a mail message before replying to the final "." of the SMTP
> DATA command. (See the edinplace(1) man page and the bodytest description in
> the avenger(1) man page.)
>
> This would imply that you hold the connection open from the sender till
> SA has had a look at the mail, (which may entail several network based hits
> in the process of checking surbl etc). Does this not entail some rather
> large number of open connections on the mail server, some of which might
> time out when SA is working hard?

Yes, it does mean that there is a potential delay here. I think the biggest danger is that if you get unlucky, you could get a duplicate mail message, if the client timed out but the mail ended up going through. However, in practice I'm running the software on several production mail servers, one of which has hundreds of users (the others are smaller), and I have not noticed this problem. Usually clients have a timeout of at least several minutes during the DATA portion of the SMTP session, while spamassassin seems to take only seconds or tens of seconds in the worst case.

Note that external network queries are fairly common during SMTP transactions. For example, almost all MTAs do reverse DNS lookups and RFC 1413 ident lookups (the latter of which can be very slow for clients behind firewalls that block TCP port 113). Nowadays, servers including Mail Avenger increasingly support SPF, which requires more DNS lookups.

Mail Avenger also does its own RBL lookups if you ask it to, which will prime your nameserver's cache before invoking spamassassin. (Mail Avenger does the lookups concurrently and before the DATA command, so the latency is less and there is no chance of a duplicate message.) This doesn't apply to SURBL lookups, of course.

Further mitigating the problem, you can configure Mail Avenger both to limit the number of concurrent connections and to limit the number of connections from any given IP address. Some MTA clients like certain versions of qmail have a habit of opening 20 TCP connections to the same mail server concurrently. Mail Avenger can, if you so configure it, accept 5 connections from a client, then tweak the kernel's firewall rules to drop further SYN packets from that particular client until one of the 5 existing connections closes.
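
To make the bookkeeping behind those two limits concrete, here is a minimal Python sketch. It is not Mail Avenger's code or configuration: the class and method names are hypothetical, the caps of 100 and 5 are only illustrative, and where Mail Avenger would adjust the kernel's firewall rules to drop further SYN packets, the sketch simply reports a refusal.

```python
from collections import Counter


class ConnectionLimiter:
    """Hypothetical bookkeeping for a global cap and a per-client-IP cap."""

    def __init__(self, max_total: int, max_per_ip: int) -> None:
        self.max_total = max_total
        self.max_per_ip = max_per_ip
        self.open_by_ip = Counter()  # client IP -> number of open connections

    def total_open(self) -> int:
        return sum(self.open_by_ip.values())

    def try_accept(self, ip: str) -> bool:
        """Record a new connection and return True, or return False if a cap is hit.

        A False result stands in for the point where a real server would
        stop accepting SYN packets from this client.
        """
        if self.total_open() >= self.max_total:
            return False
        if self.open_by_ip[ip] >= self.max_per_ip:
            return False
        self.open_by_ip[ip] += 1
        return True

    def release(self, ip: str) -> None:
        """Record that one connection from this client has closed."""
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1
            if self.open_by_ip[ip] == 0:
                del self.open_by_ip[ip]


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_total=100, max_per_ip=5)
    # A client that opens 20 connections at once, as some qmail versions do.
    accepted = sum(limiter.try_accept("192.0.2.1") for _ in range(20))
    print("accepted:", accepted)  # prints "accepted: 5"
    limiter.release("192.0.2.1")  # one of the 5 connections closes
    print("retry accepted:", limiter.try_accept("192.0.2.1"))  # prints "retry accepted: True"
```

The refused connections never reach the SMTP dialogue in this model, which is the property that matters for the argument here: they cannot produce a duplicate message.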
Thus, instead of having 20 connections stuck waiting for spamassassin on an overloaded server, you'll have most of the connections waiting for the TCP connection to complete (for which you usually have about 2 minutes), and again no risk of duplicate messages.

So all in all I'd say you've identified a potential concern, and it is something I worried about initially, but in practice it really doesn't seem to be a problem.

> Also does avenger sit ahead of sendmail or is it called by sendmail?
> (Who is listening on 25? Avenger or sendmail/qumail?

Yes, Mail Avenger listens on port 25. It needs to in order to coordinate client connections with firewall rules, as well as to do things like infer the client's operating system from its TCP SYN fingerprints and determine the network route to the client while it is connected. We've seen some evidence that a few spam sources are correlated with BGP route flaps, meaning that some spammers may be temporarily stealing IP address space to send their spam. Thus, it's important to record the network path at the time of the TCP connection.

Mail Avenger passes mail messages off to an arbitrary program you can configure. The default is "sendmail -oi -os -oee -f SENDER -- RECIPIENT1 RECIPIENT2 ...", which works with both sendmail and qmail. I would imagine something similar should also work with postfix, exim, and other mailers, though I haven't yet tried it myself.

David