On 21 Feb 2006 15:00:56 -0600, Richard B Barger wrote:

> I've been a very pleased Spambayes user for a couple of years.
> Because we have a bunch of public business email addresses, I receive
> a huge volume of email, mostly spam.
>
> I've been delighted with Spambayes, so I wanted to describe what my
> local ISP, Skyway Networks, is doing that is like Spambayes On
> Steroids (I was a beta tester):

This is a typical post-acceptance content analysis system. It is effective at keeping a lot of spam away from the user's mailbox, but it suffers from the same problems as most systems of its type (see below).

<...>

> Here's a brief overview of the process it goes through:
>
> - Before accepting a message, the system checks if the email address
>   is valid. This protects against directory-harvesting attacks by
>   spammers.
> - When the message is accepted, it is next checked for
>   worms/viruses, using three different anti-virus programs.

Here is the basic problem with this approach, which is common to this class of system. As long as the recipient envelope address is valid, the message is accepted for delivery and only _then_ processed to determine if it is spam. This is only one step beyond the old store-and-forward architectures, in that it checks for a valid recipient before accepting. Since most incoming messages are spam today, the MTA is forced to silently discard most of what it accepts.

This breaks most of the assumptions behind SMTP. Accepting a message for delivery means you accept the responsibility to do one of two things: deliver the message to the intended recipient, or send a Delivery Status Notification (DSN or bounce) to the original sender so they know their mail was not delivered. Since spam usually has forged return addresses, you can't send a DSN. Unless you know the return address is not a forgery, you shouldn't accept anything that you may not even attempt to deliver.

Because no system can completely avoid false positives, the one thing you want to avoid is accepting mail for delivery and then silently discarding it. Unfortunately, under the duress of high spam loads, that is exactly what many older system designs do. The cost of the additional bandwidth and CPU usage has to be borne by the customers, so this approach is far from optimal.

To avoid this, you do as many things as possible during the SMTP conversation, with an emphasis on rejecting messages at the envelope stage, where you have expended a minimum of resources. This saves you bandwidth and avoids the high CPU load of content analysis tools like virus scanners, SpamAssassin, Pyzor and the other techniques that you describe. For example, the IP-based DNSBL check should be done as soon as the SMTP connection is requested. Why even have a conversation with an MTA that is blacklisted? In the unusual event of a false positive, your sender knows immediately that their message was not delivered because their own MTA generates a DSN, rather than assuming you received and ignored their message.

Another reason for rejecting as much spam as possible, rather than accepting and silently discarding it, is that the spammers _know_ their message went undelivered. If a message is accepted, they know there is a minute chance that it will make it into a user's inbox. That small probability is the basis of their business. The more MTAs that reject spam during SMTP, the worse their business model appears. They don't do this for fun; they need to make money. To do that, they have to get their messages accepted at recipient MTAs. A rejection says there is a 0% chance the message will be seen by anyone.

By employing a variety of rejection tools (e.g. DNSBLs for the connecting IP plus HELO name and rDNS heuristics), most of the load can be rejected during the envelope phase of SMTP. For the ones that make it past the envelope, it is still possible to do the remaining content checks during the DATA phase and make the sender wait before confirming acceptance with a 250 code. Many people argue that spammers often abuse pipelining: they dump the whole message after the DATA command, then disconnect without waiting around for the acceptance. Any MTA behaving that way can be added to a local DNSBL so you don't talk to it next time.

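To make the connection-time DNSBL check concrete, here is a minimal Python sketch. It is only an illustration, not how any particular MTA does it. The zone name dnsbl.example.invalid, the host name mx.example.invalid and the function names are all made up for the example. It also fails open, treating any lookup error as "not listed", which is a policy choice you may not want.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name to query for an IPv4 address in a DNSBL zone."""
    # Validate the address, then reverse its octets: 192.0.2.10 -> 10.2.0.192
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the DNSBL zone returns an A record for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record, or a DNS failure: treat as not listed (fail open).
        return False
    return True


def greeting_for(ip, zones, resolve=socket.gethostbyname):
    """Pick the first line sent to a connecting client."""
    for zone in zones:
        if is_listed(ip, zone, resolve):
            # Sending 554 in place of the 220 banner refuses the session
            # before any envelope or message data has been transferred.
            return f"554 5.7.1 {ip} is listed in {zone}"
    return "220 mx.example.invalid ESMTP"
```

The resolve parameter exists only so the lookup can be swapped for a stub when testing; in real use you would leave the default and pass the list of DNSBL zones you trust.
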
There are also a number of heuristics that can catch spammers who blast out their commands without waiting for replies. For example, put in a delay after the connection request before you send the banner. Anyone who doesn't wait for the end of the banner can be safely disconnected and blacklisted for the future. If you want to perform a public service, tarpit them instead of merely rejecting and blacklisting. That takes almost none of your resources and a lot of theirs, thereby reducing the amount of spam they can send out to others. A small number of well-placed tarpits can bring a large number of spamming MTAs to their knees and, if they are trojaned Windows boxes, cause them to crash.

Spambayes, like all other MUA solutions, is a tool of last resort. It happens to be among the best in its category, but it has to catch whatever spam your MTA fails to reject. The less spam it has to deal with, the less likely you are to ever see any of it. In addition, the less spam your MTA accepts and silently discards, the lower the chance of silently discarding a wanted communication and the more spammers know their spew is not being delivered.

It sounds like their implementation is well done for its type, but it does not use best current practices.

--
Seth Goodman
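
As a footnote to the banner-delay heuristic above, here is a rough Python sketch of the idea, assuming a plain blocking socket for the client connection. The function names, the 5 second default and the host name mx.example.invalid are invented for illustration. A real MTA would also log the offender and feed it to a local blacklist or tarpit, which is left out here.

```python
import select
import socket


def talked_before_banner(conn: socket.socket, delay: float) -> bool:
    """Wait up to `delay` seconds before the banner; True if the client sent first."""
    # select() returns early if the socket becomes readable, which means the
    # client sent data (or hung up) before seeing our 220 greeting.
    readable, _, _ = select.select([conn], [], [], delay)
    return bool(readable)


def greet(conn: socket.socket, delay: float = 5.0) -> bool:
    """Refuse early talkers with 554; send the 220 banner to everyone else."""
    if talked_before_banner(conn, delay):
        conn.sendall(b"554 Protocol violation: spoke before banner\r\n")
        return False
    conn.sendall(b"220 mx.example.invalid ESMTP\r\n")
    return True
```

A well-behaved client sits quietly through the delay and gets the normal banner; one that has already pushed HELO or more down the connection gets the 554 instead.
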
