[email protected] wrote:
> I use Postfix 2.2.10 and I wonder what common practice is to stop backscatter email? I see that body checks and header checks are recommended.
>
> I turned off grey listing and have since been looking to improve my spam prevention without it. So far, I have studied the sorbs black lists and I've added a few more. I was originally using just dnsbl.sorbs.net where I have added the spam list and all the rhsbl lists. I am also using just bl.spamcop.net and I have been trying to produce my own dns black list. I figure if I add more black lists I will stop more spam from getting onto my mail servers in the first place. The trick is figuring out which ones to add.
>
I had a substantial backscatter problem last spring. I ended up writing a bunch of custom Python code to filter it. The algorithm went something like this:

1. Determine whether the incoming message is some sort of bounce (possibly legit). This is determined by two things:
   a. The "From:" address (typically postmaster or mailer-daemon).
   b. Some or all of an original message is included, either in-line or as a MIME attachment.

If it is a bounce, then:

2. If the original message is included in its entirety, run the original through your regular spam filters. If it turns out to be spam, then the bounce is backscatter.

(The next step requires that you have a name for each email address on your system, i.e. that [email protected] is "Joe Smith". Where you don't have that information, you'll have to skip this step.)

3. Look for the "From:" line in the header of the original message. Parse out the email address and the name. Check to see if the name is a possible match for the email address. Typically in bounced spam you'll see a complete mismatch, e.g.:

   From: "Wendy" <[email protected]>

   There's no way "Wendy" could be a match for "Joe Smith", so the bounce is likely backscatter. I used a fuzzy string comparison so that slight variations and typos of "Joe Smith" would count as a match (a legit bounce). Python's difflib.get_close_matches() works well for this.

This algorithm ended up catching well over 90% of the backscatter.

I eventually reversed steps 2 and 3 to reduce CPU load. Step 3 is fairly cheap CPU-wise and very effective, so if it detects backscatter you can skip the CPU-intensive step 2.

Terry
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
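
For anyone who wants to try the idea, here is a rough sketch of steps 1 and 3 using only the Python standard library. It is not the original filter code. The names KNOWN_USERS, BOUNCE_SENDERS, is_bounce, original_message, name_matches and looks_like_backscatter are made up for illustration, the example.com address is a placeholder, and the 0.6 cutoff is simply difflib's default. It only handles an original message attached as a message/rfc822 MIME part (not one quoted in-line), and it leaves out the spam-filter pass from step 2.

```python
import difflib
from email import message_from_string
from email.utils import parseaddr

# Hypothetical lookup table: local address -> real name of its owner.
KNOWN_USERS = {"joe@example.com": "Joe Smith"}

# Local parts that typically send bounces (step 1a).
BOUNCE_SENDERS = ("postmaster", "mailer-daemon")


def original_message(msg):
    """Return the first attached message/rfc822 original, or None (step 1b)."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return None


def is_bounce(msg):
    """Step 1: bounce-style sender plus an attached original message."""
    _, sender = parseaddr(msg.get("From", ""))
    local_part = sender.split("@")[0].lower()
    return local_part in BOUNCE_SENDERS and original_message(msg) is not None


def name_matches(display_name, real_name, cutoff=0.6):
    """Fuzzy comparison so typos and small variations still count as a match."""
    close = difflib.get_close_matches(
        display_name.lower(), [real_name.lower()], n=1, cutoff=cutoff
    )
    return bool(close)


def looks_like_backscatter(raw_message):
    """Step 3: flag a bounce whose original "From:" name does not fit the address."""
    msg = message_from_string(raw_message)
    if not is_bounce(msg):
        return False
    original = original_message(msg)
    display_name, address = parseaddr(original.get("From", ""))
    real_name = KNOWN_USERS.get(address.lower())
    if real_name is None or not display_name:
        # No name on file (or none in the header): this check cannot decide,
        # so leave the message for the regular spam filter (step 2).
        return False
    return not name_matches(display_name, real_name)
```

Feed looks_like_backscatter() the raw text of each incoming message. A True result means the bounce quotes an original whose display name does not resemble the owner of the forged address. A False result only means this cheap check did not fire, so the message should still go through the normal spam filters.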
