Pete,

CBL has proven 99.97% accuracy and, on some systems, a hit rate of over 40% of traffic, yet its methods are rather simple and easy to implement.

If an IP hits your spamtrap and it either has no reverse DNS entry or has a dynamic reverse DNS entry, it is listed; otherwise it isn't. They have a few other mechanisms that I am aware of, but the above will take care of almost everything related to spam zombies. Your current whitelisting method will take care of the few exceptions to this. Rather simple code can test for the standard types of dynamic reverse DNS entries, with both numeric and hex-encoded values, and make exceptions for names that include things like "mail" or "mx#".
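To illustrate, here is a minimal sketch of such a test in Python. The pattern set and hint words are my own assumptions about typical dynamic-pool naming, not CBL's actual rules:

```python
import re

# Hypothetical dynamic-rDNS classifier; the patterns and hint words
# below are illustrative assumptions, not CBL's real implementation.

# Tokens that usually indicate a legitimate mail host ("mail", "mx#", etc.).
STATIC_HINTS = re.compile(r'\b(mail|smtp|mx\d*|relay|out(bound)?)\b', re.I)

# Common dynamic-pool naming patterns: embedded dotted/dashed octets,
# hex-encoded IPs, and generic dynamic-access keywords.
DYNAMIC_PATTERNS = [
    re.compile(r'\d{1,3}[-.]\d{1,3}[-.]\d{1,3}[-.]\d{1,3}'),   # 24-61-132-80
    re.compile(r'[0-9a-f]{8}\.', re.I),                        # hex-encoded IP
    re.compile(r'\b(dyn(amic)?|dsl|dialup|dhcp|ppp|pool|cable)\b', re.I),
]

def looks_dynamic(rdns):
    """Return True if a reverse-DNS name looks like a dynamic pool entry."""
    if rdns is None:                 # no PTR record at all
        return True
    if STATIC_HINTS.search(rdns):    # "mail", "mx1", etc. -> assume static
        return False
    return any(p.search(rdns) for p in DYNAMIC_PATTERNS)
```

A listing bot would then add a spamtrap-hitting IP only when looks_dynamic() returns True, leaving named mail hosts for other tests.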

If you want to expand this to static spammers, you merely introduce other pre-qualifications, such as a Mail From domain or HELO that matches the payload domain in the body. I figure, however, that for the most part you are tagging static spammers with other rules that take precedence over the IP rules, so this would be minimally beneficial in comparison to spam zombies.
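That pre-qualification could look something like the sketch below. The function name and the URL-extraction approach are my own assumptions; a production version would want proper registrable-domain handling:

```python
import re
from urllib.parse import urlparse

def sender_matches_payload(mail_from, helo, body):
    """Hypothetical pre-qualification: does the envelope sender or HELO
    domain also appear as a link domain inside the message body?"""
    # Domain portion of the envelope sender.
    sender_dom = mail_from.rsplit('@', 1)[-1].lower()
    # HELO argument, crudely unwrapping an address literal like [1.2.3.4].
    helo_dom = helo.lower().lstrip('[').rstrip(']')
    # Pull candidate domains out of any URLs in the body.
    body_domains = {
        urlparse(u).hostname
        for u in re.findall(r'https?://[^\s">]+', body, re.I)
    }
    body_domains = {d for d in body_domains if d}
    # Match if the sender or HELO domain equals, or is the parent of,
    # any payload domain found in the body.
    return any(
        d == sender_dom or d.endswith('.' + sender_dom)
        or d == helo_dom or d.endswith('.' + helo_dom)
        for d in body_domains
    )
```

Only IPs whose mail passes a check like this (on top of the spamtrap hit) would be considered for a static-spammer listing.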

The false positives hitting your spam traps are most likely due to AFF (Advance Fee Fraud) and some phishing, which use free accounts on legitimate servers to send their spam, and to the increasing prevalence of hacked e-mail accounts being used by zombie spammers. The first method would avoid listing such servers in almost every circumstance, and we certainly wouldn't ever see things like yahoo.com, gmail.com, and rr.com mail servers listed, as we do with some regularity under the current method.

Matt


Pete McNeil wrote:
Hello Andy,

Tuesday, April 3, 2007, 5:15:12 PM, you wrote:

Hi Jonathan:

That's exactly the problem. These particular rules were blocking Google mail
servers - NOT specific content.

To clarify, it was blocking precisely one IP. The F001 bot only tags a
single IP at a time (not ranges, ever), and only after repeated
appearances at clean spamtraps where the message also fails other
tests (often including content problems like bad headers, obfuscation,
heuristic scam text matching, etc.).

The rule was in place from 20070326. The first reported false
positives arrived today (just after midnight). The rule was removed
just less than 12 hours after that report (due to scheduling and heavy
spam activity this morning that required my immediate attention). The
report was ordinary (not a rule panic).

As is the case with all FPs, the rule cannot be repeated (without
special actions).

Obviously, as already discussed in the past, it IS necessary that these
IP-based blocks are put under a higher scrutiny. I'm not suggesting that the
"automatic" bots should be disabled, I'm just proposing that "intelligence"
must be incorporated that will use RevDNS and WHOIS to identify POSSIBLY
undesirable blocks and to flag those for human review by Sniffer personnel
so that they don't end up poisoning mail servers of all their clients.

While interesting, these mechanisms are neither foolproof nor trivial
to implement. Also, our prior research has taught us that direct human
involvement in IP rule evaluation leads to far more errors than we can
allow. Once upon a time, IP rules were created in very much the way
you describe -- candidate IPs were generated from spamtraps and the
live spam corpus and then reviewed (manually and automatically)
against RevDNS, WHOIS, and other tools. At that time, IP rules had the
absolute worst reliability of any test mechanism we provided. Upon
further R&D, we created the F001 bot that is in place and now the
error rate has been significantly reduced and our people are able to
focus on things that computers can't do better.

Please don't get me wrong, I'm definitely not saying that the F001 bot
can't be improved - it certainly can, and will if it survives. What I
am saying is that it is accurate enough now that any improvements in
accuracy would be non-trivial to implement.

Our current development focus is on developing the suite of
applications and tools that will allow us to complete the alpha and
beta testing of the next version of SNF*. That work has priority, and
given that these events are rare and easily mitigated we have not
deemed it necessary to make enhancements to the F001 bot a higher
priority.

The following factors make it relatively easy to mitigate these IP FP
events (however undesirable): rule panics can make these rules
immediately inert; FP report/response times are sufficiently quick;
and the IP rule group is sequestered at the lowest priority, so it can
easily be weighted lower than other tests.

Also, it is likely that the F001 bot and IP rules group will be
eliminated once the next SNF version is sufficiently deployed because
one of the major enhancements of the new engine is a multi-tier,
self-localizing IP reputation system (GBUdb).

* A production ready SYNC server is nearing completion. This software
will allow large numbers of GBUdb equipped nodes to share
near-real-time IP statistics. The new SNF engine itself is nearly
complete and has been in alpha testing in production environments for
some time to prove that it is stable (and it is). We expect to begin
wider alpha testing followed quickly by beta testing within the next
few weeks if all goes well. Once the system is deployed, all SNF nodes
will cooperate to learn both good and bad IP sources based on content
analysis and localized behaviors and they will be able to share what
they learn with other nodes within seconds (90 on average) of any
significant change or new knowledge.

I understand that occasionally some innocent IP can be added accidentally
and there is little to avoid that -- but for the top 50 email domains, extra
security/intelligence should be in place so that we don't suddenly reject
huge volumes of legitimate mail by blocking hotmail, aol, yahoo, google or
similar providers! These kinds of errors can be caught much earlier.

Perhaps; and we do have mechanisms in place to help prevent these
events. For example, there is one mechanism whereby IPs that appear to
be at risk are entered into the IP rule group as "nokens" (excluded on
entry) to prevent manual or automatic processes from creating black
rules.

As you point out, though, any system will occasionally allow errors.

_M
