Hello Andy,

Tuesday, April 3, 2007, 5:15:12 PM, you wrote:

> Hi Jonathan:

> That's exactly the problem. These particular rules were blocking Google mail
> servers - NOT specific content.

To clarify, it was blocking precisely one IP. The F001 bot only tags a
single IP at a time (not ranges, ever), and only after repeated
appearances at clean spamtraps where the message also fails other
tests (often including content problems like bad headers, obfuscation,
heuristic scam text matching etc.)
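
As a rough illustration only (this is not the F001 bot's code; the
threshold, class name and method names are assumptions made up for
this sketch), the tagging policy described above could be expressed in
Python like this:

```python
from collections import defaultdict
import ipaddress

# Hypothetical threshold; the bot's real criteria are not stated here.
MIN_TRAP_HITS = 3


class SingleIpTagger:
    """Counts qualifying spamtrap hits per exact IP; never tags ranges."""

    def __init__(self, min_hits=MIN_TRAP_HITS):
        self.min_hits = min_hits
        self.hits = defaultdict(int)  # exact address -> qualifying hits
        self.tagged = set()

    def observe(self, ip, fails_other_tests):
        """Record one message seen at a clean spamtrap.

        A hit only counts when the message also fails other tests.
        Returns True if this exact IP is tagged after the observation.
        """
        addr = ipaddress.ip_address(ip)  # one address, not a network
        if fails_other_tests:
            self.hits[addr] += 1
        if self.hits[addr] >= self.min_hits:
            self.tagged.add(addr)
        return addr in self.tagged
```

The point of the sketch is that a single spamtrap appearance is not
enough, and that neighbouring addresses are never affected.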

The rule was in place from 20070326. The first reported false
positives arrived today (just after midnight). The rule was removed
just under 12 hours after that report (due to scheduling and heavy
spam activity this morning that required my immediate attention). The
report was ordinary (not a rule panic).

As is the case with all FPs, the rule cannot be repeated (without
special actions).

> Obviously, as already discussed in the past, it IS necessary that these
> IP-based blocks are put under a higher scrutiny. I'm not suggesting that the
> "automatic" bots should be disabled, I'm just proposing that "intelligence"
> must be incorporated that will use RevDNS and WHOIS to identify POSSIBLY
> undesirable blocks and to flag those for human review by Sniffer personnel
> so that they don't end up poisoning mail servers of all their clients.

While interesting, these mechanisms are neither foolproof nor trivial
to implement. Also - our prior research has taught us that direct human
involvement in IP rule evaluation leads to far more errors than we can
allow. Once upon a time, IP rules were created in very much the way
you describe -- candidate IPs were generated from spamtraps and the
live spam corpus and then reviewed (manually and automatically)
against RevDNS, WHOIS, and other tools. At that time, IP rules had the
absolute worst reliability of any test mechanism we provided. Upon
further R&D, we created the F001 bot that is in place and now the
error rate has been significantly reduced and our people are able to
focus on things that computers can't do better.

Please don't get me wrong, I'm definitely not saying that the F001 bot
can't be improved - it certainly can, and will if it survives. What I
am saying is that it is accurate enough now that any improvements in
accuracy would be non-trivial to implement.

Our current development focus is on developing the suite of
applications and tools that will allow us to complete the alpha and
beta testing of the next version of SNF*. That work has priority, and
given that these events are rare and easily mitigated we have not
deemed it necessary to make enhancements to the F001 bot a higher
priority.

The following factors make it relatively easy to mitigate these IP FP
events (however undesirable): rule panics can make these rules
immediately inert; FP report/response times are sufficiently quick;
and the IP rule group is sequestered at the lowest priority so that it
can easily be weighted lower than other tests.

Also, it is likely that the F001 bot and IP rules group will be
eliminated once the next SNF version is sufficiently deployed because
one of the major enhancements of the new engine is a multi-tier,
self-localizing IP reputation system (GBUdb).

* A production-ready SYNC server is nearing completion. This software
will allow large numbers of GBUdb equipped nodes to share
near-real-time IP statistics. The new SNF engine itself is nearly
complete and has been in alpha testing in production environments for
some time to prove that it is stable (and it is). We expect to begin
wider alpha testing followed quickly by beta testing within the next
few weeks if all goes well. Once the system is deployed, all SNF nodes
will cooperate to learn both good and bad IP sources based on content
analysis and localized behaviors and they will be able to share what
they learn with other nodes within seconds (90 on average) of any
significant change or new knowledge.
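
To make the idea of nodes sharing IP statistics concrete, here is a
minimal Python sketch. It is not the GBUdb design or the SYNC protocol;
the class, its methods and the simple good/bad counters are all
assumptions chosen only to illustrate per-IP learning plus merging of
what a peer has learned:

```python
from collections import defaultdict


class IpStatsNode:
    """Hypothetical node that keeps good/bad message counts per IP."""

    def __init__(self):
        # ip -> [good_count, bad_count]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, ip, is_spam):
        """Update local statistics from one content-analysis result."""
        self.counts[ip][1 if is_spam else 0] += 1

    def merge(self, peer_counts):
        """Fold counts reported by another node into the local view."""
        for ip, (good, bad) in peer_counts.items():
            self.counts[ip][0] += good
            self.counts[ip][1] += bad

    def bad_fraction(self, ip):
        """Share of messages from this IP judged bad, or None if unseen."""
        good, bad = self.counts.get(ip, (0, 0))
        total = good + bad
        return bad / total if total else None
```

In this sketch merging the same peer data twice would double count, so
a real exchange would presumably send only changes since the last
update; that detail is left out here.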

> I understand that occasionally some innocent IP can be added accidentally
> and there is little to avoid that -- but for the top 50 email domains, extra
> security/intelligence should be in place so that we don't suddenly reject
> huge volumes of legitimate mail by blocking hotmail, aol, yahoo, google or
> similar providers! These kind of errors can be caught much earlier.

Perhaps; and we do have mechanisms in place to help prevent these
events. For example, there is one mechanism where IPs that appear to be
at risk are entered into the IP rule group as "nokens" (Excluded on
entry) to prevent manual or automatic processes from creating black
rules.
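
A small Python sketch of that exclusion idea follows. The class and
method names are hypothetical and the real rule base is certainly more
involved; the sketch only shows an "excluded on entry" address
refusing any later black rule, whether manual or automatic:

```python
class IpRuleGroup:
    """Hypothetical IP rule group with exclusion ("noken") entries."""

    def __init__(self):
        self.excluded = set()  # IPs that must never receive a black rule
        self.black = set()     # IPs currently carrying a black rule

    def add_noken(self, ip):
        """Mark an at-risk IP as excluded and drop any existing rule."""
        self.excluded.add(ip)
        self.black.discard(ip)

    def add_black_rule(self, ip):
        """Try to add a black rule; returns False if the IP is excluded."""
        if ip in self.excluded:
            return False
        self.black.add(ip)
        return True
```

Because the check sits in the rule group itself, it applies equally
to a bot and to a person entering rules by hand.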

As you point out, though, any system will occasionally allow errors.

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

