Hi Pete, Thanks for taking the time to respond.
>> The rule was in place from 20070326. The first reported false positives arrived today << Except that reports from end users lingered in my email since Friday. Not your fault - but just to better demonstrate the ultimate effect it had. To be certain, I wasn't dissatisfied with your reaction time after I finally got around to looking at the user reports and compiling reports to you. My argument is, that for big email providers, there could be procedures in place to identify possible bad rules and flag them for review without waiting for FP reports. >> To clarify, it was blocking precisely one IP. The F001 bot only tags a single IP at a time (not ranges, ever) << Except that there were multiple rules (I remember seeing at least two) - and each one (if I recall correctly) targeting a different IP in the same block. Thus, the difference is merely technical (whether n rules are needed for n IPs or whether one rule covers multiple). >> Once upon a time, IP rules were created in very much the way you describe -- candidate IPs were generated from spamtraps and the live spam corpus and then reviewed (manually and automatically) against RevDNS, WHOIS, and other tools. At that time, IP rules had the absolute worst reliability of any test mechanism we provided. << I can't follow the logic. If F001 would continue to be used (but certain IPs are reviewed), then this can't possibly increase the false positive rate. At worst, a rule may be prohibited unnecessarily... But that's our job - to err on the "save" side and let the GOOD mail go through. If we block good mail, then the system has failed the user. Best Regards, Andy -----Original Message----- From: Message Sniffer Community [mailto:[EMAIL PROTECTED] On Behalf Of Pete McNeil Sent: Tuesday, April 03, 2007 6:31 PM To: Message Sniffer Community Subject: [sniffer] Re: How to incorporate a white list? Hello Andy, Tuesday, April 3, 2007, 5:15:12 PM, you wrote: > Hi Jonathan: > That's exactly the problem. 
> These particular rules were blocking Google mail servers - NOT specific content.

To clarify, it was blocking precisely one IP. The F001 bot only tags a single IP at a time (not ranges, ever), and only after repeated appearances at clean spamtraps where the message also fails other tests (often including content problems such as bad headers, obfuscation, heuristic scam text matching, etc.).

The rule was in place from 20070326. The first reported false positives arrived today (just after midnight). The rule was removed just less than 12 hours from that report (due to scheduling and heavy spam activity this morning that required my immediate attention). The report was ordinary (not a rule panic). As is the case with all FPs, the rule cannot be repeated (without special actions).

> Obviously, as already discussed in the past, it IS necessary that these
> IP-based blocks are put under a higher scrutiny. I'm not suggesting that the
> "automatic" bots should be disabled, I'm just proposing that "intelligence"
> must be incorporated that will use RevDNS and WHOIS to identify POSSIBLY
> undesirable blocks and to flag those for human review by Sniffer personnel
> so that they don't end up poisoning mail servers of all their clients.

While interesting, these mechanisms are neither foolproof nor trivial to implement. Also, our prior research has taught us that direct human involvement in IP rule evaluation leads to far more errors than we can allow.

Once upon a time, IP rules were created in very much the way you describe -- candidate IPs were generated from spamtraps and the live spam corpus and then reviewed (manually and automatically) against RevDNS, WHOIS, and other tools. At that time, IP rules had the absolute worst reliability of any test mechanism we provided. Upon further R&D, we created the F001 bot that is now in place; the error rate has been significantly reduced, and our people are able to focus on things that computers can't do better.
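As a rough illustration of the tagging policy described above -- a single IP (never a range) is tagged only after repeated appearances at clean spamtraps where the message also fails other tests -- here is a minimal sketch. All names and the hit threshold are illustrative assumptions, not SNF/F001 internals:

```python
# Hypothetical sketch of an F001-style tagging policy. The class name,
# threshold, and inputs are assumptions for illustration only.
from collections import defaultdict

MIN_TRAP_HITS = 3  # assumed: "repeated appearances" required before tagging


class TrapTagger:
    def __init__(self, min_hits=MIN_TRAP_HITS):
        self.min_hits = min_hits
        self.hits = defaultdict(int)  # corroborated spamtrap hits per IP

    def record(self, ip, at_clean_trap, failed_other_tests):
        """Count a message only when both corroborating conditions hold."""
        if at_clean_trap and failed_other_tests:
            self.hits[ip] += 1

    def should_tag(self, ip):
        """Tag an individual IP only after enough corroborated hits."""
        return self.hits[ip] >= self.min_hits


tagger = TrapTagger()
for _ in range(3):
    tagger.record("192.0.2.10", at_clean_trap=True, failed_other_tests=True)
# A trap hit without a second failing test never counts toward tagging:
tagger.record("192.0.2.20", at_clean_trap=True, failed_other_tests=False)
```

The point of the two-condition check is that a single spamtrap appearance, or a message that only trips one test, never produces a block rule on its own.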
Please don't get me wrong, I'm definitely not saying that the F001 bot can't be improved - it certainly can, and will if it survives. What I am saying is that it is accurate enough now that any improvements in accuracy would be non-trivial to implement.

Our current development focus is on developing the suite of applications and tools that will allow us to complete the alpha and beta testing of the next version of SNF*. That work has priority, and given that these events are rare and easily mitigated, we have not deemed it necessary to make enhancements to the F001 bot a higher priority.

The following factors make it relatively easy to mitigate these IP FP events (however undesirable):

- Rule panics can make these rules immediately inert.
- FP report/response times are sufficiently quick.
- The IP rule group is sequestered at the lowest priority so that it can easily be weighted lower than other tests.

Also, it is likely that the F001 bot and the IP rules group will be eliminated once the next SNF version is sufficiently deployed, because one of the major enhancements of the new engine is a multi-tier, self-localizing IP reputation system (GBUdb).

* A production-ready SYNC server is nearing completion. This software will allow large numbers of GBUdb-equipped nodes to share near-real-time IP statistics. The new SNF engine itself is nearly complete and has been in alpha testing in production environments for some time to prove that it is stable (and it is). We expect to begin wider alpha testing, followed quickly by beta testing, within the next few weeks if all goes well. Once the system is deployed, all SNF nodes will cooperate to learn both good and bad IP sources based on content analysis and localized behaviors, and they will be able to share what they learn with other nodes within seconds (90 on average) of any significant change or new knowledge.
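The cooperative good/bad learning described above can be illustrated with a toy sketch: each node keeps per-IP good/bad counts and folds in statistics shared by peers. This is an assumption-based illustration of the general idea only, not GBUdb's actual data model or the SYNC protocol:

```python
# Toy illustration of shared IP reputation: per-IP good/bad counters
# that nodes merge from one another. Names and structure are assumed
# for illustration; they are not GBUdb internals.
from collections import defaultdict


class ReputationNode:
    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # ip -> [good, bad]

    def observe(self, ip, is_spam):
        """Record one local verdict (from content analysis, etc.)."""
        self.stats[ip][1 if is_spam else 0] += 1

    def merge(self, peer_stats):
        """Fold in counts shared by another node."""
        for ip, (good, bad) in peer_stats.items():
            self.stats[ip][0] += good
            self.stats[ip][1] += bad

    def score(self, ip):
        """Fraction of observations that were spam (0.5 if unknown)."""
        good, bad = self.stats[ip]
        total = good + bad
        return bad / total if total else 0.5


a, b = ReputationNode(), ReputationNode()
a.observe("203.0.113.5", is_spam=True)
b.observe("203.0.113.5", is_spam=True)
b.observe("198.51.100.7", is_spam=False)
a.merge(b.stats)  # node a now reflects node b's observations too
```

Because nodes learn good sources as well as bad ones, a heavily used legitimate sender accumulates "good" weight everywhere, which is exactly the property that makes a Google-scale FP less likely than with static IP rules.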
> I understand that occasionally some innocent IP can be added accidentally
> and there is little to avoid that -- but for the top 50 email domains, extra
> security/intelligence should be in place so that we don't suddenly reject
> huge volumes of legitimate mail by blocking hotmail, aol, yahoo, google or
> similar providers! These kinds of errors can be caught much earlier.

Perhaps; and we do have mechanisms in place to help prevent these events. For example, there is one mechanism where IPs that appear to be at risk are entered into the IP rule group as "nokens" (excluded on entry) to prevent manual or automatic processes from creating black rules. As you point out, though, occasionally any system will allow errors from time to time.

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

#############################################################
This message is sent to you because you are subscribed to
the mailing list <[email protected]>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to <[EMAIL PROTECTED]>
#############################################################
