Hi Pete, Thanks for taking the time to respond.
>> The rule was in place from 20070326. The first reported false positives arrived today << Except that reports from end users lingered in my email since Friday. Not your fault - but just to better demonstrate the ultimate effect it had. To be certain, I wasn't dissatisfied with your reaction time after I finally got around to looking at the user reports and compiling reports to you. My argument is, that for big email providers, there could be procedures in place to identify possible bad rules and flag them for review without waiting for FP reports. >> To clarify, it was blocking precisely one IP. The F001 bot only tags a single IP at a time (not ranges, ever) << Except that there were multiple rules (I remember seeing at least two) - and each one (if I recall correctly) targeting a different IP in the same block. Thus, the difference is merely technical (whether n rules are needed for n IPs or whether one rule covers multiple). >> Once upon a time, IP rules were created in very much the way you describe -- candidate IPs were generated from spamtraps and the live spam corpus and then reviewed (manually and automatically) against RevDNS, WHOIS, and other tools. At that time, IP rules had the absolute worst reliability of any test mechanism we provided. << I can't follow the logic. If F001 would continue to be used (but certain IPs are reviewed), then this can't possibly increase the false positive rate. At worst, a rule may be prohibited unnecessarily... But that's our job - to err on the "save" side and let the GOOD mail go through. If we block good mail, then the system has failed the user. Best Regards, Andy -----Original Message----- From: Message Sniffer Community [mailto:[EMAIL PROTECTED] On Behalf Of Pete McNeil Sent: Tuesday, April 03, 2007 6:31 PM To: Message Sniffer Community Subject: [sniffer] Re: How to incorporate a white list? Hello Andy, Tuesday, April 3, 2007, 5:15:12 PM, you wrote: > Hi Jonathan: > That's exactly the problem. 
> These particular rules were blocking Google mail servers - NOT specific content.

To clarify, it was blocking precisely one IP. The F001 bot only tags a single IP at a time (not ranges, ever), and only after repeated appearances at clean spamtraps where the message also fails other tests (often including content problems such as bad headers, obfuscation, heuristic scam text matching, etc.).

The rule was in place from 20070326. The first reported false positives arrived today (just after midnight). The rule was removed just less than 12 hours from that report (due to scheduling and heavy spam activity this morning that required my immediate attention). The report was ordinary (not a rule panic). As is the case with all FPs, the rule cannot be repeated (without special actions).

> Obviously, as already discussed in the past, it IS necessary that these
> IP-based blocks are put under a higher scrutiny. I'm not suggesting that the
> "automatic" bots should be disabled, I'm just proposing that "intelligence"
> must be incorporated that will use RevDNS and WHOIS to identify POSSIBLY
> undesirable blocks and to flag those for human review by Sniffer personnel
> so that they don't end up poisoning mail servers of all their clients.

While interesting, these mechanisms are neither foolproof nor trivial to implement. Also, our prior research has taught us that direct human involvement in IP rule evaluation leads to far more errors than we can allow.

Once upon a time, IP rules were created in very much the way you describe -- candidate IPs were generated from spamtraps and the live spam corpus and then reviewed (manually and automatically) against RevDNS, WHOIS, and other tools. At that time, IP rules had the absolute worst reliability of any test mechanism we provided. Upon further R&D, we created the F001 bot that is now in place; the error rate has been significantly reduced, and our people are able to focus on things that computers can't do better.
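As a rough illustration of the tagging policy described above -- a single IP (never a range) is tagged only after repeated appearances at clean spamtraps where the message also fails other tests -- here is a minimal sketch. All names and the hit threshold are illustrative assumptions, not SNF/F001 internals:

```python
# Hypothetical sketch of an F001-style tagging policy. The class name,
# threshold, and inputs are assumptions for illustration only.
from collections import defaultdict

MIN_TRAP_HITS = 3  # assumed: "repeated appearances" required before tagging


class TrapTagger:
    def __init__(self, min_hits=MIN_TRAP_HITS):
        self.min_hits = min_hits
        self.hits = defaultdict(int)  # corroborated spamtrap hits per IP

    def record(self, ip, at_clean_trap, failed_other_tests):
        """Count a message only when both corroborating conditions hold."""
        if at_clean_trap and failed_other_tests:
            self.hits[ip] += 1

    def should_tag(self, ip):
        """Tag an individual IP only after enough corroborated hits."""
        return self.hits[ip] >= self.min_hits


tagger = TrapTagger()
for _ in range(3):
    tagger.record("192.0.2.10", at_clean_trap=True, failed_other_tests=True)
# A trap hit without a second failing test never counts toward tagging:
tagger.record("192.0.2.20", at_clean_trap=True, failed_other_tests=False)
```

The point of the two-condition check is that a single spamtrap appearance, or a message that only trips one test, never produces a block rule on its own.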
Please don't get me wrong, I'm definitely not saying that the F001 bot can't be improved - it certainly can, and will if it survives. What I am saying is that it is accurate enough now that any improvements in accuracy would be non-trivial to implement.

Our current development focus is on developing the suite of applications and tools that will allow us to complete the alpha and beta testing of the next version of SNF*. That work has priority, and given that these events are rare and easily mitigated, we have not deemed it necessary to make enhancements to the F001 bot a higher priority.

The following factors make it relatively easy to mitigate these IP FP events (however undesirable):

- Rule panics can make these rules immediately inert.
- FP report/response times are sufficiently quick.
- The IP rule group is sequestered at the lowest priority so that it can easily be weighted lower than other tests.

Also, it is likely that the F001 bot and the IP rules group will be eliminated once the next SNF version is sufficiently deployed, because one of the major enhancements of the new engine is a multi-tier, self-localizing IP reputation system (GBUdb).

* A production-ready SYNC server is nearing completion. This software will allow large numbers of GBUdb-equipped nodes to share near-real-time IP statistics. The new SNF engine itself is nearly complete and has been in alpha testing in production environments for some time to prove that it is stable (and it is). We expect to begin wider alpha testing, followed quickly by beta testing, within the next few weeks if all goes well. Once the system is deployed, all SNF nodes will cooperate to learn both good and bad IP sources based on content analysis and localized behaviors, and they will be able to share what they learn with other nodes within seconds (90 on average) of any significant change or new knowledge.
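The cooperative good/bad learning described above can be illustrated with a toy sketch: each node keeps per-IP good/bad counts and folds in statistics shared by peers. This is an assumption-based illustration of the general idea only, not GBUdb's actual data model or the SYNC protocol:

```python
# Toy illustration of shared IP reputation: per-IP good/bad counters
# that nodes merge from one another. Names and structure are assumed
# for illustration; they are not GBUdb internals.
from collections import defaultdict


class ReputationNode:
    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # ip -> [good, bad]

    def observe(self, ip, is_spam):
        """Record one local verdict (from content analysis, etc.)."""
        self.stats[ip][1 if is_spam else 0] += 1

    def merge(self, peer_stats):
        """Fold in counts shared by another node."""
        for ip, (good, bad) in peer_stats.items():
            self.stats[ip][0] += good
            self.stats[ip][1] += bad

    def score(self, ip):
        """Fraction of observations that were spam (0.5 if unknown)."""
        good, bad = self.stats[ip]
        total = good + bad
        return bad / total if total else 0.5


a, b = ReputationNode(), ReputationNode()
a.observe("203.0.113.5", is_spam=True)
b.observe("203.0.113.5", is_spam=True)
b.observe("198.51.100.7", is_spam=False)
a.merge(b.stats)  # node a now reflects node b's observations too
```

Because nodes learn good sources as well as bad ones, a heavily used legitimate sender accumulates "good" weight everywhere, which is exactly the property that makes a Google-scale FP less likely than with static IP rules.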
> I understand that occasionally some innocent IP can be added accidentally
> and there is little to avoid that -- but for the top 50 email domains, extra
> security/intelligence should be in place so that we don't suddenly reject
> huge volumes of legitimate mail by blocking hotmail, aol, yahoo, google or
> similar providers! These kinds of errors can be caught much earlier.

Perhaps; and we do have mechanisms in place to help prevent these events. For example, there is one mechanism where IPs that appear to be at risk are entered into the IP rule group as "nokens" (excluded on entry) to prevent manual or automatic processes from creating black rules. As you point out, though, occasionally any system will allow errors from time to time.

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

#############################################################
This message is sent to you because you are subscribed to
the mailing list <[email protected]>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to <[EMAIL PROTECTED]>
#############################################################
