Hello Andy,

Thursday, December 28, 2006, 3:16:57 PM, you wrote:

<snip/>

>>> need to ensure SNF causes no False Positives <<

> I agree here. While I can excuse the occasional "accidental" FP - there
> should NOT be the mindset that customers just have to live with the fact
> that the IP rules WILL always catch a certain amount of good emails, because
> no effort has been made to exempt "known good" IP/RevDNS ranges.

The bot does make this effort, though that can always be improved. Most IP FPs these days come from older rules that were valid when they were created and have shown consistent activity without FP reports over their lifetime. Those where activity has fallen off have been automatically removed.

> I also think that the "low false positive" argument is built on unproven
> assumptions. To me, researching and reporting a single false positives
> takes a very significant amount of time. Bigger users may simply have no
> practical way to reporting their false positives and instead just "cope"
> with it by using weight-based systems to compensate.

To be sure, larger systems do tend to have large weight-based systems in place. Nonetheless, we do hear from them when false positives occur, and we also hear from smaller systems that are more focused on individual customers and domains.

Where we get our FP data: We have a range of customers who reliably report false positives to us, including a number of larger ISPs who consistently research and report their FPs in detail. We also have smaller service providers -- guys who "live in their system" -- who do the same thing, so we get a fairly wide perspective. In addition to that, we have links into a number of systems that provide us with rule IDs for messages that are released from quarantines, etc.

In the new version of SNF we are adding an automated reputation system component called GBUdb (Good, Bad, Ugly / Unknown, Ignore / Infrastructure).
This system will (among other things) learn the good IP sources for a given system and automatically override pattern matching rules that hit known good messages. The system will also report these conflicts to us and in extreme cases will be able to "auto-panic" bad pattern rules so that they not only have no effect on the local systems but are also automatically withdrawn from the core rulebase. (Rule panics are rare, but also destructive. The auto-panic mechanism should completely mitigate them if/when one slips through.)

All that by way of saying - we are constantly working to improve our access to good sources of FP data - even while reducing the system admin's workload.

> The process of finding "clues" in the header, then finding the correct log
> file and then matching log file lines in Sniffer, then creating an evidence
> email, is just far too cumbersome. I should be able to forward any falsely
> identified emails (with SMTP headers) as easily as I can submit "real spam"
> for analysis. If that requires that Sniffer has to insert header
> information with the "rule number" - so be it. My inclination is, if it's
> currently 10 times harder to report false positives than it is to report
> missed spam, then I suspect that the false positive rates could be 10 times
> higher than what's actually being reported.

In many cases this is true -- the cases tend to be platform specific. In MDaemon, for example, rule id information is injected into the headers so that FP reporting is a relatively painless process (no research required). The same is true on most *nix implementations. On IMail/SmarterMail type implementations it may be possible to add the ability to add headers to the message - but only at a significant I/O cost (rewriting the entire message with the new headers more than once).

I should also note that in most cases our system is able to identify the rules that matched an FP submission without any additional research on the part of the submitting admin.
Our FP system re-scans each submission with every known rule. Unfortunately, it is also true that some systems, for a variety of reasons, modify the message during the submission process so that the rules no longer match -- in those cases the research is required in order to move forward.

The good news (if it can be called that) is that the need to do the research tends to be consistent -- if you are able to submit an FP without finding the matching log lines then you are likely to be able to do this consistently, and most folks do fall into this category.

---

Along with the new engine I am considering some mechanisms that might be able to store rule matching data along with a message id hash on the local SNF node for a period of time. If research on this mechanism indicates that it would be useful and desirable then we may be able to add a feature that would allow an SNF node to provide the data upon request when an FP is submitted without having to modify the message in any way -- provided the FP is discovered and submitted within the storage window...

This is all theoretical at the moment, however, and is likely to be bundled with message archiving and quarantine features that may obviate the need for such a thing...

Thanks,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

#############################################################
This message is sent to you because you are subscribed to
the mailing list <sniffer@sortmonster.com>.
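
For illustration only: the storage-window idea described above (rule matching data kept alongside a message id hash on the local node for a limited time) could look roughly like the sketch below. This is not SNF code and not a description of how SNF does or will work. The class name RuleMatchStore, its methods, the choice of SHA-256, and the in-memory dictionary are all assumptions made for the example.

```python
import hashlib
import time


class RuleMatchStore:
    """Hypothetical in-memory store: message-id hash -> matched rule IDs."""

    def __init__(self, window_seconds, clock=time.time):
        self.window_seconds = window_seconds  # how long entries stay retrievable
        self.clock = clock                    # injectable so expiry can be tested
        self._entries = {}                    # hash -> (stored_at, rule IDs)

    @staticmethod
    def key_for(message_id):
        # Assumption: hash the Message-ID so the raw identifier is not stored.
        return hashlib.sha256(message_id.strip().encode("utf-8")).hexdigest()

    def record(self, message_id, rule_ids):
        """Remember which rules matched this message at scan time."""
        key = self.key_for(message_id)
        self._entries[key] = (self.clock(), tuple(rule_ids))

    def lookup(self, message_id):
        """Return the stored rule IDs, or None if unknown or past the window."""
        key = self.key_for(message_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, rule_ids = entry
        if self.clock() - stored_at > self.window_seconds:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return rule_ids
```

In this sketch a scanner would call record() when a message is scanned, and an FP submission inside the window would call lookup() with the same Message-ID to recover the rule IDs without touching the message itself. A submission outside the window simply gets None back and falls back to the log research described above.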