Hello Andy,

Thursday, December 28, 2006, 3:16:57 PM, you wrote:

<snip/>

>>> need to ensure SNF causes no False Positives <<

> I agree here. While I can excuse the occasional "accidental" FP - there
> should NOT be the mindset that customers just have to live with the fact
> that the IP rules WILL always catch a certain amount of good emails, because
> no effort has been made to exempt "known good" IP/RevDNS ranges.

The bot does make this effort, though it can always be improved.
Most IP FPs these days come from older rules that were valid when they
were created and have shown consistent activity without FP reports
over their lifetime. Rules whose activity has fallen off have been
removed automatically.

> I also think that the "low false positive" argument is built on unproven
> assumptions.  To me, researching and reporting a single false positive
> takes a very significant amount of time.  Bigger users may simply have no
> practical way to report their false positives and instead just "cope"
> with it by using weight-based systems to compensate.

To be sure, larger systems do tend to have large weight-based systems
in place. Nonetheless, we do hear from them when false positives
occur, and we also hear from smaller systems that are more focused on
individual customers and domains.

Where we get our FP data:

We have a range of customers who reliably report false positives to
us, including a number of larger ISPs who consistently research and
report their FPs in detail. We also have smaller service providers --
guys who "live in their system" -- who do the same thing, so we get a
fairly wide perspective. In addition, we have links into a number of
systems that provide us with rule IDs for messages that are released
from quarantines and the like.

In the new version of SNF we are adding an automated reputation system
component called GBUdb (Good, Bad, Ugly / Unknown, Ignore /
Infrastructure). This system will (among other things) learn the good
IP sources for a given system and automatically override pattern
matching rules that hit known good messages. The system will also
report these conflicts to us and in extreme cases will be able to
"auto-panic" bad pattern rules so that they not only have no effect on
the local systems but are also automatically withdrawn from the core
rulebase.

(Rule panics are rare, but also destructive. The auto-panic mechanism
should completely mitigate them if/when one slips through.)
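
To make the override and auto-panic idea a little more concrete, here
is a minimal Python sketch. It is not the GBUdb implementation: the
class name, the thresholds, and the simple good/bad tally are all
assumptions made up for illustration.

```python
from collections import defaultdict


class ReputationSketch:
    """Toy per-IP good/bad tally. Hypothetical; not the real GBUdb."""

    def __init__(self, good_ratio=0.9, min_samples=10, panic_after=3):
        # All three thresholds are illustrative assumptions.
        self.good_ratio = good_ratio
        self.min_samples = min_samples
        self.panic_after = panic_after
        self.counts = defaultdict(lambda: [0, 0])  # ip -> [good, bad]
        self.conflicts = defaultdict(int)          # rule_id -> conflicts seen
        self.panicked = set()                      # rule ids withdrawn locally

    def record(self, ip, is_good):
        """Learn from one message whose disposition is already known."""
        self.counts[ip][0 if is_good else 1] += 1

    def is_known_good(self, ip):
        good, bad = self.counts[ip]
        total = good + bad
        return total >= self.min_samples and good / total >= self.good_ratio

    def evaluate(self, ip, rule_id):
        """Return 'spam' or 'clean' for a message from ip.

        rule_id is the pattern rule that matched, or None for no match.
        """
        if rule_id is None or rule_id in self.panicked:
            return "clean"
        if self.is_known_good(ip):
            # A pattern rule hit mail from a known good source: override
            # the rule, count the conflict, and panic the rule if it
            # keeps happening.
            self.conflicts[rule_id] += 1
            if self.conflicts[rule_id] >= self.panic_after:
                self.panicked.add(rule_id)
            return "clean"
        return "spam"
```

In this sketch the conflict counts are what a node would report back,
and a panicked rule simply stops having any local effect.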

All that by way of saying - we are constantly working to improve our
access to good sources of FP data - even while reducing the system
admin's workload.

> The process of finding "clues" in the header, then finding the correct log
> file and then matching log file lines in Sniffer, then creating an evidence
> email, is just far too cumbersome.  I should be able to forward any falsely
> identified emails (with SMTP headers) as easily as I can submit "real spam"
> for analysis.  If that requires that Sniffer has to insert header
> information with the "rule number" - so be it. My inclination is, if it's
> currently 10 times harder to report false positives than it is to report
> missed spam, then I suspect that the false positive rates could be 10 times
> higher than what's actually being reported.

In many cases this is true -- the cases tend to be platform specific.
In MDaemon, for example, rule id information is injected into the
headers so that FP reporting is a relatively painless process (no
research required). The same is true on most *nix implementations.

On IMail/SmarterMail type implementations it may be possible to add
headers to the message, but only at a significant I/O cost (rewriting
the entire message with the new headers more than once).

I should also note that in most cases our system is able to identify
the rules that matched an FP submission without any additional
research on the part of the submitting admin. Our FP system re-scans
each submission with every known rule. Unfortunately, some systems
modify the message during the submission process, for a variety of
reasons, so that the rules no longer match -- in those cases the
research is required in order to move forward. The good news (if it
can be called that) is that the need to do the research tends to be
consistent: if you are able to submit an FP without finding the
matching log lines then you are likely to be able to do this
consistently, and most folks do fall into this category.
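
As a rough illustration of why a modified submission defeats the
re-scan, here is a small Python sketch. The rule format (plain regular
expressions keyed by a numeric id) and the names `rescan` and
`EXAMPLE_RULES` are assumptions for the example only; they say nothing
about how SNF rules are actually stored or matched.

```python
import re

# Hypothetical rulebase: rule id -> compiled pattern.
EXAMPLE_RULES = {
    1001: re.compile(r"cheap\s+meds", re.IGNORECASE),
    2002: re.compile(r"click here now", re.IGNORECASE),
}


def rescan(message_text, rules):
    """Return the sorted ids of every rule that matches the submission."""
    return sorted(
        rule_id
        for rule_id, pattern in rules.items()
        if pattern.search(message_text)
    )


def needs_log_research(message_text, rules):
    """True when no rule matches, e.g. the message was altered in transit."""
    return not rescan(message_text, rules)
```

If the submitted copy still matches, the rule ids fall out of the
re-scan directly; if something rewrote the body on the way in, the
result is empty and the log lines are the only remaining evidence.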

---

Along with the new engine I am considering some mechanisms that might
be able to store rule matching data along with a message id hash on
the local SNF node for a period of time. If research on this mechanism
indicates that it would be useful and desirable then we may be able to
add a feature that would allow an SNF node to provide the data upon
request when an FP is submitted without having to modify the message
in any way -- provided the FP is discovered and submitted within the
storage window. This is all theoretical at the moment, however, and is
likely to be bundled with message archiving and quarantine features
that may obviate the need for such a thing.
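
Purely as a thought experiment, the local store might look something
like the Python sketch below. Nothing here exists in SNF today: the
class name `MatchStore`, the SHA-256 hash of the Message-ID, and the
seven day default window are all assumptions chosen for illustration.

```python
import hashlib
import time


class MatchStore:
    """Hypothetical short-lived store of rule matches per message."""

    def __init__(self, window_seconds=7 * 24 * 3600, clock=time.time):
        self.window_seconds = window_seconds  # assumed storage window
        self.clock = clock                    # injectable for testing
        self._entries = {}                    # hash -> (stored_at, rule_ids)

    @staticmethod
    def _key(message_id):
        # Store only a hash of the Message-ID, never the message itself.
        return hashlib.sha256(message_id.strip().encode("utf-8")).hexdigest()

    def remember(self, message_id, rule_ids):
        """Record which rules matched at scan time."""
        self._entries[self._key(message_id)] = (self.clock(), list(rule_ids))

    def lookup(self, message_id):
        """Return the stored rule ids, or None if unknown or expired."""
        key = self._key(message_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, rule_ids = entry
        if self.clock() - stored_at > self.window_seconds:
            del self._entries[key]  # outside the storage window
            return None
        return rule_ids
```

A lookup by Message-ID at FP submission time would then return the
rule ids without the message ever having been rewritten, as long as
the report arrives inside the window.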

Thanks,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


