[sniffer] Re: Rules for Large International ISPs

2006-12-28 Thread Pete McNeil
Hello Andy,

Thursday, December 28, 2006, 10:34:15 AM, you wrote:

 Hi,

 This morning I had to file two false positive reports because emails from
 Wanadoo.FR and UOL.COM.BR were triggering SNIFFER-IP.

 I don't know if this is a coincidence or if this is a worrisome new trend

<snip/>

Our IP rule coding policies have not changed in quite some time and
the false positive rates for IP rules have dropped significantly since
the last change.

IP rules are now coded only by a specialized bot which has very strict
rules and looks only at clean spamtraps for recurring abuse.

 20061228150347  16  0   Match   799799  63  1   48  75
 20061228150347  16  0   Final   799799  63  0   174475

The above rule had been in place for 346 days without any false
positive reports. The rule was coded by the previous robot and at the
time was verified by 3 additional blacklists.

 20061228110558  15  16  Match   1235160 63  1   46  73
 20061228110558  15  16  Final   1235160 63  0   298073

This was coded by the new bot (F001) approximately 28 days ago - no
prior false positives.

IP rules are currently coded by the F001 bot based on direct, repeated
observations at clean spamtraps. IP rules are excluded on the first
false positive report so that they cannot be reactivated without
direct human intervention.
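
As an illustration only (this is not the actual F001 code), the policy
described above could be sketched in Python roughly as follows. The class
name, the method names, and the threshold of three observations are all
assumptions made for the example; the thread does not give the real values.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the policy says "repeated observations" but
# does not state how many are required.
MIN_SPAMTRAP_HITS = 3


@dataclass
class IPRuleBook:
    hits: dict = field(default_factory=dict)    # ip -> clean-spamtrap observations
    active: set = field(default_factory=set)    # IPs that currently have a rule
    excluded: set = field(default_factory=set)  # IPs locked out after an FP report

    def observe_at_spamtrap(self, ip: str) -> None:
        """Record one observation; code a rule once the IP has recurred enough."""
        if ip in self.excluded:
            return  # excluded IPs are never re-coded automatically
        self.hits[ip] = self.hits.get(ip, 0) + 1
        if self.hits[ip] >= MIN_SPAMTRAP_HITS:
            self.active.add(ip)

    def report_false_positive(self, ip: str) -> None:
        """The first FP report removes the rule and blocks automatic reactivation."""
        self.active.discard(ip)
        self.excluded.add(ip)
        self.hits.pop(ip, None)

    def human_reinstate(self, ip: str) -> None:
        """Only an explicit human action clears the exclusion."""
        self.excluded.discard(ip)
```

The point of the sketch is the one-way door: once an IP has drawn a false
positive report, further spamtrap observations alone cannot bring its rule
back.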

It is not practical for us to keep tabs on, or deeply research, every
possible IP that may be used by any large (or otherwise) ISP. Instead
we have the above policy and very strict observational rules to
prevent the addition of IPs that are likely to produce significant
legitimate traffic and to quickly and permanently remove IPs that
cause false positives. (Some exceptions, of course, apply.)

It is inevitable that there will be a nonzero error rate - but that
error rate is demonstrably small given our current process, and we are
constantly researching and developing techniques to improve on that
rate.

Hope this helps,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


#
This message is sent to you because you are subscribed to
  the mailing list sniffer@sortmonster.com.
To unsubscribe, E-mail to: [EMAIL PROTECTED]
To switch to the DIGEST mode, E-mail to [EMAIL PROTECTED]
To switch to the INDEX mode, E-mail to [EMAIL PROTECTED]
Send administrative queries to  [EMAIL PROTECTED]



[sniffer] Re: Rules for Large International ISPs

2006-12-28 Thread Andy Schmidt
Hi Pete,

Thanks.

Let me apologize for the accusatory tone of my message. Someone pointed out
to me that my annoyance made me cross the line of being offensive.

I would suggest adding some intelligence to the F001 bot so that it compares
implicated address ranges against a table of excepted IPs, which you would
build over time (or seed from public sources of known-good IP ranges to get
a start).
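
To make the suggestion concrete, here is a minimal sketch of such a check
using Python's standard ipaddress module. The table contents and the
function names are made up for illustration (the ranges are documentation
networks, not real ISP ranges), and nothing here reflects how F001 actually
works.

```python
import ipaddress

# Hypothetical table of known-good ranges. These are RFC 5737 documentation
# networks standing in for real ISP outbound-mail ranges.
EXCEPTED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_excepted(ip: str) -> bool:
    """True if the address falls inside any excepted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXCEPTED_RANGES)


def should_code_rule(ip: str) -> bool:
    """A candidate IP is only eligible for a rule if it is not excepted."""
    return not is_excepted(ip)
```

With this table, should_code_rule("192.0.2.55") would be False, so the bot
would skip that address no matter how often it showed up at a spamtrap.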

I understand the reporting rate of false positives is low. But that may just
be because most false positives simply are never reported.  In my case, I
couldn't use Sniffer to block outright - so for years I never worried much
about false positives.  Only very recently, I have tightened some weights
AND I have improved the reporting to the point that it's now easier for me
to spot certain false positives and have started to report them more
consistently.

Yet, I only review ONE out of a thousand mailboxes and many hundreds of
domains - so chances are a large number of false positives are never even
noticed by me on a daily basis (and I'm a very small operation).

So - the FP rates might be misleading, because they only reflect REPORTED
FPs - no one knows how tiny or possibly how huge UNREPORTED FPs might be.
Consequently, it may be worthwhile to improve F001 as mentioned before.
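
The arithmetic behind that concern is simple. The sketch below assumes that
only some fraction of actual false positives ever gets reported; that
fraction is exactly the unknown here, so the function and any number fed
into it are purely illustrative.

```python
def estimated_total_fps(reported: int, reporting_fraction: float) -> float:
    """Scale reported FPs by the (assumed) share of FPs that get reported."""
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    return reported / reporting_fraction
```

For example, if only one in ten false positives were reported, five reports
would correspond to about 50 actual false positives.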

Best Regards
Andy Schmidt

Phone:  +1 201 934-3414 x20 (Business)
Fax:    +1 201 934-9206


-Original Message-
From: Message Sniffer Community [mailto:[EMAIL PROTECTED] On Behalf
Of Pete McNeil
Sent: Thursday, December 28, 2006 12:04 PM
To: Message Sniffer Community
Subject: [sniffer] Re: Rules for Large International ISPs





[sniffer] Re: Rules for Large International ISPs

2006-12-28 Thread Dave Koontz
Well, I guess I will ruffle someone's feathers again with my response here,
but like your original message, I think we need to be honest here.  This is
not a Message Sniffer 'popularity' contest after all; we are paying
customers and need to ensure SNF causes no False Positives.

Over the last few months, I've seen more and more false positives from Message
Sniffer.  The few that I sent to their FALSE address have always been
challenged as legitimate.  It's difficult at best for me to believe that the
SNF EXPERIMENTAL-IP rules that hit our local newspaper and other legitimate
sites are solid.  As a result, I've constructed SA rules to counteract SNF
False Positives.

It got so bad within the last two weeks or so that I completely disabled SNF
lookups to avoid complaints from our users.

To add insult to injury, last year they drastically upped the service price.
Now my subscription is up for renewal.  I am honestly thinking of NOT
renewing it.  IMO, it seems that things have gone downhill since ARM bought
the little company that could.  Couple that with two years' worth of
promises to update the MDaemon Plugin code, and all the various improvements
that SpamAssassin and the SARE rulesets have made...  well, I question if it's
worth the inflated cost anymore.

Shoot away, Sniffer cheerleaders...  at least I am being honest.
 

-Original Message-
From: Message Sniffer Community [mailto:[EMAIL PROTECTED] On Behalf
Of Andy Schmidt
Sent: Thursday, December 28, 2006 1:26 PM
To: Message Sniffer Community
Subject: [sniffer] Re: Rules for Large International ISPs


[sniffer] Re: Rules for Large International ISPs

2006-12-28 Thread Pete McNeil
Hello Andy,

Thursday, December 28, 2006, 3:16:57 PM, you wrote:

<snip/>

 need to ensure SNF causes no False Positives 

 I agree here. While I can excuse the occasional accidental FP - there
 should NOT be the mindset that customers just have to live with the fact
 that the IP rules WILL always catch a certain amount of good emails, because
 no effort has been made to exempt known good IP/RevDNS ranges.

The bot does make this effort, though that can always be improved.
Most IP FPs these days are for older rules that at the time they were
created were valid and have shown consistent activity without FP
reports over their lifetime. Those where activity has fallen off have
been automatically removed.

 I also think that the low false positive argument is built on unproven
 assumptions.  To me, researching and reporting a single false positive
 takes a very significant amount of time.  Bigger users may simply have no
 practical way of reporting their false positives and instead just cope
 with it by using weight-based systems to compensate.

To be sure, larger systems do tend to have large weight-based systems
in place. Nonetheless, we do hear from them when false positives
occur, and we also hear from smaller systems that are more focused on
individual customers and domains.

Where we get our FP data:

We have a range of customers who reliably report false positives to us,
including a number of larger ISPs who consistently research and report
their FPs in detail. We also have smaller service providers -- guys
who live in their systems -- who do the same thing, so we get a fairly
wide perspective. In addition to that, we have links into a number of
systems to provide us with rule IDs for messages that are released
from quarantines, etc...

In the new version of SNF we are adding an automated reputation system
component called GBUdb (Good, Bad, Ugly / Unknown, Ignore /
Infrastructure). This system will (among other things) learn the good
IP sources for a given system and automatically override pattern
matching rules that hit known good messages. The system will also
report these conflicts to us and in extreme cases will be able to
auto-panic bad pattern rules so that they not only have no effect on
the local systems but are also automatically withdrawn from the core
rulebase.

(Rule panics are rare, but also destructive. The auto-panic mechanism
should completely mitigate them if/when one slips through.)
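
To give a rough feel for the idea, here is a toy sketch in Python. It is
not GBUdb: the class, the thresholds, and the 'pass'/'spam' results are
assumptions made for the example, and the reporting of conflicts back to us
is left out entirely.

```python
from collections import defaultdict

# All three values are hypothetical; the real system's criteria are not
# described in this thread.
GOOD_THRESHOLD = 0.9   # share of good messages needed to trust an IP
MIN_MESSAGES = 20      # minimum history before an IP can be trusted
PANIC_CONFLICTS = 5    # conflicts with good sources before a rule is panicked


class ReputationSketch:
    def __init__(self):
        self.good = defaultdict(int)       # ip -> good messages seen
        self.bad = defaultdict(int)        # ip -> bad messages seen
        self.conflicts = defaultdict(int)  # rule id -> hits on known-good IPs
        self.panicked = set()              # rule ids disabled locally

    def learn(self, ip, was_good):
        """Update the local history for an IP."""
        if was_good:
            self.good[ip] += 1
        else:
            self.bad[ip] += 1

    def is_known_good(self, ip):
        total = self.good[ip] + self.bad[ip]
        return total >= MIN_MESSAGES and self.good[ip] / total >= GOOD_THRESHOLD

    def evaluate(self, ip, matched_rule):
        """Return 'pass' or 'spam' given the pattern rule that matched (or None)."""
        if matched_rule is None or matched_rule in self.panicked:
            return "pass"
        if self.is_known_good(ip):
            # A known-good source overrides the pattern match; count the conflict.
            self.conflicts[matched_rule] += 1
            if self.conflicts[matched_rule] >= PANIC_CONFLICTS:
                self.panicked.add(matched_rule)
            return "pass"
        return "spam"
```

In this toy version a rule that keeps colliding with known-good sources
stops having any effect locally, which is the behavior the auto-panic is
meant to provide.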

All that by way of saying - we are constantly working to improve our
access to good sources of FP data - even while reducing the system
admin's workload.

 The process of finding clues in the header, then finding the correct log
 file and then matching log file lines in Sniffer, then creating an evidence
 email, is just far too cumbersome.  I should be able to forward any falsely
 identified emails (with SMTP headers) as easily as I can submit real spam
 for analysis.  If that requires that Sniffer has to insert header
 information with the rule number - so be it. My inclination is, if it's
 currently 10 times harder to report false positives than it is to report
 missed spam, then I suspect that the false positive rates could be 10 times
 higher than what's actually being reported.

In many cases this is true -- the cases tend to be platform specific.
In MDaemon, for example, rule id information is injected into the
headers so that FP reporting is a relatively painless process (no
research required). The same is true on most *nix implementations.

On IMail/SmarterMail type implementations it may be possible to add
the ability to add headers to the message - but only at a significant
I/O cost (rewriting the entire message with the new headers more than
once).

I should also note that in most cases our system is able to identify
the rules that matched an FP submission without any additional
research on the part of the submitting admin. Our FP system re-scans
each submission with every known rule. It is unfortunately also true
that some systems, for a variety of reasons, modify the message during
the submission process so that the rules no longer match -- in those
cases the research is required in order to move forward. The good news
(if it can be called that) is that the need to do the research tends
to be consistent -- if you are able to submit an FP without finding
the matching log lines then you are likely to be able to do this
consistently, and most folks do fall into this category.
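
A much simplified picture of that re-scan step, for illustration only: the
rulebase below is two invented regular expressions, and the function names
and return values are assumptions, not a description of the real FP system.

```python
import re

# Hypothetical rulebase: rule id -> pattern. Real rules are not shown in
# this thread; these two exist only to make the example runnable.
RULES = {
    1001: re.compile(r"cheap\s+meds", re.IGNORECASE),
    1002: re.compile(r"203\.0\.113\.7"),
}


def rescan(message):
    """Return the ids of every known rule that matches the submission."""
    return sorted(rid for rid, pattern in RULES.items() if pattern.search(message))


def triage(message):
    """Decide whether the submission can be handled without log research."""
    matched = rescan(message)
    if matched:
        return ("identified", matched)
    # Nothing matches, e.g. because the message was altered in transit:
    # the submitter's matching log lines are needed to find the rule.
    return ("needs_log_research", [])
```

If the submitted copy still contains whatever the rule originally matched,
triage() identifies the rule directly; if the message was rewritten on the
way in, nothing matches and the log lines are needed.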

---

Along with the new engine I am considering some mechanisms that might
be able to store rule matching data along with a message id hash on
the local SNF node for a period of time. If research on this mechanism
indicates that it would be useful and desirable then we may be able to
add a feature that would allow an SNF node to provide the data upon
request when an FP is submitted without having to modify the message
in any way -- provided the FP is discovered and submitted within the
storage window... This is all