Re: [sniffer] Test ordering/precedence

2004-09-18 Thread Matt
Thanks Pete, but let me just stress the largest issue that I see and I 
think you already are aware of it.  The new IP classification is the 
most likely to produce false positives, and its result code of 60 gives 
it precedence over General, Experimental and Obfuscation hits.  
There is a large difference in accuracy on my system between IP rules 
and the other three tests.  I hinted at this when you first made the 
change from that category being Gray (which I didn't score) to IP but 
got no response :)

I score IP at 4 but the other three are all scored at a 6.  The false 
positives with things like General tend to drop significantly over time 
as you report false positives, and I believe it to be over 98% accurate 
on my system while the IP hits have a much higher false positive rate 
based on open relay mail servers and message bounces to forged addresses 
that correspond to your spamtraps (I get a lot of IP hits on the bounce 
messages that we block, many of these from legitimate servers).  For 
this reason I would have preferred that the IP hits be added as result 
code 64 rather than replacing result code 60.
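
As an illustration of the effect on a weighted system, here is a minimal sketch in Python. It assumes that only the lowest matching result code is reported, and it uses the weights mentioned above (IP at 4, the other three at 6). The names `WEIGHTS`, `CURRENT`, `PROPOSED` and `reported_group` are hypothetical and are not part of Sniffer or Declude.

```python
# Weights as described in the message: IP scored 4, the other three scored 6.
WEIGHTS = {"IP": 4, "OBFUSCATION": 6, "EXPERIMENTAL": 6, "GENERAL": 6}

# Current result codes, and the alternative where IP would sit at 64.
CURRENT = {"IP": 60, "OBFUSCATION": 61, "EXPERIMENTAL": 62, "GENERAL": 63}
PROPOSED = {**CURRENT, "IP": 64}


def reported_group(matched, codes):
    """Return the one group that gets reported: the lowest result code."""
    return min(matched, key=lambda group: codes[group])


# A message that matches both an IP rule and an Experimental rule.
matched = {"IP", "EXPERIMENTAL"}

current_weight = WEIGHTS[reported_group(matched, CURRENT)]
proposed_weight = WEIGHTS[reported_group(matched, PROPOSED)]
print(current_weight, proposed_weight)  # 4 6
```

With IP at 60 the message is credited only with the IP weight; with IP at 64 the Experimental hit would be reported instead.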

I'm sure that you can run some stats to figure out how often IP hits 
might override General, Experimental and Obfuscation hits, and get a 
better idea as to the potential impact of having a generally higher 
scoring test hit.  I know it would have an effect on weighted systems, 
though I'm not sure how large that effect might be.  As things stand on 
my system, IP is the #3 test and I fear that it is stealing hits from 
more accurate tests, especially the #2 test, Experimental, which happens 
to be very good at tagging zombies and hitting new sources of spam that 
aren't as widely blacklisted due to the types of rules that are 
present.  Here are some recent numbers from my system:

SNIFFER-EXPERIMENTAL...23.32%
SNIFFER-IP...9.70%
SNIFFER-OBFUSCATION...2.02%
SNIFFER-GENERAL...1.64%

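
The override rate mentioned above could be estimated along these lines. This is only a sketch: it assumes access to the full set of result codes that matched each message (not just the one reported), and the `sample` data is invented purely to exercise the function. `ip_override_rate` is a hypothetical helper, not a Sniffer tool.

```python
IP = 60
OVERRIDDEN = {61, 62, 63}  # Obfuscation, Experimental, General


def ip_override_rate(per_message_matches):
    """Fraction of IP-reported messages where a 61-63 rule also matched."""
    # A message is reported as IP when 60 is the lowest matching code.
    ip_reported = [m for m in per_message_matches if m and min(m) == IP]
    if not ip_reported:
        return 0.0
    overridden = sum(1 for m in ip_reported if m & OVERRIDDEN)
    return overridden / len(ip_reported)


# Invented sample: each set holds every result code that matched one message.
sample = [{60}, {60, 62}, {62}, {60, 63}, {54, 60}, {61}]
rate = ip_override_rate(sample)
print(f"{rate:.2f}")
```
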
So now might not be the time for this, given the potential of having to 
modify configs, but please at least consider it at the next opportunity, 
when a change such as the Gray-to-IP rule change is made.

Thanks,
Matt


Pete McNeil wrote:

On Saturday, September 18, 2004, 9:07:55 PM, Matt wrote:

M> John,

M> If you read this more carefully, I was not suggesting that
M> action be taken that would affect everyone's system in such a way
M> that it would require modifications.  The 60 result code was
M> recently changed from Gray rules to IP rules, and that change may or
M> may not suggest a modification to the standard way that Sniffer
M> operates (considering that the environment will only return one
M> result code).  Sniffer may or may not follow the numerical ordering
M> of the result codes at present, but then again, it might.
M> Regardless, it wouldn't be a bad idea to review the precedence as a
M> part of ongoing due diligence.  I also recommended one potential

I agree it's not a bad idea to review these things from time to time,
and in fact we do quite frequently - though not publicly.

I also agree that making any sweeping changes would probably be a
mistake at this time.

Well guys, here is how it goes.

When more than one rule matches, the one with the lowest symbol #
wins. If there is more than one match within that symbol then the one
that is earliest in the message wins.

This is why we code white rules to symbol 0, or symbol 1 in some
cases; and also why we generally reserve the lower numbered symbols
for any specific user requests.

As much as possible we've ordered the rule groups so that the least
specific rules are found in the higher numbers and the more specific
rules are in the lower numbers.

We even have some rules (work in progress) that are "above band" in
the 65-255 range which have special meanings and functions. These will
become more important later as these features are further developed.

There are a lot of schemes out there that can be used, and in fact we
can use an entirely different scheme for each user if we wish - though
that might be a lot of work (so we might have to charge extra for the
consulting time to develop and maintain such a thing).

The scheme that we have is a little bit out of date*, but it still
seems to work for most folks, so we'll probably keep it around for a
while. We've had a number of alternate schemes suggested, some that
might even be practical to implement - but none that wouldn't cause
quite a bit of upheaval if we suddenly decided to rework everything
for our current users.

In fact, there are only a handful of people who ever even mention it.

Since your list shows 60, 63, 62, and 61 all at the bottom of your
list I'm guessing that the current voting scheme is probably in line
with your priorities at this point. That is, more specific rules (by
symbol #) seem to line up roughly with your estimate of accuracy.

Hope this helps,
_M

* Little out of date: Spammers almost always reus

Re[2]: [sniffer] Test ordering/precedence

2004-09-18 Thread Pete McNeil
On Saturday, September 18, 2004, 9:07:55 PM, Matt wrote:

M> John,

M> If you read this more carefully, I was not suggesting that
M> action be taken that would affect everyone's system in such a way
M> that it would require modifications.  The 60 result code was
M> recently changed from Gray rules to IP rules, and that change may or
M> may not suggest a modification to the standard way that Sniffer
M> operates (considering that the environment will only return one
M> result code).  Sniffer may or may not follow the numerical ordering
M> of the result codes at present, but then again, it might.
M> Regardless, it wouldn't be a bad idea to review the precedence as a
M> part of ongoing due diligence.  I also recommended one potential

I agree it's not a bad idea to review these things from time to time,
and in fact we do quite frequently - though not publicly.

I also agree that making any sweeping changes would probably be a
mistake at this time.

Well guys, here is how it goes.

When more than one rule matches, the one with the lowest symbol #
wins. If there is more than one match within that symbol then the one
that is earliest in the message wins.

This is why we code white rules to symbol 0, or symbol 1 in some
cases; and also why we generally reserve the lower numbered symbols
for any specific user requests.
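
A minimal sketch of this selection rule in Python, for readers who prefer code. It assumes each match can be described by a symbol number and an offset into the message; the `Match` type, its field names and the rule labels are hypothetical and do not reflect Sniffer's internals.

```python
from typing import NamedTuple, Optional, Sequence


class Match(NamedTuple):
    symbol: int    # rule group / result code, e.g. 0 for a white rule
    offset: int    # position of the match within the message
    rule_id: str   # hypothetical label, used only for readability


def winning_match(matches: Sequence[Match]) -> Optional[Match]:
    """Lowest symbol wins; within a symbol, the earliest match wins."""
    if not matches:
        return None
    return min(matches, key=lambda m: (m.symbol, m.offset))


matches = [
    Match(symbol=62, offset=10, rule_id="experimental-a"),
    Match(symbol=60, offset=500, rule_id="ip-a"),
    Match(symbol=60, offset=120, rule_id="ip-b"),
]
winner = winning_match(matches)
print(winner.rule_id)  # ip-b

# A white rule coded to symbol 0 beats everything, wherever it matches.
with_white = matches + [Match(symbol=0, offset=900, rule_id="white-a")]
white_winner = winning_match(with_white)
print(white_winner.rule_id)  # white-a
```

Sorting on the pair (symbol, offset) expresses both tie-breaking steps in one comparison.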

As much as possible we've ordered the rule groups so that the least
specific rules are found in the higher numbers and the more specific
rules are in the lower numbers.

We even have some rules (work in progress) that are "above band" in
the 65-255 range which have special meanings and functions. These will
become more important later as these features are further developed.

There are a lot of schemes out there that can be used, and in fact we
can use an entirely different scheme for each user if we wish - though
that might be a lot of work (so we might have to charge extra for the
consulting time to develop and maintain such a thing).

The scheme that we have is a little bit out of date*, but it still
seems to work for most folks, so we'll probably keep it around for a
while. We've had a number of alternate schemes suggested, some that
might even be practical to implement - but none that wouldn't cause
quite a bit of upheaval if we suddenly decided to rework everything
for our current users.

In fact, there are only a handful of people who ever even mention it.

Since your list shows 60, 63, 62, and 61 all at the bottom of your
list I'm guessing that the current voting scheme is probably in line
with your priorities at this point. That is, more specific rules (by
symbol #) seem to line up roughly with your estimate of accuracy.

Hope this helps,
_M

* Little out of date: Spammers almost always reuse URI and numbered
links on multiple campaigns these days. This wasn't the case so much
when we began. One result of this shift is that it is now common to
find Snake-Oil spam matching a porn rule & vice versa. In fact, the
actual kind of spam probably matches the rule group less than 31.6% of
the time (and of course 94% of statistics are made up on the spot -
which means, 1/3 is a guess on my part from looking at spam all day).

We've kept the scheme, however, because there are many rules that we
create which are not based on URI and these tend to remain accurate to
the type of spam. Also, since we generate and review our rules largely
through a manual process - it helps to know what kind of spam we were
looking at when we created the rule. That is, we are less likely to
err while looking at a porn/adult spam than we are when looking at a
travel spam - so differences in our accuracy are likely to develop
along the groups we've selected - even if the type of spam captured by
the rule migrates over time.




This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html


Re: [sniffer] Test ordering/precedence

2004-09-18 Thread Matt




John,

If you read this more carefully, I was not suggesting that action be
taken that would affect everyone's system in such a way that it would
require modifications.  The 60 result code was recently changed from
Gray rules to IP rules, and that change may or may not suggest a
modification to the standard way that Sniffer operates (considering
that the environment will only return one result code).  Sniffer may or
may not follow the numerical ordering of the result codes at present,
but then again, it might.  Regardless, it wouldn't be a bad idea to
review the precedence as a part of ongoing due diligence.  I also
recommended one potential solution for customization by controlling the
precedence from within the rule base and I would also imagine that the
new config file could also be used to control this.

So if a change was made, I'm sure it wouldn't be done unless it was
measurable and would be to everyone's benefit, and if Pete felt the
need, it could be done in such a way that only those who would want
to change it would need to take action.

I try to make it a practice to consider the needs of others before I
give suggestions or ask for new capabilities, and I did do that in this
case.  I don't doubt that others have slightly different ordering in
terms of what they feel is more and less accurate, and of course
results can vary widely across systems.  Pete is especially sensitive
to these needs and has done a wonderful job of customizing rule bases
without placing the burden on his customers to do so.

Matt




John Tolmachoff (Lists) wrote:

  Matt Matt Matt.

  Then everyone would have to make sure they made the relevant changes
  on their systems.

  As we have seen on the Declude Junkmail list, there will always be
  those who set up their systems and then forget about them. Making a
  change like that would cause problems.

  John Tolmachoff
  Engineer/Consultant/Owner
  eServices For You

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt
  Sent: Saturday, September 18, 2004 5:28 PM
  To: [EMAIL PROTECTED]
  Subject: [sniffer] Test ordering/precedence
   
  Pete,

  Given some of the recent changes in the result codes for Sniffer, I
  thought I would inquire about the precedence of the result codes and
  how these can affect systems.

  On my system I have weighted the result codes differently, and overall
  I would consider the following order to be suggestive of the order of
  reliability from the most reliable to the least reliable.  Note that
  this is not scientific; it is based on doing review, and tests that hit
  less often could appear higher in terms of stated reliability, though I
  have considered this in making the list:

  1.    SNIFFER-INK(56)
     SNIFFER-CASINO(59)
     SNIFFER-INSURANCE(48)
     SNIFFER-MEDIA(50)
     SNIFFER-GETRICH(57)
     SNIFFER-DEBT(58)
     SNIFFER-PHARMACY(52)

  2.    SNIFFER-AVSOFT(49)
     SNIFFER-PHISHING(53)

  3.    SNIFFER-TRAVEL(47)
     SNIFFER-PORN(54)

  4.    SNIFFER-SPAMWARE(51)
     SNIFFER-OBFUSCATION(61)
     SNIFFER-MALWARE(55)

  5.    SNIFFER-EXPERIMENTAL(62)

  6.    SNIFFER-GENERAL(63)

  7.    SNIFFER-IP(60)

  I'm not sure exactly how Sniffer orders the precedence of the result
  code, but I would like to recommend that you give some consideration to
  reviewing such things in light of recent changes and also maybe
  consider allowing us to customize the precedence as a part of our
  rulebase.

  Thanks,

  Matt

  -- 
  =
  MailPure custom filters for Declude JunkMail Pro.
  http://www.mailpure.com/software/
  =
  
  


-- 
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=




RE: [sniffer] Test ordering/precedence

2004-09-18 Thread John Tolmachoff (Lists)









Matt Matt Matt.

Then everyone would have to make sure they made the relevant changes on
their systems.

As we have seen on the Declude Junkmail list, there will always be those
who set up their systems and then forget about them. Making a change
like that would cause problems.

John Tolmachoff
Engineer/Consultant/Owner
eServices For You

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt
Sent: Saturday, September 18, 2004 5:28 PM
To: [EMAIL PROTECTED]
Subject: [sniffer] Test ordering/precedence

 

Pete,

Given some of the recent changes in the result codes for Sniffer, I thought I
would inquire about the precedence of the result codes and how these can affect
systems.

On my system I have weighted the result codes differently, and overall I would
consider the following order to be suggestive of the order of reliability from
the most reliable to the least reliable.  Note that this is not scientific; it
is based on doing review, and tests that hit less often could appear higher in
terms of stated reliability, though I have considered this in making the list:

1.    SNIFFER-INK(56)
   SNIFFER-CASINO(59)
   SNIFFER-INSURANCE(48)
   SNIFFER-MEDIA(50)
   SNIFFER-GETRICH(57)
   SNIFFER-DEBT(58)
   SNIFFER-PHARMACY(52)

2.    SNIFFER-AVSOFT(49)
   SNIFFER-PHISHING(53)

3.    SNIFFER-TRAVEL(47)
   SNIFFER-PORN(54)

4.    SNIFFER-SPAMWARE(51)
   SNIFFER-OBFUSCATION(61)
   SNIFFER-MALWARE(55)

5.    SNIFFER-EXPERIMENTAL(62)

6.    SNIFFER-GENERAL(63)

7.    SNIFFER-IP(60)


I'm not sure exactly how Sniffer orders the precedence of the result code, but
I would like to recommend that you give some consideration to reviewing such
things in light of recent changes and also maybe consider allowing us to
customize the precedence as a part of our rulebase.

Thanks,

Matt



-- 
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=








[sniffer] Test ordering/precedence

2004-09-18 Thread Matt




Pete,

Given some of the recent changes in the result codes for Sniffer, I
thought I would inquire about the precedence of the result codes and
how these can affect systems.

On my system I have weighted the result codes differently, and overall
I would consider the following order to be suggestive of the order of
reliability from the most reliable to the least reliable.  Note that
this is not scientific; it is based on doing review, and tests that hit
less often could appear higher in terms of stated reliability, though I
have considered this in making the list:

1.    SNIFFER-INK(56)
   SNIFFER-CASINO(59)
   SNIFFER-INSURANCE(48)
   SNIFFER-MEDIA(50)
   SNIFFER-GETRICH(57)
   SNIFFER-DEBT(58)
   SNIFFER-PHARMACY(52)
  
2.    SNIFFER-AVSOFT(49)
   SNIFFER-PHISHING(53)
  
3.    SNIFFER-TRAVEL(47)
   SNIFFER-PORN(54)
  
4.    SNIFFER-SPAMWARE(51)
   SNIFFER-OBFUSCATION(61)
   SNIFFER-MALWARE(55)
  
5.    SNIFFER-EXPERIMENTAL(62)
  
6.    SNIFFER-GENERAL(63)
  
7.    SNIFFER-IP(60)


I'm not sure exactly how Sniffer orders the precedence of the result
code, but I would like to recommend that you give some consideration to
reviewing such things in light of recent changes and also maybe
consider allowing us to customize the precedence as a part of our
rulebase.
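
To make the request concrete, here is a rough Python sketch of what a customizable precedence could look like. It is purely illustrative: nothing here is an existing Sniffer feature, `pick_result` and `MATT_ORDER` are hypothetical names, and the order within each tier of the list above is flattened arbitrarily.

```python
from typing import Iterable, Sequence

# The reliability list above flattened into one order, most reliable first.
MATT_ORDER = [56, 59, 48, 50, 57, 58, 52, 49, 53, 47, 54, 51, 61, 55, 62, 63, 60]


def pick_result(matched: Iterable[int], precedence: Sequence[int] = ()) -> int:
    """Pick one result code from those that matched.

    Codes listed in `precedence` win in list order; unlisted codes fall
    back to plain numeric order after all listed ones.
    """
    rank = {code: i for i, code in enumerate(precedence)}
    return min(matched, key=lambda code: (rank.get(code, len(rank)), code))


# A message matching both IP (60) and Experimental (62):
default_pick = pick_result({60, 62})             # numeric ordering
custom_pick = pick_result({60, 62}, MATT_ORDER)  # customized ordering
print(default_pick, custom_pick)  # 60 62
```

With no precedence list the function reduces to "lowest code wins"; with the list supplied, the Experimental hit is reported ahead of the IP hit.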

Thanks,

Matt
-- 
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=