On Saturday, September 18, 2004, 11:22:02 PM, Matt wrote:

M> Thanks Pete, but let me just stress the largest issue that I see and I
M> think you already are aware of it.  The new IP classification is the
M> most likely to produce false positives, and its result code of 60 gives
M> it precedence over General, Experimental and Obfuscation hits.
M> There is a large difference in accuracy on my system between IP rules
M> and the other three tests.  I hinted at this when you first made the
M> change from that category being Gray (which I didn't score) to IP but
M> got no response :)

I guess I didn't get the hint. Sorry.

M> I score IP at 4 but the other three are all scored at a 6.  The false
M> positives with things like General tend to drop significantly over time
M> as you report false positives, and I believe it to be over 98% accurate
M> on my system while the IP hits have a much higher false positive rate
M> based on open relay mail servers and message bounces to forged addresses
M> that correspond to your spamtraps (I get a lot of IP hits on the bounce
M> messages that we block, many of these from legitimate servers). For
M> this reason I would have preferred that IP hits be added as result
M> code 64 instead of taking over result code 60.

This makes sense. I didn't think about that at the time because I was
trying to minimize the change. Simply splitting IP rules into a
then-unused group 60 was a very easy change to make.
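
To make the concern concrete, here is a minimal Python sketch. It assumes, as this thread implies but does not state outright, that only one result code is reported per message and that the lowest-numbered matching group wins. The weights are Matt's (4 for IP, 6 for the other three). Only IP = 60 is given in the thread; the codes 61 to 63 for the other groups are placeholders, and `reported_group` is a hypothetical helper, not part of Message Sniffer.

```python
# Placeholder group IDs: only IP = 60 is stated in the thread.
CODES = {"IP": 60, "EXPERIMENTAL": 61, "OBFUSCATION": 62, "GENERAL": 63}

# Matt's weights: IP scored at 4, the other three at 6.
WEIGHTS = {"IP": 4, "EXPERIMENTAL": 6, "OBFUSCATION": 6, "GENERAL": 6}


def reported_group(hits, codes):
    """Return the single group reported for a message.

    Assumption: the matching group with the lowest code wins.
    """
    return min(hits, key=lambda group: codes[group])


# A message that matches both an Experimental rule and an IP rule.
hits = ["EXPERIMENTAL", "IP"]

current = reported_group(hits, CODES)
moved = reported_group(hits, {**CODES, "IP": 64})

print(current, WEIGHTS[current])  # IP 4
print(moved, WEIGHTS[moved])      # EXPERIMENTAL 6
```

With IP at 60 the message is credited to the lower-weighted IP test; with IP moved to 64 the same message would be credited to Experimental.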

M> I'm sure that you can run some stats to figure out how often IP hits
M> might override General, Experimental and Obfuscation hits, and get a
M> better idea as to the potential impact of having a generally higher
M> scoring test hit.  I know it would have an effect on weighted systems,
M> though I'm not sure how large that effect might be.  As things stand on
M> my system, IP is the #3 test and I fear that it is stealing hits from
M> more accurate tests, especially the #2 test, Experimental, which happens

I'm guessing you mean Experimental Abstract.

M> to be very good at tagging zombies and hitting new sources of spam that
M> aren't as widely blacklisted due to the types of rules that are 
M> present.  Here are some recent numbers from my system:

M> SNIFFER-EXPERIMENTAL....23.32%
M> SNIFFER-IP...............9.70%
M> SNIFFER-OBFUSCATION......2.02%
M> SNIFFER-GENERAL..........1.64%

I must be tired, but I don't understand these numbers in this context.
What are the percentages?

M> So now might not be the time for this due to the potential of having to
M> modify configs, but please at least consider it at the next opportunity,
M> when a change such as the Gray-to-IP switch is made.

I've actually been thinking seriously about reorganizing the rule
group IDs recently, especially in light of the new changes we've made
with robots et al. The accuracy of the Experimental IP group has gone
up considerably, and most of the false positives you've discussed
should be eliminated over time (bounces especially).

All that said, I think the first step to reordering the groups might
be to change the sequence of the 4 highest numbers as follows:

63: Experimental Received [IP]
62: Obfuscation
61: Experimental Abstract
60: General

This order runs from least specific (63) to most specific (60). It
turns out that the majority of General rules are simply specific
patterns that don't fit existing rule groups. Experimental Abstract
rules tend to be either patterns abstracted from specific or general
patterns, or automatically generated URI candidates. Obfuscation rules
are patterns that detect obfuscation techniques that are not specific
to any particular kind of spam. Since Received [IP] rules only
identify a source, they are the most generalized (whether manually or
automatically generated).

According to a recent spam test quality analysis, the accuracy and
coverage for these groups, in this order, are as follows:

63: Experimental Received [IP]    SA = 0.81 Coverage =  7.63%
62: Obfuscation                   SA = 1.00 Coverage =  2.58%
61: Experimental Abstract         SA = 0.92 Coverage = 25.82%
60: General                       SA = 0.81 Coverage =  1.82%
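
As a rough illustration of what this order would mean for a message that matches rules in more than one group, here is a minimal Python sketch. It assumes that the lowest matching group ID is the one reported, which the thread implies but does not spell out. The figures are copied from the table above; `reported` is a hypothetical helper, not part of Message Sniffer.

```python
# Proposed group IDs with the SA and coverage figures quoted above.
PROPOSED = {
    63: ("Experimental Received [IP]", 0.81, 7.63),
    62: ("Obfuscation", 1.00, 2.58),
    61: ("Experimental Abstract", 0.92, 25.82),
    60: ("General", 0.81, 1.82),
}


def reported(hit_codes):
    """Pick the reported group, assuming the lowest matching code wins."""
    code = min(hit_codes)
    name, sa, _coverage = PROPOSED[code]
    return code, name, sa


# A message matching both an IP rule and an Experimental Abstract rule:
print(reported({63, 61}))  # (61, 'Experimental Abstract', 0.92)
```

Under that assumption, an IP hit would be reported only when no rule from the other three groups matched.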

How would you feel about this order?

_M




This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html
