On Saturday, September 18, 2004, 11:22:02 PM, Matt wrote:

M> Thanks Pete, but let me just stress the largest issue that I see and I
M> think you already are aware of it. The new IP classification is the
M> most likely to produce false positives and it's result code of 60 places
M> precedence of that over General, Experimental and Obfuscation hits.
M> There is a large difference in accuracy on my system between IP rules
M> and the other three tests. I hinted at this when you first made the
M> change from that category being Gray (which I didn't score) to IP but
M> got no response :)

I guess I didn't get the hint. Sorry.

M> I score IP at 4 but the other three are all scored at a 6. The false
M> positives with things like General tend to drop significantly over time
M> as you report false positives, and I believe it to be over 98% accurate
M> on my system while the IP hits have a much higher false positive rate
M> based on open relay mail servers and message bounces to forged addresses
M> that correspond to your spamtraps (I get a lot of IP hits on the bounce
M> messages that we block, many of these from legitimate servers). I would
M> have desired the IP hits to have been added as a result code of 64
M> instead of replacing the result code of 60 for this reason.

This makes sense. I didn't think about that at the time because I was
trying to minimize the change. Simply splitting IP rules into a
then-unused group 60 was a very easy change to make.

M> I'm sure that you can run some stats to figure out how often IP hits
M> might override General, Experimental and Obfuscation hits, and get a
M> better idea as to the potential impact of having a generally higher
M> scoring test hit. I know it would have an effect on weighted systems,
M> though I'm not sure how large that effect might be. As things stand on
M> my system, IP is the #3 test and I fear that it is stealing hits from
M> more accurate tests, especially the #2 test, Experimental which happens

I'm guessing you mean Experimental Abstract.

M> to be very good at tagging zombies and hitting new sources of spam that
M> aren't as widely blacklisted due to the types of rules that are
M> present. Here are some recent numbers from my system:

M> SNIFFER-EXPERIMENTAL......23.32%
M> SNIFFER-IP.................9.70%
M> SNIFFER-OBFUSCATION........2.02%
M> SNIFFER-GENERAL............1.64%

I must be tired, but I don't understand these numbers in this context.
What are the percentages?

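To make the weighted-scoring concern concrete, here is a minimal sketch in Python. It assumes that when several rule groups match, only the lowest-numbered result code is reported (which is what IP at 60 taking precedence implies), and that a weighted system applies one weight per reported code. The weights 4 and 6 are Matt's; the codes 61 to 63 for the other three groups and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration; not Message Sniffer's actual code or configuration.
# Assumption: when several rule groups match, only the lowest result code is reported.

# Weights from Matt's message: IP scored at 4, the other three at 6.
WEIGHTS = {"IP": 4, "General": 6, "Experimental": 6, "Obfuscation": 6}

# Assumed placement of the non-IP groups (the thread does not list their codes).
OTHER_CODES = {"General": 61, "Experimental": 62, "Obfuscation": 63}


def reported_group(matched, ip_code):
    """Return the matched group with the lowest result code."""
    codes = dict(OTHER_CODES, IP=ip_code)
    return min(matched, key=lambda group: codes[group])


def score(matched, ip_code):
    """Weight contributed by the single reported group."""
    return WEIGHTS[reported_group(matched, ip_code)]


matched = {"IP", "Experimental"}
print(score(matched, ip_code=60))  # IP is reported, weight 4
print(score(matched, ip_code=64))  # Experimental is reported, weight 6
```

With IP at 64, a message that matches both IP and Experimental would be reported under Experimental and keep the higher weight, which is the outcome Matt asks for.
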
M> So now might not be the time for this due to the potential of having to
M> modify configs, but please minimally consider it at the next opportunity
M> where a change such as the Gray to IP rules are done.

I've actually been thinking very strongly about reorganizing the rule
group IDs recently, especially in light of the new changes we've made
with robots et al. The accuracy of the Experimental IP group has gone up
considerably, and most of the false positives you've discussed should
be eliminated over time (bounces especially).

All that said, I think the first step in reordering the groups might be
to change the sequence of the 4 highest numbers as follows:

63: Experimental Received [IP]
62: Obfuscation
61: Experimental Abstract
60: General

This order runs from least to most specific. It turns out that the
majority of General rules are simply specific patterns that don't fit
existing rule groups; Experimental Abstract rules tend to be either
patterns abstracted from specific or general patterns, or automatically
generated URI candidates; Obfuscation rules are patterns that detect
obfuscation techniques that are not specific to any particular kind of
spam; and since Received [IP] rules only identify a source, they are the
most generalized (whether manually or automatically generated).

According to a recent spam test quality analysis, the accuracy and
coverage for these groups, in this order, are as follows:

63: Experimental Received [IP]   SA = 0.81   Coverage =  7.63%
62: Obfuscation                  SA = 1.00   Coverage =  2.58%
61: Experimental Abstract        SA = 0.92   Coverage = 25.82%
60: General                      SA = 0.81   Coverage =  1.82%

How would you feel about this order?

_M

This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html
