Re: [sniffer] Test ordering/precedence
Thanks Pete, but let me stress the largest issue that I see, which I think you are already aware of. The new IP classification is the most likely to produce false positives, and its result code of 60 gives it precedence over General, Experimental and Obfuscation hits. There is a large difference in accuracy on my system between IP rules and the other three tests. I hinted at this when you first changed that category from Gray (which I didn't score) to IP, but got no response :) I score IP at 4, but the other three are all scored at 6.

False positives with tests like General tend to drop significantly over time as they are reported, and I believe General to be over 98% accurate on my system. The IP hits have a much higher false positive rate, caused by open relay mail servers and by message bounces to forged addresses that correspond to your spamtraps (I get a lot of IP hits on the bounce messages that we block, many of these from legitimate servers). For this reason I would have preferred that the IP hits be added as a result code of 64 instead of replacing the result code of 60.

I'm sure that you can run some stats to figure out how often IP hits override General, Experimental and Obfuscation hits, and get a better idea of the potential impact of letting a generally higher scoring test hit instead. I know it would have an effect on weighted systems, though I'm not sure how large that effect might be. As things stand on my system, IP is the #3 test, and I fear that it is stealing hits from more accurate tests, especially the #2 test, Experimental, which happens to be very good at tagging zombies and hitting new sources of spam that aren't as widely blacklisted, due to the types of rules that are present.
Here are some recent numbers from my system:

SNIFFER-EXPERIMENTAL...23.32%
SNIFFER-IP...9.70%
SNIFFER-OBFUSCATION...2.02%
SNIFFER-GENERAL...1.64%

So now might not be the time for this, given that configs might have to be modified, but please at least consider it at the next opportunity, when a change such as the Gray to IP rules is made.

Thanks,

Matt

Pete McNeil wrote:

On Saturday, September 18, 2004, 9:07:55 PM, Matt wrote:

M> John,
M> If you read this more carefully, I was not suggesting that
M> action be taken that would affect everyone's system in such a way
M> that it would require modifications. The 60 result code was
M> recently changed from Gray rules to IP rules, and that change may or
M> may not suggest a modification to the standard way that Sniffer
M> operates (considering that the environment will only return one
M> result code). Sniffer may or may not follow the numerical ordering
M> of the result codes at present, but then again, it might.
M> Regardless, it wouldn't be a bad idea to review the precedence as a
M> part of ongoing due diligence. I also recommended one potential

I agree it's not a bad idea to review these things from time to time, and in fact we do quite frequently - though not publicly. I also agree that making any sweeping changes would probably be a mistake at this time.

Well guys, here is how it goes. When more than one rule matches, the one with the lowest symbol # wins. If there is more than one match within that symbol then the one that is earliest in the message wins. This is why we code white rules to symbol 0, or symbol 1 in some cases; and also why we generally reserve the lower numbered symbols for any specific user requests.

As much as possible we've ordered the rule groups so that the least specific rules are found in the higher numbers and the more specific rules are in the lower numbers. We even have some rules (work in progress) that are "above band" in the 65-255 range which have special meanings and functions.
These will become more important later as these features are further developed.

There are a lot of schemes out there that can be used, and in fact we can use an entirely different scheme for each user if we wish - though that might be a lot of work (so we might have to charge extra for the consulting time to develop and maintain such a thing).

The scheme that we have is a little bit out of date*, but it still seems to work for most folks, so we'll probably keep it around for a while. We've had a number of alternate schemes suggested, some that might even be practical to implement - but none that wouldn't cause quite a bit of upheaval if we suddenly decided to rework everything for our current users. In fact, there are only a handful of people who ever even mention it.

Since your list shows 60, 63, 62, and 61 all at the bottom of your list I'm guessing that the current voting scheme is probably in line with your priorities at this point. That is, more specific rules (by symbol #) seem to line up roughly with your estimate of accuracy.

Hope this helps,

_M

* Little out of date: Spammers almost always reus
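To make the weighted-system concern in the message above concrete, here is a minimal Python sketch. It assumes only what the thread states: one result code is returned per message, the lowest matching code wins, and Matt scores IP at 4 and General, Experimental and Obfuscation at 6. The function name `applied_weight` and the `ip_code` parameter are hypothetical, and `ip_code=64` models Matt's suggestion, not anything Sniffer actually does.

```python
# Weights as Matt describes them for his own system (not Sniffer defaults).
TEST_WEIGHTS = {"IP": 4, "OBFUSCATION": 6, "EXPERIMENTAL": 6, "GENERAL": 6}


def applied_weight(matched_tests, ip_code):
    """Return (winning test, weight) when only one result code is reported.

    ip_code is hypothetical: 60 is the current code for IP rules,
    64 is the alternative Matt proposes in this thread.
    """
    codes = {"IP": ip_code, "OBFUSCATION": 61, "EXPERIMENTAL": 62, "GENERAL": 63}
    # Lowest result code wins, per Pete's description of precedence.
    winner = min(matched_tests, key=lambda name: codes[name])
    return winner, TEST_WEIGHTS[winner]


# A message that matches both an IP rule and an Experimental rule:
print(applied_weight(["IP", "EXPERIMENTAL"], ip_code=60))  # IP wins, weight 4
print(applied_weight(["IP", "EXPERIMENTAL"], ip_code=64))  # Experimental wins, weight 6
```

Under these assumptions, the same message scores 4 with IP at code 60 and 6 with IP at code 64, which is the "stealing hits" effect described above. How often this overlap happens in practice is exactly the statistic Matt asks Pete to check.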
Re[2]: [sniffer] Test ordering/precedence
On Saturday, September 18, 2004, 9:07:55 PM, Matt wrote:

M> John,
M> If you read this more carefully, I was not suggesting that
M> action be taken that would affect everyone's system in such a way
M> that it would require modifications. The 60 result code was
M> recently changed from Gray rules to IP rules, and that change may or
M> may not suggest a modification to the standard way that Sniffer
M> operates (considering that the environment will only return one
M> result code). Sniffer may or may not follow the numerical ordering
M> of the result codes at present, but then again, it might.
M> Regardless, it wouldn't be a bad idea to review the precedence as a
M> part of ongoing due diligence. I also recommended one potential

I agree it's not a bad idea to review these things from time to time, and in fact we do quite frequently - though not publicly. I also agree that making any sweeping changes would probably be a mistake at this time.

Well guys, here is how it goes. When more than one rule matches, the one with the lowest symbol # wins. If there is more than one match within that symbol then the one that is earliest in the message wins. This is why we code white rules to symbol 0, or symbol 1 in some cases; and also why we generally reserve the lower numbered symbols for any specific user requests.

As much as possible we've ordered the rule groups so that the least specific rules are found in the higher numbers and the more specific rules are in the lower numbers. We even have some rules (work in progress) that are "above band" in the 65-255 range which have special meanings and functions. These will become more important later as these features are further developed.

There are a lot of schemes out there that can be used, and in fact we can use an entirely different scheme for each user if we wish - though that might be a lot of work (so we might have to charge extra for the consulting time to develop and maintain such a thing).
The scheme that we have is a little bit out of date*, but it still seems to work for most folks, so we'll probably keep it around for a while. We've had a number of alternate schemes suggested, some that might even be practical to implement - but none that wouldn't cause quite a bit of upheaval if we suddenly decided to rework everything for our current users. In fact, there are only a handful of people who ever even mention it.

Since your list shows 60, 63, 62, and 61 all at the bottom of your list I'm guessing that the current voting scheme is probably in line with your priorities at this point. That is, more specific rules (by symbol #) seem to line up roughly with your estimate of accuracy.

Hope this helps,

_M

* Little out of date: Spammers almost always reuse URI and numbered links on multiple campaigns these days. This wasn't the case so much when we began. One result of this shift is that it is now common to find Snake-Oil spam matching a porn rule & vice versa. In fact, the actual kind of spam probably matches the rule group less than 31.6% of the time (and of course 94% of statistics are made up on the spot - which means, 1/3 is a guess on my part from looking at spam all day).

We've kept the scheme, however, because there are many rules that we create which are not based on URI and these tend to remain accurate to the type of spam. Also, since we generate and review our rules largely through a manual process - it helps to know what kind of spam we were looking at when we created the rule. That is, we are less likely to err while looking at a porn/adult spam than we are when looking at a travel spam - so differences in our accuracy are likely to develop along the groups we've selected - even if the type of spam captured by the rule migrates over time.

This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html
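The precedence rule Pete describes in the message above (lowest symbol number wins, and within one symbol the match earliest in the message wins) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the thread, not Sniffer's actual implementation; the `Match` type, its `offset` field, and `winning_match` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Match:
    symbol: int  # rule group / result code, e.g. 60 for IP rules
    offset: int  # hypothetical: position of the match within the message


def winning_match(matches: Sequence[Match]) -> Optional[Match]:
    """Pick the single match that determines the reported result code."""
    if not matches:
        return None
    # Sort key: lowest symbol first, then earliest position in the message.
    return min(matches, key=lambda m: (m.symbol, m.offset))


hits = [Match(62, 10), Match(60, 500), Match(60, 120)]
print(winning_match(hits))                     # the symbol-60 match at offset 120
print(winning_match(hits + [Match(0, 900)]))   # a white rule on symbol 0 beats everything
```

The second call shows why white rules are coded to symbol 0: under this ordering they win no matter where in the message they match.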
Re: [sniffer] Test ordering/precedence
John,

If you read this more carefully, I was not suggesting that action be taken that would affect everyone's system in such a way that it would require modifications. The 60 result code was recently changed from Gray rules to IP rules, and that change may or may not suggest a modification to the standard way that Sniffer operates (considering that the environment will only return one result code). Sniffer may or may not follow the numerical ordering of the result codes at present, but then again, it might. Regardless, it wouldn't be a bad idea to review the precedence as a part of ongoing due diligence.

I also recommended one potential solution for customization by controlling the precedence from within the rule base, and I would imagine that the new config file could also be used to control this. So if a change was made, I'm sure it wouldn't be done unless it was measurable and would be to everyone's benefit, and if Pete felt the need, it could be done in such a way that only those who wanted to change it would need to take action.

I try to make it a practice to consider the needs of others before I give suggestions or ask for new capabilities, and I did do that in this case. I don't doubt that others have slightly different ordering in terms of what they feel is more and less accurate, and of course results can vary widely across systems. Pete is especially sensitive to these needs and has done a wonderful job of customizing rule bases without placing the burden on his customers to do so.

Matt

John Tolmachoff (Lists) wrote:

Matt Matt Matt.

Then everyone would have to make sure they made the relevant changes on their systems. As we have seen on the Declude Junkmail list, there will always be those who set up their systems and then forget about them. Making a change like that would cause problems.
John Tolmachoff
Engineer/Consultant/Owner
eServices For You

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt
Sent: Saturday, September 18, 2004 5:28 PM
To: [EMAIL PROTECTED]
Subject: [sniffer] Test ordering/precedence

Pete,

Given some of the recent changes in the result codes for Sniffer, I thought I would inquire about the precedence of the result codes and how these can affect systems. On my system I have weighted the result codes differently and overall, I would consider the following order to be suggestive of the order of reliability from the most reliable to the least reliable. Note that this is not scientific, but is instead based on doing review, and tests that hit less often could appear higher in terms of stated reliability, though I have considered this in making the list:

1. SNIFFER-INK(56)
   SNIFFER-CASINO(59)
   SNIFFER-INSURANCE(48)
   SNIFFER-MEDIA(50)
   SNIFFER-GETRICH(57)
   SNIFFER-DEBT(58)
   SNIFFER-PHARMACY(52)
2. SNIFFER-AVSOFT(49)
   SNIFFER-PHISHING(53)
3. SNIFFER-TRAVEL(47)
   SNIFFER-PORN(54)
4. SNIFFER-SPAMWARE(51)
   SNIFFER-OBFUSCATION(61)
   SNIFFER-MALWARE(55)
5. SNIFFER-EXPERIMENTAL(62)
6. SNIFFER-GENERAL(63)
7. SNIFFER-IP(60)

I'm not sure exactly how Sniffer orders the precedence of the result code, but I would like to recommend that you give some consideration to reviewing such things in light of recent changes and also maybe consider allowing us to customize the precedence as a part of our rulebase.

Thanks,

Matt

--
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=
RE: [sniffer] Test ordering/precedence
Matt Matt Matt.

Then everyone would have to make sure they made the relevant changes on their systems. As we have seen on the Declude Junkmail list, there will always be those who set up their systems and then forget about them. Making a change like that would cause problems.

John Tolmachoff
Engineer/Consultant/Owner
eServices For You

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt
Sent: Saturday, September 18, 2004 5:28 PM
To: [EMAIL PROTECTED]
Subject: [sniffer] Test ordering/precedence

Pete,

Given some of the recent changes in the result codes for Sniffer, I thought I would inquire about the precedence of the result codes and how these can affect systems. On my system I have weighted the result codes differently and overall, I would consider the following order to be suggestive of the order of reliability from the most reliable to the least reliable. Note that this is not scientific, but is instead based on doing review, and tests that hit less often could appear higher in terms of stated reliability, though I have considered this in making the list:

1. SNIFFER-INK(56)
   SNIFFER-CASINO(59)
   SNIFFER-INSURANCE(48)
   SNIFFER-MEDIA(50)
   SNIFFER-GETRICH(57)
   SNIFFER-DEBT(58)
   SNIFFER-PHARMACY(52)
2. SNIFFER-AVSOFT(49)
   SNIFFER-PHISHING(53)
3. SNIFFER-TRAVEL(47)
   SNIFFER-PORN(54)
4. SNIFFER-SPAMWARE(51)
   SNIFFER-OBFUSCATION(61)
   SNIFFER-MALWARE(55)
5. SNIFFER-EXPERIMENTAL(62)
6. SNIFFER-GENERAL(63)
7. SNIFFER-IP(60)

I'm not sure exactly how Sniffer orders the precedence of the result code, but I would like to recommend that you give some consideration to reviewing such things in light of recent changes and also maybe consider allowing us to customize the precedence as a part of our rulebase.

Thanks,

Matt

--
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=
[sniffer] Test ordering/precedence
Pete,

Given some of the recent changes in the result codes for Sniffer, I thought I would inquire about the precedence of the result codes and how these can affect systems. On my system I have weighted the result codes differently and overall, I would consider the following order to be suggestive of the order of reliability from the most reliable to the least reliable. Note that this is not scientific, but is instead based on doing review, and tests that hit less often could appear higher in terms of stated reliability, though I have considered this in making the list:

1. SNIFFER-INK(56)
   SNIFFER-CASINO(59)
   SNIFFER-INSURANCE(48)
   SNIFFER-MEDIA(50)
   SNIFFER-GETRICH(57)
   SNIFFER-DEBT(58)
   SNIFFER-PHARMACY(52)
2. SNIFFER-AVSOFT(49)
   SNIFFER-PHISHING(53)
3. SNIFFER-TRAVEL(47)
   SNIFFER-PORN(54)
4. SNIFFER-SPAMWARE(51)
   SNIFFER-OBFUSCATION(61)
   SNIFFER-MALWARE(55)
5. SNIFFER-EXPERIMENTAL(62)
6. SNIFFER-GENERAL(63)
7. SNIFFER-IP(60)

I'm not sure exactly how Sniffer orders the precedence of the result code, but I would like to recommend that you give some consideration to reviewing such things in light of recent changes and also maybe consider allowing us to customize the precedence as a part of our rulebase.

Thanks,

Matt

--
=
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=
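For anyone who wants to work with the list above programmatically, the tiers and result codes can be written down as a small Python lookup. The test names and codes are copied from the message; the structure and the helper name `tier_of` are only an illustrative assumption, and the tiers reflect Matt's own non-scientific ranking rather than anything built into Sniffer.

```python
# Matt's reliability tiers, most reliable first; values are Sniffer result codes.
RELIABILITY_TIERS = [
    {"INK": 56, "CASINO": 59, "INSURANCE": 48, "MEDIA": 50,
     "GETRICH": 57, "DEBT": 58, "PHARMACY": 52},
    {"AVSOFT": 49, "PHISHING": 53},
    {"TRAVEL": 47, "PORN": 54},
    {"SPAMWARE": 51, "OBFUSCATION": 61, "MALWARE": 55},
    {"EXPERIMENTAL": 62},
    {"GENERAL": 63},
    {"IP": 60},
]


def tier_of(result_code):
    """Return (tier number, test name) for a result code, or None if unlisted."""
    for tier, tests in enumerate(RELIABILITY_TIERS, start=1):
        for name, code in tests.items():
            if code == result_code:
                return tier, f"SNIFFER-{name}"
    return None


print(tier_of(60))  # (7, 'SNIFFER-IP')
print(tier_of(56))  # (1, 'SNIFFER-INK')
```

A table like this makes it easy to compare the numeric order of the result codes against the perceived reliability order, which is the comparison the message asks for.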