Pete McNeil wrote:
These are hit rates on total messages with a spam percentage of about 87% that weekday, leaving 13% as ham of course.
M> SNIFFER-EXPERIMENTAL.....23.32%
M> SNIFFER-IP................9.70%
M> SNIFFER-OBFUSCATION.......2.02%
M> SNIFFER-GENERAL...........1.64%
I must be tired, but I don't understand these numbers in this context.
What are the percentages?
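To make the percentages concrete: the quoted figures are hit rates against all mail, so dividing by the spam share gives a rough hit rate against spam only. The sketch below is only an illustration. It assumes every hit landed on spam (false positives are ignored), and the names `HIT_RATES_TOTAL`, `SPAM_SHARE` and `share_of_spam` are made up for this example.

```python
# Hit rates on total messages, as quoted in the thread (percent).
HIT_RATES_TOTAL = {
    "SNIFFER-EXPERIMENTAL": 23.32,
    "SNIFFER-IP": 9.70,
    "SNIFFER-OBFUSCATION": 2.02,
    "SNIFFER-GENERAL": 1.64,
}

# About 87% of that weekday's messages were spam.
SPAM_SHARE = 0.87


def share_of_spam(hit_rate_total, spam_share=SPAM_SHARE):
    """Rescale a hit rate on all mail to a hit rate on spam only.

    Assumption: every hit landed on spam (false positives ignored).
    """
    return hit_rate_total / spam_share


for name, rate in HIT_RATES_TOTAL.items():
    print(f"{name:<22}{rate:6.2f}% of all mail  ~{share_of_spam(rate):6.2f}% of spam")
```

Under that assumption, the 23.32% for SNIFFER-EXPERIMENTAL works out to roughly 26.8% of the spam seen that day.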
I've had a few more false positives recently on the Experimental Abstract group; however, I haven't yet come to terms with what that means in the face of the increase in hit rates for that group. If this group is populated by automated means, I would consider it wiser to have it above the Obfuscation category, which I have found to be much more accurate and specific to the content rather than to a particular host. The General rules are also very unlikely to hit on personal E-mail, which makes false positives with these two groups much more tolerable because most such content isn't missed that much. When it comes to the IP rules and the Experimental Abstract rules, however, many of the false positives are on personal E-mail. If I were to weight the accuracy of the tests considering the difference in how they might hit personal E-mail, I would prefer an order as follows:
M> So now might not be the time for this due to the potential of having to
M> modify configs, but please minimally consider it at the next opportunity
M> where a change such as the Gray to IP rules are done.
I've actually been thinking very strongly of reorganizing the rule group IDs recently. Especially in light of the new changes we've made with robots et al. The accuracy of the Experimental IP group has gone up considerably - and most of the false positives you've discussed should be eliminated over time (bounces especially).
All that said, I think the first step to reordering the groups might be to change the sequence of the 4 highest numbers as follows:
63: Experimental Received [IP]
62: Obfuscation
61: Experimental Abstract
60: General
63: Experimental Received [IP]
62: Experimental Abstract
61: General
60: Obfuscation
I weight the Experimental Abstract and General rules the same on my system, so reversing them isn't such a big deal, but the primary change from your recommendation is that I would suggest giving the Obfuscation rules the lowest result code of the four. I think the likelihood that a rule can hit on a personal E-mail is key in this decision-making process. Also consider that the hit rate of the Experimental Abstract rules is so high, and that a more exact, non-automated method might make other tests more reliable in the long term, so they should get precedence.
Please take note that Markus' stats are generated from European traffic and I have found at times that there can be a measurable difference between what he is seeing on his system and what I am seeing on my system where about 95% of the legitimate traffic is from North American hosts and to local domains all ending with .com, .net and .org. His FP rate on the General group for instance is many times higher than my own. I would be happy to share my Declude logs with you if you wish to process stats that come from about 200 domains, mostly based in the US. If it doesn't take too long, I would be willing to set up the beta of the stats package for comparative purposes.
According to a recent spam test quality analysis the accuracy and coverage for these groups in this order follows like this:
63: Experimental Received [IP] SA = 0.81 Coverage = 7.63%
62: Obfuscation SA = 1.00 Coverage = 2.58%
61: Experimental Abstract SA = 0.92 Coverage = 25.82%
60: General SA = 0.81 Coverage = 1.82%
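To compare the two orderings discussed in this thread against the quoted figures, the numbers can be put side by side. This is only a sketch: it assumes SA is an accuracy score where higher is better, and the names `STATS`, `PROPOSALS` and `describe` are invented for the example. The "first proposal" is the ordering listed with the stats, and the "counter-proposal" is the one that gives Obfuscation the lowest result code.

```python
from typing import NamedTuple


class GroupStats(NamedTuple):
    sa: float        # accuracy figure quoted in the thread (assumed: higher is better)
    coverage: float  # percent of messages hit


STATS = {
    "Experimental Received [IP]": GroupStats(0.81, 7.63),
    "Obfuscation": GroupStats(1.00, 2.58),
    "Experimental Abstract": GroupStats(0.92, 25.82),
    "General": GroupStats(0.81, 1.82),
}

# Result code -> rule group for each ordering discussed in the thread.
PROPOSALS = {
    "first proposal": {
        63: "Experimental Received [IP]",
        62: "Obfuscation",
        61: "Experimental Abstract",
        60: "General",
    },
    "counter-proposal": {
        63: "Experimental Received [IP]",
        62: "Experimental Abstract",
        61: "General",
        60: "Obfuscation",
    },
}


def describe(order):
    """Return one line per result code, highest code first, with the quoted stats."""
    lines = []
    for code in sorted(order, reverse=True):
        group = order[code]
        stats = STATS[group]
        lines.append(f"{code}: {group}  SA = {stats.sa:.2f}  Coverage = {stats.coverage:.2f}%")
    return lines


# Groups ranked by the quoted accuracy figure, best first.
by_accuracy = sorted(STATS, key=lambda g: STATS[g].sa, reverse=True)

for name, order in PROPOSALS.items():
    print(name)
    for line in describe(order):
        print("  " + line)
print("by SA:", ", ".join(by_accuracy))
```

The quoted SA figures cover accuracy only. They say nothing about how often a false positive lands on personal E-mail, which is the other criterion raised in this thread.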
As far as making changes go, I wouldn't rush into anything, especially since this change would likely mean readjusting all of your clients at once instead of having them download a new release and following instructions. I think it would be a good idea to alert your customers further and more frequently in advance than was done with the Gray to IP rule change. Of course the change wouldn't have a large effect on many systems, but this is definitely something that would affect my own. It is of course easily remedied with some very quick changes in my Declude config file.
Thanks,
Matt
--
=====================================================
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=====================================================
This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html
