Using current NET results from quinlan, jm, parkerm, theo, daf:

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
  275491   240492    34999  0.873  0.00   0.00  (all messages)
 100.000  87.2958  12.7042  0.873  0.00   0.00  (all messages as %)
  11.191  12.8079   0.0829  0.994  0.90   0.01  T_RCVD_IN_AHBL_PROXY
  16.714  19.1183   0.1914  0.990  0.90   0.00  __RCVD_IN_AHBL
   5.643   6.4489   0.1057  0.984  0.87   0.01  T_RCVD_IN_AHBL_SPAM
   7.986   9.1130   0.2400  0.974  0.85   0.01  T_RCVD_IN_AHBL_RHSBL
[the rest is not noteworthy]
   0.225   0.2524   0.0343  0.880  0.62   0.01  T_RCVD_IN_AHBL_SPAM_SUPPORT
   0.042   0.0478   0.0029  0.944  0.76   0.01  T_RCVD_IN_AHBL_UNKNOWN_1
   0.370   0.0316   2.6972  0.012  0.88  -0.01  T_RCVD_IN_AHBL_EXEMPT_T
   0.249   0.0237   1.7943  0.013  0.87  -0.01  T_RCVD_IN_AHBL_EXEMPT_O
   0.015   0.0166   0.0000  1.000  0.90   0.01  T_RCVD_IN_AHBL_CMPR_DDOS
   0.013   0.0154   0.0000  1.000  0.90   0.01  T_RCVD_IN_AHBL_CMPR_RELAY
   0.010   0.0116   0.0000  1.000  0.90   0.01  T_RCVD_IN_AHBL_CMPR_VIRUS
   0.002   0.0021   0.0000  1.000  0.90   0.01  T_RCVD_IN_AHBL_SMTP
   0.000   0.0000   0.0000  0.500  0.11   0.01  T_RCVD_IN_AHBL_5XXI
[more zeroes left out]
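The S/O column above appears to be SPAM% / (SPAM% + HAM%). For example, T_RCVD_IN_AHBL_PROXY gives 12.8079 / (12.8079 + 0.0829), which is about 0.994. The short Python sketch below recomputes S/O for a few rows under that assumption. The formula is inferred from the numbers, not taken from the hit-frequencies source. Reporting 0.500 for a rule with no hits at all is likewise inferred from the zero rows, such as T_RCVD_IN_AHBL_5XXI. The helper name `spam_over_overall` is hypothetical.

```python
# Hypothetical helper: recompute the S/O column of the hit-frequencies table,
# ASSUMING S/O = SPAM% / (SPAM% + HAM%). This formula reproduces the printed
# values; it is inferred from the numbers, not taken from the tool's source.


def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Return the spam share of a rule's per-class hit rates."""
    total = spam_pct + ham_pct
    if total == 0:
        # Assumption: rules with no hits are reported as 0.500
        # (see T_RCVD_IN_AHBL_5XXI in the table).
        return 0.5
    return spam_pct / total


# (name, SPAM%, HAM%, S/O as printed in the table)
ROWS = [
    ("T_RCVD_IN_AHBL_PROXY", 12.8079, 0.0829, 0.994),
    ("T_RCVD_IN_AHBL_SPAM", 6.4489, 0.1057, 0.984),
    ("T_RCVD_IN_AHBL_RHSBL", 9.1130, 0.2400, 0.974),
    ("T_RCVD_IN_AHBL_EXEMPT_T", 0.0316, 2.6972, 0.012),
    ("T_RCVD_IN_AHBL_5XXI", 0.0, 0.0, 0.500),
]

if __name__ == "__main__":
    for name, spam_pct, ham_pct, printed in ROWS:
        computed = spam_over_overall(spam_pct, ham_pct)
        print(f"{name:<26} computed={computed:.3f} printed={printed:.3f}")
```

Under that assumption, S/O is computed from per-class hit rates rather than raw counts. The heavy spam skew of this corpus (240492 spam vs. 34999 ham) therefore does not inflate it.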
So, it's mostly good for PROXY and perhaps also RHSBL and SPAM. The rest is pretty much noise.

------------------------------------------------------------------------
PROXY vs. XBL, DSBL, and other open proxy blacklists:

  65.624  75.1177   0.3886  0.995  0.99   1.00  RCVD_IN_XBL
  57.331  65.5776   0.6686  0.990  0.97   1.10  RCVD_IN_DSBL
   1.911   2.1872   0.0143  0.994  0.89   1.62  RCVD_IN_SORBS_SOCKS
   0.373   0.4270   0.0029  0.993  0.89   2.90  RCVD_IN_SORBS_WEB
  10.356  11.8345   0.2000  0.983  0.88   1.20  RCVD_IN_SORBS_MISC
  13.778  15.7394   0.3029  0.981  0.88   1.20  RCVD_IN_NJABL_PROXY
   9.198  10.5060   0.2143  0.980  0.87   1.20  RCVD_IN_SORBS_HTTP
   0.301   0.3435   0.0114  0.968  0.82   2.70  RCVD_IN_SORBS_ZOMBIE

vs.

  11.191  12.8079   0.0829  0.994  0.90   0.01  T_RCVD_IN_AHBL_PROXY

and overlap:

28412  0.981  0.188  T_RCVD_IN_AHBL_PROXY,RCVD_IN_DSBL
28208  0.974  0.176  T_RCVD_IN_AHBL_PROXY,__RCVD_IN_SORBS
26397  0.912  0.135  T_RCVD_IN_AHBL_PROXY,__RCVD_IN_SBL_XBL
26363  0.911  0.148  T_RCVD_IN_AHBL_PROXY,RCVD_IN_XBL
25413  0.878  0.411  T_RCVD_IN_AHBL_PROXY,__RCVD_IN_NJABL
24282  0.839  0.691  T_RCVD_IN_AHBL_PROXY,RCVD_IN_NJABL_PROXY
22344  0.772  0.841  T_RCVD_IN_AHBL_PROXY,RCVD_IN_SORBS_MISC
19433  0.671  0.826  T_RCVD_IN_AHBL_PROXY,RCVD_IN_SORBS_HTTP
...

------------------------------------------------------------------------
SPAM and overlap:

13855  0.980  0.071  T_RCVD_IN_AHBL_SPAM,__RCVD_IN_SBL_XBL
13321  0.943  0.675  T_RCVD_IN_AHBL_SPAM,RCVD_IN_SBL
10167  0.719  0.063  T_RCVD_IN_AHBL_SPAM,__RCVD_IN_SORBS
 8336  0.590  0.075  T_RCVD_IN_AHBL_SPAM,RCVD_IN_BL_SPAMCOP_NET
 7746  0.548  0.125  T_RCVD_IN_AHBL_SPAM,__RCVD_IN_NJABL
...

------------------------------------------------------------------------
AHBL vs. other multi-result blacklists:

  36.047  41.2629   0.2086  0.995  0.95   2.55  RCVD_IN_SORBS_DUL
   1.911   2.1872   0.0143  0.994  0.89   1.62  RCVD_IN_SORBS_SOCKS
   0.373   0.4270   0.0029  0.993  0.89   2.90  RCVD_IN_SORBS_WEB
  10.356  11.8345   0.2000  0.983  0.88   1.20  RCVD_IN_SORBS_MISC
   9.198  10.5060   0.2143  0.980  0.87   1.20  RCVD_IN_SORBS_HTTP
   0.792   0.9048   0.0143  0.984  0.86   1.20  RCVD_IN_SORBS_SMTP
   0.301   0.3435   0.0114  0.968  0.82   2.70  RCVD_IN_SORBS_ZOMBIE
   0.000   0.0000   0.0000  0.500  0.11   0.00  RCVD_IN_SORBS_BLOCK

Hmmm.... SORBS seems a bit better, thanks to the huge DUL hit rate.

   5.355   6.1333   0.0057  0.999  0.91   0.62  RCVD_IN_NJABL_DIALUP
   3.210   3.6750   0.0171  0.995  0.90   0.74  RCVD_IN_NJABL_SPAM
  13.778  15.7394   0.3029  0.981  0.88   1.20  RCVD_IN_NJABL_PROXY
   0.142   0.1597   0.0200  0.889  0.63   1.41  RCVD_IN_NJABL_RELAY
   0.000   0.0000   0.0000  0.500  0.11   0.10  RCVD_IN_NJABL_CGI
   0.000   0.0000   0.0000  0.500  0.11   0.10  RCVD_IN_NJABL_MULTI

Hmmm.... pretty even.

------------------------------------------------------------------------
RHSBL and overlap:

17934  0.871  0.092  T_RCVD_IN_AHBL_RHSBL,__RCVD_IN_SBL_XBL
14374  0.698  0.090  T_RCVD_IN_AHBL_RHSBL,__RCVD_IN_SORBS
13134  0.638  0.074  T_RCVD_IN_AHBL_RHSBL,RCVD_IN_XBL
10697  0.520  0.071  T_RCVD_IN_AHBL_RHSBL,RCVD_IN_DSBL
10612  0.516  0.096  T_RCVD_IN_AHBL_RHSBL,RCVD_IN_BL_SPAMCOP_NET
 7454  0.362  0.079  T_RCVD_IN_AHBL_RHSBL,RCVD_IN_SORBS_DUL

The overlap is lower than for the others, so this one might be worth keeping (and it's a separate query anyway). Maybe we should let the perceptron take a whack at it.

So, why aren't we running the perceptron on nightly/weekly results? ;-)

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
