Bear in mind these comparisons only work for rules with names that haven't changed.
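To make the name-matching caveat concrete, here is a minimal Python sketch of such a comparison. It is not the script actually used for this mail: the helper names `parse_row` and `spam_drops` are hypothetical, and it assumes each row has the columns OVERALL%, SPAM%, NONSPAM%, S/O, RANK, SCORE, NAME, with a ":1" or ":2" suffix on the name marking the run. A rule that was renamed between runs never gets both suffixes under one name, so it silently drops out of the comparison.

```python
from collections import defaultdict

def parse_row(row):
    """Split one row into (rule name, run label, SPAM%). Column order is assumed."""
    fields = row.split()
    spam_pct = float(fields[1])             # second column is SPAM%
    name, run = fields[6].rsplit(":", 1)    # "RULE:1" -> ("RULE", "1")
    return name, run, spam_pct

def spam_drops(rows):
    """Return (name, drop in SPAM% from run 1 to run 2), largest drop first."""
    by_name = defaultdict(dict)
    for row in rows:
        name, run, spam_pct = parse_row(row)
        by_name[name][run] = spam_pct
    # Only rules seen under the same name in both runs can be compared.
    paired = [(name, runs["1"] - runs["2"])
              for name, runs in by_name.items()
              if "1" in runs and "2" in runs]
    return sorted(paired, key=lambda pair: pair[1], reverse=True)

# Rows copied from the figures in this message.
rows = [
    "0.380 0.7601 0.0000 1.000 0.93 0.45 LOTS_OF_STUFF:1",
    "0.043 0.0867 0.0000 1.000 0.93 0.45 LOTS_OF_STUFF:2",
    "3.756 7.3005 0.2069 0.972 0.86 1.00 MSGID_DOLLARS:1",
    "3.072 6.1404 0.0000 1.000 0.94 1.00 MSGID_DOLLARS:2",
]

for name, drop in spam_drops(rows):
    print(f"{name:24s} {drop:8.4f}")
```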
Here's the current list of the 10 rules with the largest drop in SPAM%.
The worst problems in the MIME rules have been fixed.

OVERALL%    SPAM% NONSPAM%     S/O   RANK  SCORE  NAME
   2.155   4.2803   0.0267   0.994   0.92   0.53  HTML_IMAGE_ONLY_08:1
   1.968   3.9136   0.0200   0.995   0.92   1.90  HTML_IMAGE_ONLY_08:2

ok

  46.326  81.8188  10.7877   0.884   0.75   0.16  HTML_MESSAGE:1
  44.571  80.8787   8.2176   0.908   0.81   0.16  HTML_MESSAGE:2

Okay, I also checked each of the spams that used to render as HTML,
but no longer do in a suitably loose MUA and none rendered as HTML.
There were also 11 new hits and they do render as HTML in said MUA so
it appears that our MIME parsing and HTML detection is working well.

   0.607   1.1867   0.0267   0.978   0.87   0.58  HTML_TABLE_THICK_BORD:1
   0.470   0.9134   0.0267   0.972   0.85   0.58  HTML_TABLE_THICK_BORD:2

a bit disconcerting

   7.108  14.0076   0.2003   0.986   0.91   0.35  HTML_TAG_BALANCE_BODY:1
   5.427  10.8207   0.0267   0.998   0.94   0.35  HTML_TAG_BALANCE_BODY:2

good

   3.112   5.9604   0.2603   0.958   0.83   0.67  HTML_TAG_BALANCE_HTML:1
   1.691   3.3669   0.0134   0.996   0.92   0.67  HTML_TAG_BALANCE_HTML:2

good

   0.380   0.7601   0.0000   1.000   0.93   0.45  LOTS_OF_STUFF:1
   0.043   0.0867   0.0000   1.000   0.93   0.45  LOTS_OF_STUFF:2

bad

   3.756   7.3005   0.2069   0.972   0.86   1.00  MSGID_DOLLARS:1
   3.072   6.1404   0.0000   1.000   0.94   1.00  MSGID_DOLLARS:2

good

   0.510   0.9801   0.0401   0.961   0.82   0.49  TO_HAS_SPACES:1
   0.160   0.2934   0.0267   0.917   0.72   0.49  TO_HAS_SPACES:2

bad

   1.024   2.0401   0.0067   0.997   0.92   2.53  TRACKER_ID:1
   0.814   1.6201   0.0067   0.996   0.92   2.53  TRACKER_ID:2

bad

   2.125   3.7136   0.5340   0.874   0.63   0.69  UPPERCASE_25_50:1
   1.865   3.1602   0.5674   0.848   0.57   0.69  UPPERCASE_25_50:2

Probably okay, I suspect it's just the removal of "URI:" from the
rendered body.

I then looked again at the largest drops in RANK from 2.6x to 3.0
(ignoring ones with tiny 2.6x SPAM% numbers).

Theo, I still think these are buglets:

OVERALL%    SPAM% NONSPAM%     S/O   RANK  SCORE  NAME
   0.077   0.1533   0.0000   1.000   0.93   0.43  MIME_BASE64_ILLEGAL:1
   0.127   0.1533   0.1001   0.605   0.21   0.43  MIME_BASE64_ILLEGAL:2

Mailing-list signatures after the end of the data. I think this rule
should ignore illegal data at the end of the message if it's within 4
lines of a line beginning with "--" or "__".

  14.644  29.1953   0.0734   0.997   0.96   1.06  MIME_HTML_NO_CHARSET:1
  20.134  38.8626   1.3818   0.966   0.89   1.06  MIME_HTML_NO_CHARSET:2

This one is probably a bug.

   2.959   5.7470   0.1669   0.972   0.86   0.19  MIME_BASE64_NO_NAME:1
   3.042   5.7804   0.3004   0.951   0.81   0.19  MIME_BASE64_NO_NAME:2

could be a minor parsing issue

   0.864   1.7134   0.0134   0.992   0.91   1.59  MIME_HTML_MOSTLY:1
   0.977   1.9201   0.0334   0.983   0.88   1.59  MIME_HTML_MOSTLY:2

nah

   0.447   0.7534   0.1402   0.843   0.56   0.92  MSGID_FROM_MTA_HEADER:1
   0.407   0.6734   0.1402   0.828   0.53   0.92  MSGID_FROM_MTA_HEADER:2

hrm

   4.450   8.8939   0.0000   1.000   0.94   3.67  MSGID_FROM_MTA_SHORT:1
   7.599  14.8277   0.3605   0.976   0.88   3.67  MSGID_FROM_MTA_SHORT:2

There goes a perfectly good rule for me, even in STATISTICS.txt it was
pretty good:

   4.432   6.7680   0.0560   0.992   0.94   3.67  MSGID_FROM_MTA_SHORT

Daniel

-- 
Daniel Quinlan                      anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/   and open source consulting
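P.S. The MIME_BASE64_ILLEGAL suggestion (ignore illegal data at the end of the message if it's within 4 lines of a line beginning with "--" or "__") could look roughly like the Python sketch below. This is only an illustration of the heuristic, not SpamAssassin's actual rule code. The names `BASE64_LINE` and `illegal_base64_lines` are hypothetical, and it assumes the caller has already isolated the lines of a single base64 part, so that MIME boundary lines are not mistaken for signature separators.

```python
import re

# A legal line holds only base64 alphabet characters plus optional padding.
BASE64_LINE = re.compile(r"[A-Za-z0-9+/]*={0,2}")

def illegal_base64_lines(lines, window=4):
    """Return indices of illegal lines, forgiving a short trailing signature.

    If a line starting with "--" or "__" sits close enough to the end that
    every line after it is within `window` lines of it, that separator and
    everything following it are ignored.
    """
    end = len(lines)
    tail_start = max(0, len(lines) - (window + 1))
    for i in range(tail_start, len(lines)):
        if lines[i].startswith(("--", "__")):
            end = i          # stop checking at the separator
            break
    return [i for i in range(end)
            if lines[i].strip() and not BASE64_LINE.fullmatch(lines[i].strip())]

# A list footer appended after the encoded data is forgiven.
with_footer = ["SGVsbG8gd29ybGQ=", "-- ", "To unsubscribe, mail the list owner"]
print(illegal_base64_lines(with_footer))

# Junk with no separator in front of it is still reported.
no_separator = ["SGVsbG8gd29ybGQ=", "not base64!"]
print(illegal_base64_lines(no_separator))
```

Anchoring the check to the last few lines keeps the exemption narrow: illegal data in the middle of the part, or a long block of junk after a separator, still counts as a hit.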
