Daniel Quinlan <[EMAIL PROTECTED]> writes:

>   0.380   0.7601   0.0000    1.000   0.93    0.45  LOTS_OF_STUFF:1
>   0.043   0.0867   0.0000    1.000   0.93    0.45  LOTS_OF_STUFF:2
>
> bad

I figured this out. It's really a URI rule which worked as a body rule
before because we stuffed URIs into the body. I have some test rules
that out-do the original now and I'll check 'em in.

>   0.510   0.9801   0.0401    0.961   0.82    0.49  TO_HAS_SPACES:1
>   0.160   0.2934   0.0267    0.917   0.72    0.49  TO_HAS_SPACES:2
>
> bad

Ah, this broke because we changed the :addr code. I'll try to
resurrect it...

>   1.024   2.0401   0.0067    0.997   0.92    2.53  TRACKER_ID:1
>   0.814   1.6201   0.0067    0.996   0.92    2.53  TRACKER_ID:2
>
> bad

Hmmm... the rule is unchanged, so I think it's just the loss of URIs in
the body again. The difference is not as big as for LOTS_OF_STUFF,
though, so I'm not as inclined to pursue the missing 0.4% of spam hits.
Maybe it's worth it, though.

> There goes a perfectly good rule for me, even in STATISTICS.txt it was
> pretty good:
>
>   4.432   6.7680   0.0560    0.992   0.94    3.67  MSGID_FROM_MTA_SHORT

I have a replacement rule that's just about as good (possibly more
correct) and is 12 lines long instead of the 96 lines of the original
set of MSGID_FROM_MTA* eval rules:

  4.473   5.2540   0.1261    0.977   0.87    3.67  MSGID_FROM_MTA_SHORT
  5.436   6.3822   0.1646    0.975   0.86    0.01  T_MSGID_FROM_MTA_1

I think the original's really high scores were mostly luck, due to being
able to parse some lines and not others. This rule uses the trusted
Received header code, so I think it will also solve some of the FP
problems that some sites had with MSGID_FROM_MTA_SHORT.

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
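
For readers comparing the statistics rows quoted in this message, here is a minimal sketch of how the fourth column could be recomputed from the second and third. It assumes, since the message does not say so, that the columns are OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME, and that S/O is spam% / (spam% + ham%). The names `RuleStats`, `parse_row`, `spam_ratio` and `ROWS` are hypothetical helpers for illustration, not part of SpamAssassin.

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    # Assumed column layout (not stated in the message):
    # OVERALL%  SPAM%  HAM%  S/O  RANK  SCORE  NAME
    overall: float
    spam: float
    ham: float
    so: float
    rank: float
    score: float
    name: str


def parse_row(line: str) -> RuleStats:
    """Split one whitespace-separated statistics row into typed fields."""
    fields = line.split()
    numbers = [float(x) for x in fields[:6]]
    return RuleStats(*numbers, fields[6])


def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: share of a rule's hits that land on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


# Rows copied from the message above.
ROWS = [
    "1.024 2.0401 0.0067 0.997 0.92 2.53 TRACKER_ID:1",
    "0.814 1.6201 0.0067 0.996 0.92 2.53 TRACKER_ID:2",
    "4.432 6.7680 0.0560 0.992 0.94 3.67 MSGID_FROM_MTA_SHORT",
    "4.473 5.2540 0.1261 0.977 0.87 3.67 MSGID_FROM_MTA_SHORT",
    "5.436 6.3822 0.1646 0.975 0.86 0.01 T_MSGID_FROM_MTA_1",
]

if __name__ == "__main__":
    for row in ROWS:
        stats = parse_row(row)
        recomputed = spam_ratio(stats.spam, stats.ham)
        print(f"{stats.name:24s} listed={stats.so:.3f} recomputed={recomputed:.3f}")
```

Under that assumed definition, the recomputed ratio agrees with the listed fourth column to three decimals for every row quoted here, and the gap between the two TRACKER_ID spam percentages (2.0401 vs. 1.6201) is the "missing 0.4% of spam hits" mentioned above.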
