Hi,

On Fri, 5 Mar 2004 14:24:24 -0000 (GMT) "Andy Blanchard" <[EMAIL PROTECTED]> wrote:
> ...I've basically
> rolled my own (my needs were not that sophisticated) by using an excellent
> Perl script called "hgrep" that allows you to grep mail headers by
> intelligently dealing with the line breaks for you. You can grab a copy
> from here:
>
> http://www.cpan.org/authors/id/E/EL/ELIJAH/

... which led me to analyze my spam corpus (the 3219 flagged by SA 2.6x) to
get my Top 25 Rule list:

  86.30%  2778  HTML_MESSAGE
  82.82%  2666  BAYES_99
  71.51%  2302  MIME_HTML_ONLY
  54.61%  1758  MIME_HTML_NO_CHARSET
  50.85%  1637  RCVD_IN_SORBS
  37.65%  1212  BIZ_TLD
  34.61%  1114  DCC_CHECK
  33.55%  1080  HTML_FONT_BIG
  26.53%   854  MISSING_MIMEOLE
  25.82%   831  RAZOR2_CHECK
  25.41%   818  FORGED_OUTLOOK_TAGS
  24.79%   798  MIME_HTML_ONLY_MULTI
  24.42%   786  RAZOR2_CF_RANGE_51_100
  23.33%   751  RCVD_IN_NJABL
  22.83%   735  HTML_FONTCOLOR_RED
  22.24%   716  CLICK_BELOW
  21.78%   701  HTML_IMAGE_ONLY_02
  21.00%   676  RCVD_IN_DSBL
  19.01%   612  HTML_FONT_INVISIBLE
  18.27%   588  HTML_70_80
  18.11%   583  USERPASS
  17.05%   549  FORGED_OUTLOOK_HTML
  16.56%   533  HTML_FONTCOLOR_UNKNOWN
  15.91%   512  HTML_60_70
  15.56%   501  RCVD_IN_RFCI

And with all the network and Bayes tests removed:

  86.30%  2778  HTML_MESSAGE
  71.51%  2302  MIME_HTML_ONLY
  54.61%  1758  MIME_HTML_NO_CHARSET
  37.65%  1212  BIZ_TLD
  33.55%  1080  HTML_FONT_BIG
  26.53%   854  MISSING_MIMEOLE
  25.41%   818  FORGED_OUTLOOK_TAGS
  24.79%   798  MIME_HTML_ONLY_MULTI
  22.83%   735  HTML_FONTCOLOR_RED
  22.24%   716  CLICK_BELOW
  21.78%   701  HTML_IMAGE_ONLY_02
  19.01%   612  HTML_FONT_INVISIBLE
  18.27%   588  HTML_70_80
  18.11%   583  USERPASS
  17.05%   549  FORGED_OUTLOOK_HTML
  16.56%   533  HTML_FONTCOLOR_UNKNOWN
  15.91%   512  HTML_60_70
  13.20%   425  HTML_FONTCOLOR_UNSAFE
  12.30%   396  MISSING_OUTLOOK_NAME
  12.24%   394  HTML_FONTCOLOR_BLUE
  11.96%   385  PENIS_ENLARGE2
  11.18%   360  HTML_50_60
  10.84%   349  HTML_LINK_CLICK_HERE
   9.94%   320  HTTP_EXCESSIVE_ESCAPES
   9.85%   317  DATE_IN_FUTURE_12_24

Note that during this period the Tripwire rules changed name from
FVGT_TRIPWIRE_xx to TW_xx and rules like Chickenpox, Backhair, Weeds, and
Tripwire should be condensed into a group.

One should analyze ham as well to see which tests they trigger; you might as
well run a mass-check if you want good, detailed statistics.

fyi,
--
Bob
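
For anyone who wants to reproduce a tally like the ones above without hgrep, here is a minimal Python sketch. It is not the method used for the numbers in this message. It assumes each message carries an X-Spam-Status header whose tests= field is a comma-separated rule list, possibly folded across lines after a comma, as SA 2.6x typically writes it. The function names (rules_hit, top_rules) and the prefix-based exclusion are made up for illustration.

```python
import re
from collections import Counter
from email import message_from_string


def rules_hit(raw_message):
    """Return the rule names listed in the tests= field of X-Spam-Status."""
    status = str(message_from_string(raw_message).get("X-Spam-Status", ""))
    # Assumption: folding only happens after a comma inside the rule list,
    # so dropping whitespace after commas rejoins the list into one token.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S+)", status)
    if not match or match.group(1) == "none":
        return []
    return [name for name in match.group(1).split(",") if name]


def top_rules(raw_messages, limit=25, exclude_prefixes=()):
    """Return (percent, count, rule) tuples, most frequent first.

    Percentages are relative to the total number of messages given.
    exclude_prefixes is a hypothetical way to drop rule families,
    e.g. ("BAYES_", "RCVD_IN_") for Bayes and some network tests.
    """
    messages = list(raw_messages)
    prefixes = tuple(exclude_prefixes)
    counts = Counter()
    for raw in messages:
        # Use a set so each rule counts at most once per message.
        names = {n for n in rules_hit(raw) if not n.startswith(prefixes)}
        counts.update(names)
    total = len(messages)
    if total == 0:
        return []
    return [(100.0 * n / total, n, name)
            for name, n in counts.most_common(limit)]
```

Each tuple can be printed with something like "%6.2f%% %5d  %s" to get the same three-column layout as the lists above. Running the same function over a ham folder gives the comparison mentioned above, although a real mass-check will give better statistics.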
