Hi,

On Fri, 5 Mar 2004 14:24:24 -0000 (GMT), "Andy Blanchard" wrote:

> ...I've basically
> rolled my own (my needs were not that sophisticated) by using an excellent
> Perl script called "hgrep" that allows you to grep mail headers by
> intelligently dealing with the line breaks for you.  You can grab a copy
> from here:
> 
>    http://www.cpan.org/authors/id/E/EL/ELIJAH/

... which led me to analyze my spam corpus (the 3219 messages flagged by
SA 2.6x) to get my Top 25 rule list. The columns are the percentage of the
corpus, the number of hits, and the rule name:

86.30%     2778 HTML_MESSAGE
82.82%     2666 BAYES_99
71.51%     2302 MIME_HTML_ONLY
54.61%     1758 MIME_HTML_NO_CHARSET
50.85%     1637 RCVD_IN_SORBS
37.65%     1212 BIZ_TLD
34.61%     1114 DCC_CHECK
33.55%     1080 HTML_FONT_BIG
26.53%      854 MISSING_MIMEOLE
25.82%      831 RAZOR2_CHECK
25.41%      818 FORGED_OUTLOOK_TAGS
24.79%      798 MIME_HTML_ONLY_MULTI
24.42%      786 RAZOR2_CF_RANGE_51_100
23.33%      751 RCVD_IN_NJABL
22.83%      735 HTML_FONTCOLOR_RED
22.24%      716 CLICK_BELOW
21.78%      701 HTML_IMAGE_ONLY_02
21.00%      676 RCVD_IN_DSBL
19.01%      612 HTML_FONT_INVISIBLE
18.27%      588 HTML_70_80
18.11%      583 USERPASS
17.05%      549 FORGED_OUTLOOK_HTML
16.56%      533 HTML_FONTCOLOR_UNKNOWN
15.91%      512 HTML_60_70
15.56%      501 RCVD_IN_RFCI
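For anyone who wants to build a list like this without hgrep, here is a
minimal Python sketch. It assumes you have already split the corpus into one
string of raw headers per message, and that each message carries an SA
2.6x-style X-Spam-Status header with the rule names in a comma-separated
tests= list. The helper names rule_hits and top_rules are hypothetical. Like
hgrep, it unfolds continuation lines before matching, since SA may wrap a long
tests= list across several header lines.

```python
import re
from collections import Counter

# A folded header continues on a line that starts with a space or tab.
FOLD_RE = re.compile(r"\r?\n[ \t]+")
# Assumed 2.6x layout: "X-Spam-Status: Yes, hits=... tests=A,B,C autolearn=..."
STATUS_RE = re.compile(r"^X-Spam-Status:(.*)$", re.MULTILINE | re.IGNORECASE)
# Rule names separated by commas, optionally followed by whitespace.
TESTS_RE = re.compile(r"tests=((?:\w+,\s*)*\w+)")


def rule_hits(header_blocks):
    """Count the number of messages on which each rule fired."""
    counts = Counter()
    for block in header_blocks:
        unfolded = FOLD_RE.sub(" ", block)  # join continuation lines
        status = STATUS_RE.search(unfolded)
        if not status:
            continue  # no X-Spam-Status header: message was not scanned
        tests = TESTS_RE.search(status.group(1))
        if not tests:
            continue
        # A set, so a rule counts at most once per message; "none" means no hits.
        rules = {r.strip() for r in tests.group(1).split(",")} - {"", "none"}
        counts.update(rules)
    return counts


def top_rules(counts, total, n=25):
    """Return (percent, hits, rule) rows, most frequent first, ties by name."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return [(100.0 * hits / total, hits, rule) for rule, hits in ranked]


if __name__ == "__main__":
    sample = [
        "X-Spam-Status: Yes, hits=15.2 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
        "\tMIME_HTML_ONLY autolearn=spam version=2.63\n",
        "X-Spam-Status: Yes, hits=7.1 required=5.0 tests=HTML_MESSAGE\n"
        "\tautolearn=no version=2.63\n",
    ]
    for pct, hits, rule in top_rules(rule_hits(sample), len(sample)):
        print(f"{pct:.2f}%  {hits:6d} {rule}")
```

Pass the number of messages in the corpus (3219 here) as total, not the
number of rule hits, to get percentages like the ones above.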

And with all the network and Bayes tests removed:

86.30%     2778 HTML_MESSAGE
71.51%     2302 MIME_HTML_ONLY
54.61%     1758 MIME_HTML_NO_CHARSET
37.65%     1212 BIZ_TLD
33.55%     1080 HTML_FONT_BIG
26.53%      854 MISSING_MIMEOLE
25.41%      818 FORGED_OUTLOOK_TAGS
24.79%      798 MIME_HTML_ONLY_MULTI
22.83%      735 HTML_FONTCOLOR_RED
22.24%      716 CLICK_BELOW
21.78%      701 HTML_IMAGE_ONLY_02
19.01%      612 HTML_FONT_INVISIBLE
18.27%      588 HTML_70_80
18.11%      583 USERPASS
17.05%      549 FORGED_OUTLOOK_HTML
16.56%      533 HTML_FONTCOLOR_UNKNOWN
15.91%      512 HTML_60_70
13.20%      425 HTML_FONTCOLOR_UNSAFE
12.30%      396 MISSING_OUTLOOK_NAME
12.24%      394 HTML_FONTCOLOR_BLUE
11.96%      385 PENIS_ENLARGE2
11.18%      360 HTML_50_60
10.84%      349 HTML_LINK_CLICK_HERE
9.94%       320 HTTP_EXCESSIVE_ESCAPES
9.85%       317 DATE_IN_FUTURE_12_24
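Getting the second list from the same counts is just a filter on rule names.
A minimal sketch, assuming the network and Bayes tests can be recognized by
the prefixes that show up in the first list (BAYES_, RCVD_IN_, DCC_,
RAZOR2_); any other network tests you run would need their prefixes added.
The name drop_network_and_bayes is hypothetical.

```python
from collections import Counter

# Assumed prefixes, taken from the rules that drop out between the two lists.
NETWORK_AND_BAYES_PREFIXES = ("BAYES_", "RCVD_IN_", "DCC_", "RAZOR2_")


def drop_network_and_bayes(counts):
    """Keep only rules whose names do not start with a network/Bayes prefix."""
    return Counter(
        {rule: hits for rule, hits in counts.items()
         if not rule.startswith(NETWORK_AND_BAYES_PREFIXES)}
    )


if __name__ == "__main__":
    # A few rows from the first list above.
    top = Counter({"HTML_MESSAGE": 2778, "BAYES_99": 2666,
                   "RCVD_IN_SORBS": 1637, "BIZ_TLD": 1212,
                   "DCC_CHECK": 1114, "RAZOR2_CHECK": 831})
    print(drop_network_and_bayes(top).most_common())
```

Apply the filter to the full counts before taking the top 25; filtering an
already-truncated list would leave it short.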

Note that the Tripwire rules were renamed from FVGT_TRIPWIRE_xx to TW_xx
during this period, so their hits are split across both sets of names. Rule
sets like Chickenpox, Backhair, Weeds, and Tripwire should each be condensed
into a single group.
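One way to condense a rule set is to map each rule to a group by prefix and
count each message once per group, so a message that trips three Tripwire
rules still adds only one hit. This needs the per-message rule lists, not the
totals. It is only a sketch: the FVGT_TRIPWIRE_/TW_ rename is the one mapping
taken from the note above, the prefixes for Chickenpox, Backhair, Weeds and
the like would have to be filled in from their rule files, and GROUP_PREFIXES,
group_of and grouped_hits are hypothetical names. It also assumes no unrelated
rule happens to start with TW_.

```python
from collections import Counter

# Hypothetical prefix -> group map. Only the Tripwire rename is known here;
# add the prefixes used by the Chickenpox, Backhair and Weeds rule files.
GROUP_PREFIXES = {
    "FVGT_TRIPWIRE_": "Tripwire",  # old names
    "TW_": "Tripwire",             # new names
}


def group_of(rule):
    """Return the group a rule belongs to, or the rule itself if ungrouped."""
    for prefix, group in GROUP_PREFIXES.items():
        if rule.startswith(prefix):
            return group
    return rule


def grouped_hits(messages):
    """Count messages per group; each message is an iterable of rule names."""
    counts = Counter()
    for rules in messages:
        counts.update({group_of(rule) for rule in rules})  # once per group
    return counts


if __name__ == "__main__":
    # Hypothetical rule names, before and after the rename.
    corpus = [
        {"FVGT_TRIPWIRE_EXAMPLE_A", "HTML_MESSAGE"},
        {"TW_EXAMPLE_B", "TW_EXAMPLE_C"},
        {"HTML_MESSAGE"},
    ]
    print(grouped_hits(corpus))
```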

You should also analyze your ham to see which tests it triggers; if you want
good, detailed statistics, you might as well run a mass-check.
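Short of a full mass-check, a quick spam-versus-ham comparison could look
like this. It assumes you have already counted rule hits separately over a
spam folder and a ham folder. The function name compare is hypothetical, and
the ratio column, spam% / (spam% + ham%), is a simple measure loosely modelled
on the S/O column in SpamAssassin's hit-frequencies output, not a
reimplementation of it. A rule near 1.0 fires almost only on spam; one near
0.5 or below is hitting ham too.

```python
def compare(spam_counts, n_spam, ham_counts, n_ham):
    """Return (rule, spam%, ham%, ratio) rows, most spam-specific first.

    ratio = spam% / (spam% + ham%); 1.0 means the rule never hit ham.
    """
    rows = []
    for rule in set(spam_counts) | set(ham_counts):
        spam_pct = 100.0 * spam_counts.get(rule, 0) / n_spam
        ham_pct = 100.0 * ham_counts.get(rule, 0) / n_ham
        total = spam_pct + ham_pct
        ratio = spam_pct / total if total else 0.0
        rows.append((rule, spam_pct, ham_pct, ratio))
    # Highest ratio first, then most frequent in spam, then by name.
    rows.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical counts over 10 spam and 10 ham messages.
    spam = {"RULE_A": 8, "RULE_B": 4}
    ham = {"RULE_A": 2}
    for rule, s, h, r in compare(spam, 10, ham, 10):
        print(f"{rule:8s} spam {s:6.2f}%  ham {h:6.2f}%  ratio {r:.3f}")
```

Working from percentages of each corpus rather than raw counts keeps the
ratio from being skewed when you have far more spam than ham, or vice versa.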

fyi,

-- Bob
