Chris Santerre said:
> There are 2 people working on this to have live rolling stats on rules. It
> will be a huge help when it goes public.

The statistics that would be most relevant are:

Which rule combinations hit on spam that scores slightly below the
threshold for tagged spam. This would give you better results than raw
rule hits. Knowing which rules hit in combination at each score level
would be more statistically significant in determining what was not
tagged as spam but could, in theory, be tagged as spam.
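
As a rough illustration of that statistic, here is a minimal Python sketch that counts rule pairs on messages scoring just under the threshold. The threshold of 5.0, the margin of 1.0, the function name and the sample messages are all assumptions made up for the example, not data from my site.

```python
from collections import Counter
from itertools import combinations

def near_miss_combinations(messages, threshold=5.0, margin=1.0, size=2):
    """Count rule combinations on messages scoring in [threshold - margin, threshold)."""
    counts = Counter()
    for score, rules in messages:
        if threshold - margin <= score < threshold:
            # Sort so the same set of rules always yields the same tuple.
            for combo in combinations(sorted(set(rules)), size):
                counts[combo] += 1
    return counts

# Hypothetical sample data: (total score, rules that hit).
sample = [
    (4.6, ["RCVD_IN_SORBS", "MIME_HTML_ONLY", "DSL"]),
    (4.2, ["RCVD_IN_SORBS", "MIME_HTML_ONLY"]),
    (7.9, ["BAYES_99", "RAZOR2_CHECK"]),   # already tagged, ignored
    (1.0, ["HTML_MESSAGE"]),               # far below threshold, ignored
]

for combo, n in near_miss_combinations(sample).most_common(3):
    print(n, "=>", " + ".join(combo))
```

Raising `size` to 3 or more gives larger combinations, at the cost of many more distinct tuples to count.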

Suppose you generate a new meta rule that hits on a combination of
existing rules.
Example:
A meta rule that hits on more than 4 RBLs + MIME_ENCODED.
What score should be assigned to it, so that all ham stays ham and
questionable mail currently tagged as ham gets pushed over the
threshold to spam?
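
One way to approach the scoring question is to try candidate scores against a hand-classified corpus and count what crosses the threshold. The sketch below assumes a threshold of 5.0, treats any rule name starting with RCVD_IN_ as an RBL hit, and uses an invented four-message corpus; all of these are assumptions for illustration only.

```python
def meta_hits(rules):
    """Hypothetical meta rule from the example: more than 4 RBL hits plus MIME_ENCODED."""
    rbl_hits = sum(1 for r in rules if r.startswith("RCVD_IN_"))
    return rbl_hits > 4 and "MIME_ENCODED" in rules

def evaluate_score(messages, bonus, threshold=5.0):
    """Return (ham pushed over threshold, untagged spam now caught) for a candidate score."""
    ham_pushed = spam_caught = 0
    for score, is_spam, rules in messages:
        # Only messages currently below the threshold can change status.
        if score < threshold and meta_hits(rules) and score + bonus >= threshold:
            if is_spam:
                spam_caught += 1
            else:
                ham_pushed += 1
    return ham_pushed, spam_caught

RBLS = {"RCVD_IN_SORBS", "RCVD_IN_NJABL", "RCVD_IN_OPM",
        "RCVD_IN_DSBL", "RCVD_IN_DYNABLOCK"}

# Hypothetical corpus: (current score, is_spam, rules that hit).
corpus = [
    (4.1, True, RBLS | {"MIME_ENCODED"}),
    (3.0, True, RBLS | {"MIME_ENCODED"}),
    (2.5, False, RBLS | {"MIME_ENCODED"}),
    (4.5, False, {"HTML_MESSAGE"}),
]

for bonus in (1.0, 2.0, 3.0):
    print(bonus, evaluate_score(corpus, bonus))
```

The score to pick would be the largest candidate for which the first number stays at zero on a corpus you trust.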

To do this correctly you would have to:

Determine the combinations of rule hits at various spam/ham score levels.

Eg:
What is the most common rule combination at a score of 3 or a score of 5?
If a new meta rule is assigned, what is its false positive rate, and by
how much does it raise the average score at various levels?
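
A per-level breakdown along those lines could be sketched as follows. Messages are bucketed by the integer part of their score; for each bucket the sketch reports the most common full rule combination and the average score increase a candidate meta rule would cause. The function name, the stand-in meta rule, its score of 1.5 and the sample messages are hypothetical.

```python
import math
from collections import Counter, defaultdict

def per_level_report(messages, meta_hits, bonus):
    """Map score level -> (most common rule combination, average score raise)."""
    levels = defaultdict(list)
    for score, rules in messages:
        levels[math.floor(score)].append(frozenset(rules))
    report = {}
    for level, rule_sets in sorted(levels.items()):
        top_combo, _count = Counter(rule_sets).most_common(1)[0]
        hit_fraction = sum(1 for r in rule_sets if meta_hits(r)) / len(rule_sets)
        # Average raise is the meta rule score times the share of messages it hits.
        report[level] = (sorted(top_combo), bonus * hit_fraction)
    return report

# Hypothetical sample data: (total score, rules that hit).
sample = [
    (3.2, {"DSL", "RCVD_IN_SORBS"}),
    (3.7, {"DSL", "RCVD_IN_SORBS"}),
    (3.9, {"HTML_MESSAGE"}),
    (5.4, {"BAYES_99", "DSL"}),
]

def candidate(rules):
    """Hypothetical stand-in meta rule: DSL and RCVD_IN_SORBS both hit."""
    return "DSL" in rules and "RCVD_IN_SORBS" in rules

for level, (combo, raise_) in per_level_report(sample, candidate, 1.5).items():
    print(level, " + ".join(combo), round(raise_, 2))
```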

This assumes that the meta rule combinations seen on incoming email are
specific to your site, and they most probably are not.

In this case you are generating new meta rule sets based on rule
combinations seen at your site.

Looking at the last 61,000 mail messages in my incoming email, the top
rule hits for me are listed below, after the definitions of two of
those rules.

L_RCVD_IN_MANY is a meta rule:

meta L_RCVD_IN_MANY  ( RCVD_IN_BL_SPAMCOP_NET + RCVD_IN_SBL + RCVD_IN_SORBS + RCVD_IN_NJABL + RCVD_IN_DYNABLOCK + RCVD_IN_DSBL + RCVD_IN_NJAB_SPAM + RCVD_IN_NJABL_PROXY + RCVD_IN_RFCI + RCVD_IN_OPM + RCVD_IN_SORBS_HTTP + RAZOR2_CHECK) > 2
describe L_RCVD_IN_MANY  Message received in more than 2 RBLs
score L_RCVD_IN_MANY  1.5
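
For readers less familiar with meta rules: each rule name in the expression counts as 1 when that rule hit, so the sum is the number of component rules that fired, and the meta rule hits when that number is greater than 2. A small Python sketch of the same logic, with the component names copied as written in the definition above and a hypothetical function name:

```python
# Component rule names as they appear in the L_RCVD_IN_MANY definition.
COMPONENTS = [
    "RCVD_IN_BL_SPAMCOP_NET", "RCVD_IN_SBL", "RCVD_IN_SORBS",
    "RCVD_IN_NJABL", "RCVD_IN_DYNABLOCK", "RCVD_IN_DSBL",
    "RCVD_IN_NJAB_SPAM", "RCVD_IN_NJABL_PROXY", "RCVD_IN_RFCI",
    "RCVD_IN_OPM", "RCVD_IN_SORBS_HTTP", "RAZOR2_CHECK",
]

def l_rcvd_in_many(hits):
    """True when more than 2 of the component rules are among the hits."""
    return sum(1 for name in COMPONENTS if name in hits) > 2

print(l_rcvd_in_many({"RCVD_IN_SBL", "RCVD_IN_SORBS"}))
print(l_rcvd_in_many({"RCVD_IN_SBL", "RCVD_IN_SORBS", "RAZOR2_CHECK"}))
```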

DSL is:
header DSL     Received =~ /\.adsl\.|dialup|cable|dsl|client\.comcast\.net|client2\.attbi\.com|cpe\.net\.cable\.rogers\.com/i
describe DSL   Sent through DSL connection
score DSL 1.5
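
To check what the DSL pattern matches before relying on it, the same regular expression can be tried in Python against a few Received header values. The sample headers below are invented for the test, and Python's re module stands in for Perl here as an assumption that the two behave the same for this simple pattern.

```python
import re

# Same alternation as the DSL rule, case-insensitive like the /i flag.
DSL = re.compile(
    r"\.adsl\.|dialup|cable|dsl|client\.comcast\.net|client2\.attbi\.com"
    r"|cpe\.net\.cable\.rogers\.com",
    re.IGNORECASE,
)

# Hypothetical Received header values.
samples = [
    "from pool-12.adsl.example.net (192.0.2.10) by mx.example.org",
    "from mail.example.org (192.0.2.20) by mx.example.org",
    "from smtp.dslreports.example (192.0.2.30) by mx.example.org",
]

for header in samples:
    print(bool(DSL.search(header)), header)
```

The third sample matches only because "dsl" is unanchored, which is worth keeping in mind when deciding how much weight the rule should carry.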


--rule hits
4528 => HTML_LINK_CLICK_HERE
4667 => RCVD_IN_OPM_HTTP
4731 => LOCAL_DRUGS_MALEDYSFUNCTION_OBFU
4741 => J_BACKHAIR_33
4750 => RCVD_IN_OPM_SOCKS
4754 => BAD_CREDIT
4777 => J_BACKHAIR_32
4876 => FORGED_MUA_OUTLOOK
4941 => MSGID_FROM_MTA_HEADER
4974 => RANDOMWORD_20
4997 => J_BACKHAIR_11
5011 => HTML_MIME_NO_HTML_TAG
5190 => LOCAL_DRUGS_ANXIETY
5387 => J_BACKHAIR_12
5483 => J_BACKHAIR_22
5635 => J_BACKHAIR_23
5748 => MSGID_FROM_MTA_SHORT
5841 => RCVD_IN_DSBL
6062 => MISSING_MIMEOLE
6181 => HTML_50_60
6404 => HTML_FONTCOLOR_UNKNOWN
7095 => CLICK_BELOW
7154 => ALL_NATURAL
7389 => RANDOMWORD_15
7431 => RCVD_IN_OPM
7665 => LOCAL_DRUGS_MALEDYSFUNCTION
8245 => HTML_FONT_BIG
8334 => RCVD_IN_SORBS_HTTP
8846 => RCVD_IN_SORBS_SOCKS
9122 => BIZ_TLD
9782 => FORGED_RCVD_NET_HELO
11131 => RANDOMWORD_10
11696 => RCVD_IN_NJABL_PROXY
13026 => MIME_HTML_NO_CHARSET
14045 => MIME_HTML_ONLY_MULTI
16823 => RCVD_IN_NJABL
17373 => DSL
21361 => RAZOR2_CHECK
23169 => RAZOR2_CF_RANGE_51_100
24394 => RCVD_IN_DYNABLOCK
26727 => DCC_CHECK
27090 => MIME_HTML_ONLY
32368 => PYZOR_CHECK
33957 => L_RCVD_IN_MANY
35353 => HTML_MESSAGE
36808 => RCVD_IN_SORBS
41738 => SPAMCOP_URI_RBL
43350 => RCVD_IN_BL_SPAMCOP_NET
58153 => BAYES_99
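
A listing in this "count => RULE" form is easy to turn into percentages. The sketch below parses a few lines copied from the list above and divides by 61,000, assuming each count is the number of messages the rule hit and that 61,000 is the exact message total (it is a round figure). The function name is hypothetical.

```python
def parse_hits(text):
    """Parse 'count => RULE' lines into a dict of rule name -> count."""
    hits = {}
    for line in text.splitlines():
        count, sep, name = line.partition("=>")
        if sep and count.strip().isdigit():
            hits[name.strip()] = int(count)
    return hits

# Three lines copied from the listing above.
listing = """\
33957 => L_RCVD_IN_MANY
43350 => RCVD_IN_BL_SPAMCOP_NET
58153 => BAYES_99
"""

TOTAL_MESSAGES = 61000  # assumed exact for the purpose of this sketch

for name, n in sorted(parse_hits(listing).items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {n:6d} {100 * n / TOTAL_MESSAGES:5.1f}%")
```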

-- 
Luke Computer Science System Administrator
Security Administrator,College of Engineering
Montana State University-Bozeman,Montana
