Chris Santerre said:
> There are 2 people working on this to have live rolling stats on rules. It
> will be a huge help when it goes public.

The statistics that would be most relevant are: which rule combinations hit on spam that scores slightly below the spam threshold. This would give you better results than raw rule hits. Knowing which rules hit in combination at each score level would be more statistically significant in determining what was not tagged as spam but could, in theory, be tagged as spam.

So suppose you generate a new meta rule that hits on a combination of rules. Example: a meta rule that hits on more than 4 RBLs + MIME_ENCODED. What point total do you assign to it, so that ham stays ham and questionable mail that was tagged as ham gets pushed over the threshold to spam?

To do this correctly you would have to:

- Determine which rule combinations hit at various spam/ham score levels. E.g.: what is the most common rule combination at a score of 3, or at a score of 5?
- If a new meta rule is assigned, determine its false positive rate and how much it raises the average score at various levels.

This assumes that the meta rule combinations seen on incoming email are specific to your site, and they most probably are not. In this case you are generating new meta rule sets based on rule combinations seen at your site.

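As a rough illustration of the two steps above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the Message record, the hand-verified is_spam label, the sample scores, the candidate combination (more than 2 RBL hits plus MIME_HTML_ONLY) and the threshold of 5.0 are all made up and would have to be replaced with data parsed from your own logs. It is not how the rolling stats mentioned above are computed.

```python
from collections import Counter
from dataclasses import dataclass

# RBL-style rule names (a subset of the kind of rules a site might sum in a meta rule).
RBL_RULES = frozenset({
    "RCVD_IN_BL_SPAMCOP_NET", "RCVD_IN_SBL", "RCVD_IN_SORBS",
    "RCVD_IN_NJABL", "RCVD_IN_DYNABLOCK", "RCVD_IN_DSBL",
})

@dataclass(frozen=True)
class Message:
    score: float        # score the message received
    rules: frozenset    # names of the rules that hit
    is_spam: bool       # hand-verified label (assumed to be available)

def combos_in_band(messages, low, high):
    """Count the exact rule sets seen on messages scoring in [low, high)."""
    return Counter(m.rules for m in messages if low <= m.score < high)

def evaluate_meta(messages, fires, points, threshold=5.0):
    """Simulate a new meta rule worth `points`.

    Returns (newly caught spam, new false positives): messages that were
    below the threshold and would be pushed to or over it by the new rule.
    """
    caught = false_pos = 0
    for m in messages:
        crosses = m.score < threshold <= m.score + points
        if crosses and fires(m.rules):
            if m.is_spam:
                caught += 1
            else:
                false_pos += 1
    return caught, false_pos

def candidate(rules):
    """Hypothetical meta rule: more than 2 RBL hits plus MIME_HTML_ONLY."""
    return len(rules & RBL_RULES) > 2 and "MIME_HTML_ONLY" in rules

# Made-up sample data, only to exercise the functions.
THREE_RBL_HTML = frozenset(
    {"RCVD_IN_SORBS", "RCVD_IN_NJABL", "RCVD_IN_DSBL", "MIME_HTML_ONLY"})
SAMPLE = [
    Message(4.2, THREE_RBL_HTML, True),
    Message(4.6, THREE_RBL_HTML, True),
    Message(4.0, THREE_RBL_HTML | {"DSL"}, False),
    Message(3.1, frozenset({"HTML_MESSAGE"}), False),
    Message(7.5, THREE_RBL_HTML | {"BAYES_99"}, True),
]

if __name__ == "__main__":
    for combo, n in combos_in_band(SAMPLE, 4.0, 5.0).most_common(3):
        print(n, "=>", " + ".join(sorted(combo)))
    for points in (0.9, 1.0):
        print(points, evaluate_meta(SAMPLE, candidate, points))
```

Running evaluate_meta over a range of point values shows where the new rule starts pulling verified ham over the threshold, which is one way to pick the score. The counts are only as good as the hand-verified labels behind them.
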
So, looking at my incoming volume of email, here are the top rule hits for me over the last 61,000 mail messages. Two of the rules in the list are ones I defined myself.

L_RCVD_IN_MANY is a meta rule:

meta L_RCVD_IN_MANY ( RCVD_IN_BL_SPAMCOP_NET + RCVD_IN_SBL + RCVD_IN_SORBS + RCVD_IN_NJABL + RCVD_IN_DYNABLOCK + RCVD_IN_DSBL + RCVD_IN_NJAB_SPAM + RCVD_IN_NJABL_PROXY + RCVD_IN_RFCI + RCVD_IN_OPM + RCVD_IN_SORBS_HTTP + RAZOR2_CHECK ) > 2
describe L_RCVD_IN_MANY Message received in more than 2 RBLs
score L_RCVD_IN_MANY 1.5

DSL is:

header DSL Received =~ /\.adsl\.|dialup|cable|dsl|client\.comcast\.net|client2\.attbi\.com|cpe\.net\.cable\.rogers\.com/i
describe DSL Sent through DSL connection
score DSL 1.5

--rule hits
4528 => HTML_LINK_CLICK_HERE
4667 => RCVD_IN_OPM_HTTP
4731 => LOCAL_DRUGS_MALEDYSFUNCTION_OBFU
4741 => J_BACKHAIR_33
4750 => RCVD_IN_OPM_SOCKS
4754 => BAD_CREDIT
4777 => J_BACKHAIR_32
4876 => FORGED_MUA_OUTLOOK
4941 => MSGID_FROM_MTA_HEADER
4974 => RANDOMWORD_20
4997 => J_BACKHAIR_11
5011 => HTML_MIME_NO_HTML_TAG
5190 => LOCAL_DRUGS_ANXIETY
5387 => J_BACKHAIR_12
5483 => J_BACKHAIR_22
5635 => J_BACKHAIR_23
5748 => MSGID_FROM_MTA_SHORT
5841 => RCVD_IN_DSBL
6062 => MISSING_MIMEOLE
6181 => HTML_50_60
6404 => HTML_FONTCOLOR_UNKNOWN
7095 => CLICK_BELOW
7154 => ALL_NATURAL
7389 => RANDOMWORD_15
7431 => RCVD_IN_OPM
7665 => LOCAL_DRUGS_MALEDYSFUNCTION
8245 => HTML_FONT_BIG
8334 => RCVD_IN_SORBS_HTTP
8846 => RCVD_IN_SORBS_SOCKS
9122 => BIZ_TLD
9782 => FORGED_RCVD_NET_HELO
11131 => RANDOMWORD_10
11696 => RCVD_IN_NJABL_PROXY
13026 => MIME_HTML_NO_CHARSET
14045 => MIME_HTML_ONLY_MULTI
16823 => RCVD_IN_NJABL
17373 => DSL
21361 => RAZOR2_CHECK
23169 => RAZOR2_CF_RANGE_51_100
24394 => RCVD_IN_DYNABLOCK
26727 => DCC_CHECK
27090 => MIME_HTML_ONLY
32368 => PYZOR_CHECK
33957 => L_RCVD_IN_MANY
35353 => HTML_MESSAGE
36808 => RCVD_IN_SORBS
41738 => SPAMCOP_URI_RBL
43350 => RCVD_IN_BL_SPAMCOP_NET
58153 => BAYES_99

--
Luke
Computer Science System Administrator
Security Administrator, College of Engineering
Montana State University-Bozeman, Montana
