I've (mostly) completed the update of SARE's coding_html.cf file. Like I did with the General Subject rule set(s), I've split the HTML rule set(s) into four files:
http://www.rulesemporium.com/rules/70_sare_html0.cf - Rules that hit only spam. This is the safest of the four SARE_HTML_* rulesets to use.

http://www.rulesemporium.com/rules/70_sare_html1.cf - Unlike 70_sare_html0.cf, the 70_sare_html1.cf ruleset contains rules that hit ham (or have hit ham in the past) during SARE mass-check tests. The S/O ratios calculated by SA's hit-frequencies scripts are all at or above 0.900. Systems that are excessively sensitive to false positives may want to exclude this ruleset, pick and choose among its rules, or lower their scores.

http://www.rulesemporium.com/rules/70_sare_html2.cf - 70_sare_html2.cf contains only rules that test for various types of obfuscation within HTML coding. This subset of SARE_HTML_* rules does not hit any emails during SARE mass-check testing against current corpora. Therefore, systems that are very sensitive to SpamAssassin overhead may want to exclude this ruleset to avoid its regex overhead.

http://www.rulesemporium.com/rules/70_sare_html3.cf - 70_sare_html3.cf contains a subset of SARE_HTML_* rules that either hit a significant amount of ham during SARE mass-check tests, or hit so few spam that we cannot be confident our scores are fully appropriate. Systems that are very sensitive to false positives should probably NOT install this ruleset.

Links to the PGP signatures for these files and to my mass-check results are on http://www.rulesemporium.com/rules.htm#html

Note that we have not deleted the obsolete files yet. We will delete 70_sare_coding_html.cf and coding_html.cf from our web site in about one month, to give people time to migrate to the new rule set files.

A couple of the rules in html3 can readily be improved, and we're working on that. We'll publish the improvements once they are confirmed through mass-check.

Bob Menschel
