I've (mostly) completed the update of SARE's coding_html.cf file.

Like I did with the General Subject rule set(s), I've split the HTML rule
set(s) into four files:

http://www.rulesemporium.com/rules/70_sare_html0.cf - Rules that hit only
spam. This is the safest of the four SARE_HTML_* rulesets to use.

http://www.rulesemporium.com/rules/70_sare_html1.cf - Unlike
70_sare_html0.cf, the 70_sare_html1.cf ruleset contains rules which do
(or in the past have) hit ham during SARE mass-check tests. The S/O ratios
calculated by SA's hit-frequencies scripts are all at or above 0.900.
Systems which are especially sensitive to false positives may want to
exclude this ruleset, pick and choose among its rules, or lower the
scores of its rules.
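
For anyone who wants to apply the same 0.900 cutoff to their own
mass-check numbers, here is a minimal sketch. It assumes S/O is the
spam hit percentage divided by the sum of the spam and ham hit
percentages, which is my understanding of what hit-frequencies reports.
The function names and the sample figures are made up for illustration.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Return the spam/overall (S/O) ratio for one rule.

    spam_pct and ham_pct are the percentages of the spam and ham
    corpora that the rule hit. Assumed formula: spam% / (spam% + ham%).
    """
    total = spam_pct + ham_pct
    if total == 0:
        # A rule with no hits at all has no meaningful S/O.
        raise ValueError("rule hit no messages; S/O is undefined")
    return spam_pct / total


def meets_threshold(spam_pct: float, ham_pct: float,
                    threshold: float = 0.900) -> bool:
    """True if the rule's S/O is at or above the given cutoff."""
    return s_o_ratio(spam_pct, ham_pct) >= threshold


if __name__ == "__main__":
    # Hypothetical figures, not taken from any SARE mass-check run.
    print(meets_threshold(spam_pct=4.5, ham_pct=0.1))
    print(meets_threshold(spam_pct=1.0, ham_pct=1.0))
```

A rule that hits no ham at all comes out at exactly 1.0 under this
formula, and any ham hits pull the ratio down from there.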

http://www.rulesemporium.com/rules/70_sare_html2.cf - 70_sare_html2.cf
contains only rules which test for various types of obfuscation within
HTML coding. This subset of SARE_HTML_* rules does not hit any emails
during SARE mass-check testing against current corpora. Therefore,
systems which are very sensitive to SpamAssassin overhead may want to
exclude this ruleset to avoid its regex overhead.

http://www.rulesemporium.com/rules/70_sare_html3.cf - 70_sare_html3.cf
contains a subset of SARE_HTML_* rules which either hit a significant
amount of ham during SARE mass-check tests, or hit so few spam messages
that we cannot be confident that our scores are fully appropriate.
Systems which are very sensitive to false positives should probably NOT
install this ruleset.

Links to the PGP signatures for these files and my mass-check results are
on http://www.rulesemporium.com/rules.htm#html

Note that we have not deleted the obsolete files yet. We will delete
70_sare_coding_html.cf and coding_html.cf from our web site in about one
month's time, to give people time to migrate to the new rule set files.
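
If you are not sure whether a system still carries the old files, a
quick check along these lines may help. This is only a sketch: the
directory you pass in is whatever your site uses for local rules (the
path in the usage example is an assumption), and the function name is
made up for illustration.

```python
from pathlib import Path

# The two obsolete file names mentioned above.
OBSOLETE_FILES = ("70_sare_coding_html.cf", "coding_html.cf")


def find_obsolete(rules_dir: str) -> list[str]:
    """Return the obsolete rule files that exist in rules_dir."""
    base = Path(rules_dir)
    return [name for name in OBSOLETE_FILES if (base / name).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust to wherever your local .cf files live.
    for name in find_obsolete("/etc/mail/spamassassin"):
        print(f"still installed: {name}")
```

The script only reports what it finds and does not delete anything, so
the old files can be removed by hand once the new ones are in place.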

A couple of the rules in html3 can readily be improved, and we're working
on that. We'll publish the improvements once confirmed through
mass-check.

Bob Menschel


