Hello,
in the last week or two, we have been getting some spam that SpamAssassin doesn't seem to recognize. A common feature of all these messages is that they contain lots of special characters (~, ^, `, and others) mixed into the text. I put some examples up at http://marie.vtl.ee/spam.txt
Would a rule to calculate some kind of "special chars" vs "total chars" ratio be useful? Does anybody have that kind of rule already?
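To make the "special chars vs total chars" idea concrete, here is a minimal Python sketch of the ratio. It is not a SpamAssassin rule or plugin. The character set `SPECIAL_CHARS`, the helper names and the 0.05 threshold are all hypothetical and would need tuning against real ham and spam.

```python
# Hypothetical set of "special" characters, based on the ones seen in the
# example spams (~, ^, `); extend as needed.
SPECIAL_CHARS = set("~^`|_*")


def special_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are in SPECIAL_CHARS."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    special = sum(1 for c in chars if c in SPECIAL_CHARS)
    return special / len(chars)


def looks_obfuscated(text: str, threshold: float = 0.05) -> bool:
    """Hypothetical test: flag text whose special-char ratio exceeds threshold."""
    return special_char_ratio(text) > threshold


print(special_char_ratio("Buy V~i^a`gra now"))  # 3 special out of 15 -> 0.2
print(looks_obfuscated("Hello, how are you?"))  # False
```

Whitespace is left out of the denominator so that long, normally spaced messages don't dilute the ratio. A single ratio over a whole message can still be thrown off by legitimate content such as ASCII art or code snippets, which is one reason a threshold like this needs testing before use.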
I don't have any rules that count ratios, but most of the example spams are EXACTLY why antidrug.cf was created. It has special rules that detect obfuscated forms of common spam drug names, and it penalizes the obfuscations quite heavily.
http://mywebpages.comcast.net/mkettler/sa/antidrug.cf
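For readers unfamiliar with the approach, here is a generic Python illustration of matching an obfuscated spelling of a known word. It is NOT taken from antidrug.cf and does not reproduce its rules. The helper names and the "up to two filler characters between letters" allowance are hypothetical choices. The idea is to allow non-alphanumeric filler between the letters of a word and then treat any match that isn't the plain spelling as an obfuscation.

```python
import re


def obfuscation_pattern(word: str) -> "re.Pattern[str]":
    """Hypothetical: match `word` with up to 2 non-alphanumeric chars between letters."""
    filler = r"[^A-Za-z0-9]{0,2}"
    body = filler.join(re.escape(ch) for ch in word)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def is_obfuscated_match(text: str, word: str) -> bool:
    """True if `text` contains `word` spelled with filler characters inserted."""
    for m in obfuscation_pattern(word).finditer(text):
        # A plain spelling also matches the pattern; only count the ones that differ.
        if m.group(0).lower() != word.lower():
            return True
    return False


print(is_obfuscated_match("Buy V~i^a`gra now", "viagra"))  # True
print(is_obfuscated_match("Buy viagra now", "viagra"))     # False
```

This sketch only handles inserted characters. It does not catch letter substitutions such as digits standing in for letters, which a real ruleset would also have to deal with.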
Disclaimer: I am the author of antidrug, so I do have a bias. I'd suggest checking the mass-check results for this ruleset that are posted on the SpamAssassin wiki:
http://wiki.apache.org/spamassassin/CustomRulesets
