Re: Weird characters (again) getting around filter rules.

Mark London Tue, 16 Dec 2025 22:08:31 -0800

Martin - replace_tag looks interesting. I will give that a try. Especially now that I think I've finally squashed the "Complimentry RoadKit" and free Costco Memberships with free "Keurig K-Elite CoffeeMachine" and other appliances. Are other people getting these? Always something. - Mark


On 12/16/2025 10:16 PM, [email protected] wrote:

Subject: Re: Weird characters (again) getting around filter rules.
From: "Martin F via users" <[email protected]>
Date: 12/16/2025, 6:05 AM
To: [email protected]
CC: Martin F <[email protected]>


I've had to deal with quite a bit of obfuscated spam over the years.
I started out having every possible obfuscation in every rule, and whenever I discovered a new one, I needed to go back and update every single rule with the new one. The rules were massive and completely unreadable.
Then I discovered replace_tags, which I can highly recommend looking into, if you haven't already:
https://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_ReplaceTags.html
https://github.com/apache/spamassassin/blob/trunk/rules/25_replace.cf
Using this made the rules so much easier to read when you come back to them 6 months from now, and it's much easier to reuse the same obfuscations. Just update it in one place and it applies to all rules using them. (Sorry, that sounded like a horrible sales pitch from a TV advertisement or something.)
I've found the built-in rules are occasionally missing some special characters, so I made a replace_tag for every letter where I include the built-in one. Here are a couple of examples:
replace_tag        CUSTOM_C (<C>|\xe1\xb4\x84)
replace_tag        CUSTOM_N (<N>|\xe2\x93\x9d|\xc6[\x9e\x9d]|\xef\xbd\x8e)
replace_tag        CUSTOM_V (<V>)
Then I can add other custom characters I find to each letter there, if the built-in rules are not catching the obfuscation.
I've found the easiest way to get the characters is a quick Python for-loop:
>>> for c in "ṣҿṽҿral":
...     print(f"{c}: {c.encode('utf8')}")
...
ṣ: b'\xe1\xb9\xa3'
ҿ: b'\xd2\xbf'
ṽ: b'\xe1\xb9\xbd'
ҿ: b'\xd2\xbf'
r: b'r'
a: b'a'
l: b'l'
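Going one step further, the loop output can be turned straight into a replace_tag line. This is only a minimal sketch: the helper names `utf8_escape` and `replace_tag_line` and the tag name `CUSTOM_E` are made up for illustration, following the naming pattern of the examples above.

```python
def utf8_escape(ch: str) -> str:
    """Return the UTF-8 bytes of ch as \\xNN escapes, e.g. 'ҿ' -> '\\xd2\\xbf'."""
    return "".join(f"\\x{b:02x}" for b in ch.encode("utf-8"))


def replace_tag_line(tag: str, builtin: str, lookalikes: str) -> str:
    """Build a replace_tag line: the built-in tag first, then each look-alike.

    tag and builtin are illustrative names, not something SpamAssassin
    generates for you.
    """
    alternatives = [f"<{builtin}>"] + [utf8_escape(ch) for ch in lookalikes]
    return f"replace_tag {tag} ({'|'.join(alternatives)})"


print(replace_tag_line("CUSTOM_E", "E", "ҿ"))
# prints: replace_tag CUSTOM_E (<E>|\xd2\xbf)
```

The printed line can be pasted into a local .cf file and extended whenever a new look-alike shows up.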
In the end, you can make either one rule that catches both the normal and obfuscated versions, or separate them so you can punish obfuscated versions even harder:
body __BODY_VIAGRA /(^|[^a-zA-Z0-9\.]|<CUSTOM_WORD_SEP>)viagra([^a-zA-Z0-9]|$)/i
body __BODY_VIAGRA_OBF /(^|[^a-zA-Z0-9]|<CUSTOM_WORD_SEP>)(?!\bviagra\b)<CUSTOM_V><CUSTOM_I><CUSTOM_A><CUSTOM_G><CUSTOM_R><CUSTOM_A>([^a-zA-Z0-9]|$)/i
replace_rules    __BODY_VIAGRA __BODY_VIAGRA_OBF
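To try out the idea outside SpamAssassin, here is a rough Python analogue of the tag expansion plus the "obfuscated only" lookahead. It is a sketch, not how the ReplaceTags plugin is implemented: the `TAGS` table, `expand` and `is_obfuscated` are hypothetical names, and the word "several" from the loop example earlier in the thread stands in for the spam keyword.

```python
import re

# Hypothetical tag table: each letter maps to its plain form plus look-alikes
# (UTF-8 byte sequences taken from the loop output earlier in the thread).
TAGS = {
    "S": rb"(?:s|\xe1\xb9\xa3)",
    "E": rb"(?:e|\xd2\xbf)",
    "V": rb"(?:v|\xe1\xb9\xbd)",
}


def expand(pattern: bytes) -> bytes:
    """Replace every <NAME> placeholder with its alternation from TAGS."""
    return re.sub(rb"<(\w+)>", lambda m: TAGS[m.group(1).decode()], pattern)


# Matches look-alike spellings only: the negative lookahead rejects the
# plain ASCII word, mirroring the (?!\bviagra\b) trick in the rule above.
OBFUSCATED = re.compile(
    expand(rb"(?:^|[^a-zA-Z0-9])(?!several\b)<S><E><V><E>ral(?:[^a-zA-Z0-9]|$)"),
    re.IGNORECASE,
)


def is_obfuscated(body: bytes) -> bool:
    """True if body contains a look-alike spelling of the word."""
    return OBFUSCATED.search(body) is not None


print(is_obfuscated("we have ṣҿṽҿral offers".encode("utf-8")))  # True
print(is_obfuscated(b"we have several offers"))                  # False
```

Matching on raw bytes keeps the sketch close to the \xNN notation used in the tags; a spelling with only one swapped letter still counts as obfuscated because it is no longer the plain word.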
I would say start out with the built-in ones from the 25_replace.cf file, and if you see they're not catching certain characters, start creating your own versions and add those characters.
As others have pointed out, it might cause issues if you actually have people writing in languages that use those special characters, but that's the eternal joy of managing a spam filter.
