Martin - replace_tag looks interesting. I will give that a try.
Especially now that I think I've finally squashed the "Complimentry Road
Kit" and free Costco memberships with a free "Keurig K-Elite Coffee
Machine" and other appliances. Are other people getting these? Always
something. - Mark
On 12/16/2025 10:16 PM, [email protected] wrote:
Subject: Re: Weird characters (again) getting around filter rules.
From: "Martin F via users" <[email protected]>
Date: 12/16/2025, 6:05 AM
To: [email protected]
CC: Martin F <[email protected]>

I've had to deal with quite a bit of obfuscated spam over the years.
I started out having every possible obfuscation in every rule, and
whenever I discovered a new one, I had to go back and update every
single rule with it. The rules were massive and completely
unreadable.
Then I discovered replace_tags, which I can highly recommend looking
into, if you haven't already:
https://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_ReplaceTags.html
https://github.com/apache/spamassassin/blob/trunk/rules/25_replace.cf
Using this makes the rules much easier to read when you come back to
them six months later, and it's much easier to reuse the same
obfuscations. Update a tag in one place and the change applies to every
rule that uses it.
(Sorry, that sounded like a horrible sales-pitch from a
TV-advertisement or something..)
I've found the built-in tags are occasionally missing some special
characters, so I made a replace_tag for every letter that includes
the built-in one. Here are a couple of examples:
replace_tag CUSTOM_C (<C>|\xe1\xb4\x84)
replace_tag CUSTOM_N (<N>|\xe2\x93\x9d|\xc6[\x9e\x9d]|\xef\xbd\x8e)
replace_tag CUSTOM_V (<V>)
Then I can add other custom characters I find to each letter there, if
the built-in tags are not catching the obfuscation.
I've found the easiest way to get the UTF-8 bytes of each character is
a quick Python for-loop:
>>> for c in "ṣҿṽҿral":
...     print(f"{c}: {c.encode('utf8')}")
...
ṣ: b'\xe1\xb9\xa3'
ҿ: b'\xd2\xbf'
ṽ: b'\xe1\xb9\xbd'
ҿ: b'\xd2\xbf'
r: b'r'
a: b'a'
l: b'l'
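
The same loop can be turned into a small helper that prints a ready-made replace_tag line. This is only a sketch: the function names utf8_regex_escape and replace_tag_line are made up for illustration and are not part of SpamAssassin. It also assumes the extra characters are non-ASCII, since ASCII regex metacharacters are passed through unescaped.

```python
def utf8_regex_escape(text: str) -> str:
    """Spell every non-ASCII character as \\xNN escapes of its UTF-8 bytes."""
    parts = []
    for ch in text:
        if ch.isascii():
            # ASCII is passed through untouched (no regex escaping is done here).
            parts.append(ch)
        else:
            parts.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(parts)


def replace_tag_line(name: str, builtin: str, extra_chars: str) -> str:
    """Build a replace_tag line: the built-in tag first, then each extra character."""
    alternatives = [f"<{builtin}>"] + [utf8_regex_escape(c) for c in extra_chars]
    return f"replace_tag {name} ({'|'.join(alternatives)})"


print(replace_tag_line("CUSTOM_C", "C", "ᴄ"))
# replace_tag CUSTOM_C (<C>|\xe1\xb4\x84)
```

Characters that share a lead byte can still be merged by hand afterwards, as in the \xc6[\x9e\x9d] part of the CUSTOM_N example.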
In the end, you can make either one rule that catches both the normal
and obfuscated versions, or separate them so you can punish obfuscated
versions even harder:
body __BODY_VIAGRA /(^|[^a-zA-Z0-9\.]|<CUSTOM_WORD_SEP>)viagra([^a-zA-Z0-9]|$)/i
body __BODY_VIAGRA_OBF /(^|[^a-zA-Z0-9]|<CUSTOM_WORD_SEP>)(?!\bviagra\b)<CUSTOM_V><CUSTOM_I><CUSTOM_A><CUSTOM_G><CUSTOM_R><CUSTOM_A>([^a-zA-Z0-9]|$)/i
replace_rules __BODY_VIAGRA __BODY_VIAGRA_OBF
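
To try out a pattern like the obfuscated one before loading it into SpamAssassin, the tag expansion can be roughly approximated in Python. This is an assumption-heavy sketch, not the plugin's actual code: the TAGS table is a tiny made-up stand-in for 25_replace.cf, expand_tags is a hypothetical helper, the repeated-pass expansion of nested tags is a guess at the behaviour, and <CUSTOM_WORD_SEP> is left out because its definition isn't shown here.

```python
import re

# Hypothetical, heavily simplified tag table for illustration only.
# The real <V>, <I>, ... definitions in 25_replace.cf are much broader.
TAGS = {
    b"V": rb"v",
    b"I": rb"[i1!|]",
    b"A": rb"[a@]",
    b"G": rb"g",
    b"R": rb"r",
    b"CUSTOM_V": rb"(?:<V>|\xe1\xb9\xbd)",  # built-in plus U+1E7D
    b"CUSTOM_I": rb"(?:<I>)",
    b"CUSTOM_A": rb"(?:<A>)",
    b"CUSTOM_G": rb"(?:<G>)",
    b"CUSTOM_R": rb"(?:<R>)",
}

TAG_REF = re.compile(rb"<(\w+)>")


def expand_tags(pattern: bytes, tags=TAGS, max_passes: int = 5) -> bytes:
    """Substitute known <NAME> references, repeating so nested tags resolve."""
    for _ in range(max_passes):
        expanded = TAG_REF.sub(lambda m: tags.get(m.group(1), m.group(0)), pattern)
        if expanded == pattern:
            break
        pattern = expanded
    return pattern


OBF_RULE = (
    rb"(^|[^a-zA-Z0-9])(?!\bviagra\b)"
    rb"<CUSTOM_V><CUSTOM_I><CUSTOM_A><CUSTOM_G><CUSTOM_R><CUSTOM_A>"
    rb"([^a-zA-Z0-9]|$)"
)
obf = re.compile(expand_tags(OBF_RULE), re.IGNORECASE)

# Match against UTF-8 bytes, since the tags are written as byte escapes.
for sample in ("buy ṽiagra now", "buy v1agra now", "buy viagra now"):
    print(sample, "->", bool(obf.search(sample.encode("utf-8"))))
```

The plain spelling is rejected by the negative lookahead, so only the obfuscated variants hit this pattern, which is what lets the two rules be scored separately.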
I would say start out with the built-in ones from the 25_replace.cf
file, and if you see they're not catching certain characters, start
creating your own versions and add those characters.
As others have pointed out, it might cause issues if you actually have
people writing in languages that use those special characters, but
that's the eternal joy of managing a spam-filter..