I'm trying to use this set of rules to spot Chinese or Russian characters in the subject line:

<http://www.timk.de/it-blog/howto-find-chinese-or-russian-spam-encoded-in-utf-8-with-spamassassin/>

To debug the rules, I've replaced the leading __ in sub-rules with T_.

The rules don't seem to match the base64-encoded UTF-8 sequences I'm seeing in subject lines: only the generic "encoded" sub-rules fire, and none of the script-specific ones do.

For example:

X-Spam-Status: No, score=1.7 required=5.0 tests=BAYES_50,
        CHARSET_UTF8_B_SUBJ_LATIN,HTML_FONT_FACE_BAD,HTML_MESSAGE,
        T_CHARSET_SUBJECT_UTF8_B_ENCODED,T_CHARSET_SUBJECT_UTF8_ENCODED autolearn=no
        version=3.3.1

Subject: =?utf-8?B?54mp5paZ6K6h5YiS5Y2P6LCDL+iJvueUnw==?=

The first character is U+7269 (7269 hex), which, if the rules are correct, should be matched by __CHARSET__UTF8_SUBJ_CJK1.

I'm using this to decode the base64 between the question marks to inspect the result:

<http://www.opinionatedgeek.com/dotnet/tools/base64decode/>
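The same check can also be done locally. This is only a minimal sketch in Python, assuming the standard library's email.header module is an acceptable stand-in for the web decoder; the variable names are just for illustration. It decodes the Subject above and prints the first character's code point along with its raw UTF-8 bytes, which is presumably the byte sequence a rule working on the undecoded header would have to deal with after base64 decoding.

```python
from email.header import decode_header

# The encoded Subject header from the message above.
subject = "=?utf-8?B?54mp5paZ6K6h5YiS5Y2P6LCDL+iJvueUnw==?="

# decode_header returns a list of (bytes, charset) pairs for encoded words.
raw, charset = decode_header(subject)[0]
text = raw.decode(charset)

first = text[0]
first_bytes = first.encode("utf-8")

print("decoded subject:", text)
print("first code point: U+%04X" % ord(first))
print("first char as UTF-8 bytes:", first_bytes.hex())
```

For this subject the first code point comes out as U+7269, encoded as the three bytes e7 89 a9, which agrees with what the web decoder shows.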
