Re: check utf-8 subjects/from?

David B Funk Wed, 13 Dec 2017 18:08:57 -0800

On Wed, 13 Dec 2017, AJ Weber wrote:

Is there an easy way to check if the Subject or From is UTF-8 -- or non-ASCII -- char set?
I see in some of my recent spam, either the Subject or the From (sometimes both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but I don't want to qualify on that).
If I check a header with a "header ... =~" regex rule, is it the raw text that I will check, or is it the decoded characters I will be checking against?
If it's the raw text, I can probably just look for that prefix to indicate the UTF-8 encoding.
I do get some legitimate emails with encoded chars and emojis, etc... but I think I'd like a rule to support it being SPAM in general.


As other people have said, the header ":raw" rule form will let you match on 
that.
There are two commonly used encoding methods for UTF-8:
 Base64 "=?utf-8?B?"
 Quoted-Printable "=?utf-8?Q?"
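
A minimal sketch of rules matching either prefix, assuming they go in a
local.cf. The rule names and the score are made up for illustration and
would need tuning against your own mail:

```
# Match the raw (undecoded) Subject and From for a UTF-8 encoded-word
# prefix. [bq] covers both Base64 and Quoted-Printable; /i covers
# "UTF-8" as well as "utf-8".
header   __LOCAL_SUBJ_UTF8_ENC  Subject:raw =~ /=\?utf-8\?[bq]\?/i
header   __LOCAL_FROM_UTF8_ENC  From:raw    =~ /=\?utf-8\?[bq]\?/i

# Hypothetical rule name; fires if either header is encoded.
meta     LOCAL_UTF8_ENC_HDR     (__LOCAL_SUBJ_UTF8_ENC || __LOCAL_FROM_UTF8_ENC)
describe LOCAL_UTF8_ENC_HDR     Subject or From has a UTF-8 encoded-word
score    LOCAL_UTF8_ENC_HDR     0.1
```

The score is kept low on purpose, since legitimate mail uses the same
encodings.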

There's nothing that prevents a mailer from using either for purely 7-bit ASCII,
even though it isn't necessary. You are more likely to see that used by
international clients. They may just utf-8 encode by default so as not to have
to do special processing for non-7-bit ASCII headers.
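
If you want to tell the two cases apart, one possible sketch is to pair the
raw match with a match on the decoded header. The rule names are made up,
and this assumes the decoded Subject is matched as UTF-8 bytes, which may
depend on your SpamAssassin version and normalize_charset setting:

```
# Raw Subject carries a UTF-8 encoded-word (Base64 or Quoted-Printable).
header   __LOCAL_SUBJ_ENC         Subject:raw =~ /=\?utf-8\?[bq]\?/i
# Decoded Subject contains at least one byte outside 7-bit ASCII
# (assumption: decoded text is seen as bytes, not Perl characters).
header   __LOCAL_SUBJ_8BIT        Subject =~ /[\x80-\xff]/

# Hypothetical rule name; encoded although the decoded text is plain ASCII.
meta     LOCAL_SUBJ_NEEDLESS_ENC  (__LOCAL_SUBJ_ENC && !__LOCAL_SUBJ_8BIT)
describe LOCAL_SUBJ_NEEDLESS_ENC  Subject is UTF-8 encoded but pure ASCII
score    LOCAL_SUBJ_NEEDLESS_ENC  0.01
```

With a near-zero score this only tags the message, so you can see how
often it hits ham before deciding whether it is worth anything.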



--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
