>> Hi,
>>
>> On Wed, Dec 13, 2017 at 9:08 PM, David B Funk
>> <dbf...@engineering.uiowa.edu> wrote:
>> > On Wed, 13 Dec 2017, AJ Weber wrote:
>> >
>> >> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> >> non-ASCII -- char set?
>> >>
>> >> I see in some of my recent spam, either the Subject or the From (sometimes
>> >> both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but
>> >> I don't want to qualify on that).
>> >>
>> >> If I check a header with a "header ... =~" regex rule, is it the raw text
>> >> that I will check, or is it the decoded characters I will be checking
>> >> against?
>> >>
>> >> If it's the raw text, I can probably just look for that prefix to indicate
>> >> the UTF-8 encoding.
>> >>
>> >> I do get some legitimate emails with encoded chars and emojis, etc...but I
>> >> think I'd like a rule to support it being SPAM in general.
>> >
>> > As other people have said, the header ":raw" rule form will let you match on
>> > that.
>> > There are two commonly used encoding methods for UTF-8:
>> >   Base64            "=?utf-8?B?"
>> >   Quoted-Printable  "=?utf-8?Q?"
>> >
>> > There's nothing that prevents a mailer from using either for purely 7-bit
>> > ASCII, even though it isn't necessary. You are more likely to see that used by
>> > international clients. They may just utf-8 encode by default so not to have
>> > to do special processing for non 7-bit ASCII headers.
>>
>> We've been seeing a number of emails with subjects using UTF-8 in an
>> attempt to obscure the sender by using some form of 8-bit characters.
>> For example, this spells dropbox:
>>
>> From: "=?utf-8?B?xJByb3Bib8+X?=" <abrinar.gue...@ecacolleges.com>
>>
>> How would we write a header rule against that? Just use From:raw?
>>
>> Is it possible to write a rule using the decoded characters, like
>> "dr�p-b�x" or "D?op?o?"?
>>
>> I've also tried variations of "dropbox" such as "dr?pb?x" etc...

Hi Alex,

as I live in Germany, I also see nothing special in encoded UTF-8 ...
Just use the decoded From line rather than the raw version.

One thing that certainly is worth detecting is a plain name part that
contains an email address different from the actual sender address.
(I am not sure whether such a rule already exists.)

Now for your example, you would probably have to write rules matching the
spelling variations of the purported sender's name, plus a meta rule for the
case that the _real_ name and a valid email address are detected.

Regards
Wolfgang
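
To see what a rule on the decoded From line would actually be matching, it can
help to decode a sample header outside SpamAssassin first. The following is
only a rough Python sketch, not a SpamAssassin rule: the helper names
`decode_from` and `name_has_other_address`, the address regex, and the
`sender@example.com` placeholder address are all made up for illustration.

```python
import re
from email.header import decode_header, make_header
from email.utils import parseaddr

# Deliberately loose pattern for "something that looks like an address";
# an assumption for this sketch, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def decode_from(raw_from):
    """Return (decoded display name, address) for a raw From header value."""
    name, addr = parseaddr(raw_from)
    # decode_header/make_header undo the RFC 2047 "=?utf-8?B?...?=" encoding.
    return str(make_header(decode_header(name))), addr


def name_has_other_address(raw_from):
    """True if the display name contains an address that differs from the real one."""
    name, addr = decode_from(raw_from)
    return any(found.lower() != addr.lower() for found in EMAIL_RE.findall(name))


if __name__ == "__main__":
    # Encoded name taken from the example above; the address is a placeholder.
    name, addr = decode_from('"=?utf-8?B?xJByb3Bib8+X?=" <sender@example.com>')
    print(name, [hex(ord(c)) for c in name if not c.isascii()])
    print(name_has_other_address('"support@paypal.example" <x@evil.example>'))
```

For the sample header, the decoded name starts with U+0110 and ends with
U+03D7 instead of a plain "D" and "x", so those are the kind of spelling
variations a rule on the decoded name would have to cover.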