>> >> I got the following MIME body part below, and I'm wondering if it would 
>> >> make sense to filter on this as well.
>> >> Given that it's text/plain with an implicit charset="us-ascii" and an 
>> >> implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} 
>> >> doesn't really parse into a 16-bit character, would it? That would be a 
>> >> broken MUA that made such a leap...
>> >> Wouldn't that normally render as the character "&", "#", "x", etc. rather 
>> >> than the unicode16 or UTF-8 character with that hex value?
>> >> There might be times when someone has sent an attachment improperly 
>> >> encoded this way which might have embedded binary values in it, but 
>> >> that's kind of buggy anyway... it should have been done as base64 and 
>> >> application/octet-stream in the worst of cases if it has arbitrary binary 
>> >> data.
>> >> I wouldn't want a message where someone gives a couple of examples of 
>> >> encoding &#x0400 for instance being flagged as SPAM, but if the text is 
>> >> 20% or more of these sequences then I would say that's SPAM-sign.
>> >> Anyway, here's the body I saw:
>> >> --1388-8200-b67c-e579-9c27-df36-12fa-a2eb
>> >> Content-Type: text/plain;
>> >> Thе Rеаl 
>> >> RеаѕоnThе Ꮯоmіng 
>> >> Ꮯоllарѕе...Thе 
>> >> rеаl rеаѕоn ᎳHY 
>> >> HоmеlаndSеcurіtу 
>> >> rеcеntlу рurchаѕеd1.7 
>> >> Bіllіоn Rоundѕ оf 
>> >> аmmunіtіоn...Ꮃhаt Yоu 
>> >> Muѕt Dо Tо Ꭼnѕurе 
>> >> YоurSаfеtуHоmеlаnd 
>> >> ѕеcurіtу іѕ thеrе 
>> >> tо ѕеcurеthе 
>> >> hоmеlаnd оnlу... Sо 
>> >> thеѕе Ьullеtѕаrе 
>> >> rеаlу mеаnt fоr 
>> >> thеThіѕ іѕ аn 
>> >> еmаіlаdvеrtіѕеmеnt
>> >>  thаt wаѕ ѕеnt tо 
>> >> уоu Ьу Ρаtrіоt 
>> >> Survіvаl Ρlаn. If 
>> >> уоuwіѕh tо 
>> >> nоlоngеr rеcеіvе 
>> >> mеѕѕаgеѕ thаt 
>> >> рrоmоtе ѕurvіvаl 
>> >> tірѕ, 
>> >> рlеаѕеclіck hеrе 
>> >> tо unѕuЬѕcrіЬе.4 
>> >> Unstable as water, thou shalt not excel because thou wentest up to thy 
>> >> fathers bed then defiledst thou it he went up to my couch.34 And 
>> >> Pharaohnechoh made Eliakim the son of Josiah king in the room of Josiah 
>> >> his father, and turned his name to Jehoiakim, and took Jehoahaz away and 
>> >> he came to Egypt, and died there.37  And the thing was good in the eyes 
>> >> of Pharaoh, and in the eyes of all his servants.
>> >> --1388-8200-b67c-e579-9c27-df36-12fa-a2eb

Hi,

While this is certainly not correct, and likely does not display properly
in every mail client, it would probably work in several webmailers.
Perhaps that is the configuration the author of that crap tested.
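To see what such a lenient client would display, and to put a number on
the "20% or more" idea from the quoted message, here is a rough sketch in
Python. The function names and the way density is measured (share of raw
characters that belong to numeric references) are my own assumptions, not
an existing filter rule.

```python
import html
import re

# Numeric character references such as &#x0435; or &#1077;.
# The trailing semicolon is optional, as in the quoted "&#x0400" example.
NCR = re.compile(r"&#(?:[xX][0-9A-Fa-f]{2,6}|[0-9]{2,7});?")


def ncr_density(text):
    """Fraction of the raw characters that are part of numeric references."""
    if not text:
        return 0.0
    covered = sum(len(m.group(0)) for m in NCR.finditer(text))
    return covered / len(text)


def lenient_render(text):
    """Roughly what a client shows if it treats the part as HTML."""
    return html.unescape(text)


def looks_entity_stuffed(text, threshold=0.20):
    """Hypothetical check: flag text that is mostly numeric references."""
    return ncr_density(text) >= threshold
```

A message that merely mentions one or two such sequences stays far below
the threshold, while a body written almost entirely in references goes
well above it.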
Now, I am somewhat reluctant to classify badly formatted mails as spam:
there are many systems around, even from major players, that send
legitimate mails such as order confirmations, delivery notifications and
opted-in newsletters, but get many of the formal things more wrong than
right.
On the other hand, looking at the actual characters shows that the message
is spam: these are Cyrillic letters that happen to look exactly like
Western ones (a, e, o and so on), so the obvious intent is to avoid
detection of the strings. We have seen the same with IDN domain names that
might use a Cyrillic a to register a domain that looks like, e.g.,
paypal.com.
The list of characters is fairly short, so maybe checking for these
characters in all commonly used variants (HTML entities, UTF-8 encoded,
+u0430, \u0430, IDN encoded) would be a good spam indication.
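
A minimal sketch of that idea in Python follows. The look-alike table is
only a small illustrative sample, and all names here are hypothetical. It
decodes the variants listed above into real characters and then reports
words that mix plain ASCII letters with look-alikes, so that ordinary
Cyrillic text is not flagged. The punycode step is a raw decode and does
not do full IDNA processing.

```python
import html
import re

# Illustrative, non-exhaustive sample of Cyrillic letters that look Latin.
LOOKALIKES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x", "\u0455": "s",
    "\u0456": "i",
}

# Textual spellings of a code point: \u0430, U+0430, +u0430.
U_ESCAPE = re.compile(r"(?:\\u|[Uu]\+|\+[Uu])([0-9A-Fa-f]{4})")

# Runs of letters (Unicode-aware), without digits or underscores.
WORD = re.compile(r"[^\W\d_]+")


def normalize(text):
    """Turn HTML entities and \\uXXXX-style spellings into characters."""
    text = html.unescape(text)
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_idn(domain):
    """Decode xn-- labels with the punycode codec; keep bad labels as-is."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                pass
        labels.append(label)
    return ".".join(labels)


def suspicious_words(text):
    """Words that contain both ASCII letters and look-alike characters."""
    hits = []
    for word in WORD.findall(normalize(text)):
        has_lookalike = any(ch in LOOKALIKES for ch in word)
        has_ascii = any(ch.isascii() for ch in word)
        if has_lookalike and has_ascii:
            hits.append(word)
    return hits
```

For a domain name, one would call suspicious_words(decode_idn(name)); for
a message body, suspicious_words(body) is enough, since already decoded
UTF-8 text passes through normalize() unchanged.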

Regards
Wolfgang

