>> I got the following MIME body part below, and I'm wondering if it would
>> make sense to filter on this as well.
>>
>> Given that it's text/plain with an implicit charset="us-ascii" and an
>> implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4}
>> doesn't really parse into a 16-bit character, would it? That would be a
>> broken MUA that made such a leap...
>>
>> Wouldn't that normally render as the character "&", "#", "x", etc. rather
>> than the unicode16 or UTF-8 character with that hex value?
>>
>> There might be times when someone has sent an attachment improperly
>> encoded this way which might have embedded binary values in it, but
>> that's kind of buggy anyway... it should have been done as base64 and
>> application/octet-stream in the worst of cases if it has arbitrary binary
>> data.
>>
>> I wouldn't want a message where someone gives a couple of examples of
>> encoding Ѐ for instance being flagged as SPAM, but if the text is
>> 20% or more of these sequences then I would say that's SPAM-sign.
>>
>> Anyway, here's the body I saw:
>>
>> --1388-8200-b67c-e579-9c27-df36-12fa-a2eb
>> Content-Type: text/plain;
>>
>> Thе Rеаl
>> RеаѕоnThе Ꮯоmіng
>> Ꮯоllарѕе...Thе
>> rеаl rеаѕоn ᎳHY
>> HоmеlаndSеcurіtу
>> rеcеntlу рurchаѕеd1.7
>> Bіllіоn Rоundѕ оf
>> аmmunіtіоn...Ꮃhаt Yоu
>> Muѕt Dо Tо Ꭼnѕurе
>> YоurSаfеtуHоmеlаnd
>> ѕеcurіtу іѕ thеrе
>> tо ѕеcurеthе
>> hоmеlаnd оnlу... Sо
>> thеѕе Ьullеtѕаrе
>> rеаlу mеаnt fоr
>> thеThіѕ іѕ аn
>> еmаіlаdvеrtіѕеmеnt
>> thаt wаѕ ѕеnt tо
>> уоu Ьу Ρаtrіоt
>> Survіvаl Ρlаn. If
>> уоuwіѕh tо
>> nоlоngеr rеcеіvе
>> mеѕѕаgеѕ thаt
>> рrоmоtе ѕurvіvаl
>> tірѕ,
>> рlеаѕеclіck hеrе
>> tо unѕuЬѕcrіЬе.4
>> Unstable as water, thou shalt not excel because thou wentest up to thy
>> fathers bed then defiledst thou it he went up to my couch.34 And
>> Pharaohnechoh made Eliakim the son of Josiah king in the room of Josiah
>> his father, and turned his name to Jehoiakim, and took Jehoahaz away and
>> he came to Egypt, and died there.37 And the thing was good in the eyes
>> of Pharaoh, and in the eyes of all his servants.
>> --1388-8200-b67c-e579-9c27-df36-12fa-a2eb
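The 20% rule of thumb in the quoted message could be sketched as a simple density check. This is only an illustration under stated assumptions, not anyone's actual filter: it assumes the body is already a decoded Python string, measures the share of characters (not tokens) covered by &#xHHHH sequences because the message does not say which it meant, accepts lowercase hex and an optional trailing semicolon, and ignores decimal references such as &#1072;. The function names ncr_fraction and looks_like_ncr_spam are hypothetical.

```python
import re

# Hexadecimal numeric character references like "&#x0430;". Accepting lowercase
# hex digits and an optional ";" goes slightly beyond the quoted pattern
# &#x[0-9A-F]{4}; that is an assumption of this sketch.
NCR_RE = re.compile(r"&#x[0-9A-Fa-f]{4};?")


def ncr_fraction(body):
    """Share of the body's characters that belong to &#xHHHH; sequences."""
    if not body:
        return 0.0
    covered = sum(len(m.group(0)) for m in NCR_RE.finditer(body))
    return covered / len(body)


def looks_like_ncr_spam(body, threshold=0.20):
    """Flag a text/plain body when such sequences make up >= threshold of it.

    The 0.20 default is the figure suggested in the quoted message; it is a
    rule of thumb, not a tuned value.
    """
    return ncr_fraction(body) >= threshold
```

Measured by characters, a sentence that cites one reference such as &#x0400; as an example stays well below the threshold, while a body made mostly of encoded look-alike letters exceeds it.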
Hi,

While this is certainly not correct, and likely does not display properly in every mail client, it would probably work in several webmailers. Perhaps that is the configuration the author of that crap tested.

Now, I am somewhat reluctant to classify badly formatted mails as spam: there are many systems around, even from major players, that send legitimate mail (order confirmations, delivery notifications, opted-in newsletters) but get many of the formal details wrong.

On the other hand, looking at the actual characters shows that the message is spam: these are Cyrillic letters that happen to look exactly like Western ones (a, e, o and so on), so the obvious intent is to avoid detection of the strings. We have seen the same with IDN domain names that use, for example, a Cyrillic "a" to register a domain that looks like paypal.com.

The list of such characters is fairly short, so checking for them in all commonly used variants (HTML entities, UTF-8, U+0430, \u0430, IDN encoding) might be a good spam indicator.

Regards
Wolfgang