On 7/7/2014 5:09 PM, Philip Prindeville wrote:
On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

On 7/7/2014 2:28 AM, John Wilcock wrote:
On 05/07/2014 19:08, Philip Prindeville wrote:
As for encoding a cyrillic small a: there are many ways to do this.
iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this
would be very efficient—there are just too many charsets possible.
Normalising the input message to UTF-8 before body checks would help somewhat 
with that. I seem to remember there's been talk of doing this.
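
For illustration only, here is a minimal sketch of what "normalise to UTF-8 before body checks" could look like when the charset is declared. It uses Python's standard email package rather than anything from SA itself; the function name body_as_utf8 and the us-ascii and latin-1 fallbacks are assumptions made for the example.

```python
from email import message_from_bytes


def body_as_utf8(raw_message: bytes) -> bytes:
    """Return the body of a single-part message re-encoded as UTF-8.

    Hypothetical helper: it trusts the declared charset and does not
    try to guess one when the header is missing or wrong.
    """
    msg = message_from_bytes(raw_message)
    # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable).
    payload = msg.get_payload(decode=True)
    if payload is None:
        # Multipart container: this sketch does not walk the parts.
        return b""
    # Assumed fallback: treat a missing charset parameter as us-ascii.
    charset = msg.get_content_charset() or "us-ascii"
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        # Charset name unknown to Python: latin-1 never fails to decode.
        text = payload.decode("latin-1")
    return text.encode("utf-8")
```

With this, a Cyrillic small a sent as the single KOI8-R byte 0xC1 and the same letter sent as the ISO-8859-5 byte 0xD0 both come out as the UTF-8 bytes D0 B0, so one body rule can match either.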

Yes, or utf-16...  I think that will be necessary to keep SA effective in the 
modern world sooner rather than later.

Okay, but… if the message body is non-ASCII and the CTE is 8bit or base64 and 
no explicit charset has been given, how do you know which translation to 
perform?

I get a lot of Han SPAM in GB2312 where the charset is never specified 
(apparently it’s a national default in China, despite the requirements stated 
in RFC-2045 and -2046).
Sorry, I haven't even started delving into the devilish details, but I know it's looming as a needed feature.
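
As a rough illustration of the undeclared-charset problem, one common heuristic is to try strict decoders in a fixed order and keep the first that succeeds. This is only a sketch, not something SA is known to do: the function name guess_and_decode, the candidate list, and the latin-1 fallback are all assumptions.

```python
def guess_and_decode(body: bytes, candidates=("utf-8", "gb2312", "big5")):
    """Return (charset, text) for the first candidate that decodes cleanly.

    Heuristic only: a clean decode does not prove the guess is right,
    and the order of the candidates decides ambiguous cases.
    """
    for charset in candidates:
        try:
            # Strict decoding raises on byte sequences invalid in this charset.
            return charset, body.decode(charset)
        except UnicodeDecodeError:
            continue
    # Assumed last resort: latin-1 maps every byte, so it never fails.
    return "latin-1", body.decode("latin-1")
```

UTF-8 goes first because random 8-bit text rarely validates as UTF-8, so a success there is a fairly strong signal. The multi-byte Chinese charsets are much weaker evidence on their own, which is why a real implementation would probably also need language or frequency statistics.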

regards,
KAM
