Re: encoding vs charset

Mark J. Reed Tue, 15 Jul 2008 14:45:26 -0700

> unicode:"\ab" is illegal

No way.  unicode:"\ab" should represent U+00AB.  I don't care what
the byte-level representation is: in UTF-8 that's 0xc2 0xab; in
UTF-16BE it's 0x00 0xab; in UTF-32LE it's 0xab 0x00 0x00 0x00.
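One quick way to check those byte sequences is to encode U+00AB with Python's standard codecs. Python is used here only for illustration; this is not Parrot code. The helper name byte_forms is hypothetical.

```python
def byte_forms(ch, encodings):
    """Map each codec name to the hex bytes of ch in that encoding."""
    return {enc: " ".join(f"0x{b:02x}" for b in ch.encode(enc)) for enc in encodings}


# U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, the character "\ab" should denote.
forms = byte_forms("\u00ab", ["utf-8", "utf-16-be", "utf-32-le"])
for enc, hex_bytes in forms.items():
    print(f"{enc:10} {hex_bytes}")
```

The code point is the same in every case. Only the serialized bytes change with the encoding.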


> I think that there is still some confusion between the encoding of source code
> with the desired meaning in the charset and the internal encoding of parrot,
> which might be UCS2 or anything.

IMESHO, the encoding of the source code should have no bearing on the
interpretation of string literal escape sequences within that source
code.  "\ab" should mean U+00AB no matter whether the surrounding
source code is UTF-8, ISO-8859-1, Big-5, whatever; if the source
language wants to work differently, it's up to its parser to convert.
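The ordering argued for here can be sketched in Python: first decode the source with its own encoding, then expand escapes into code points. The function name parse_literal and the "backslash plus two hex digits" escape form are hypothetical. They were chosen to mirror "\ab" above and do not describe Parrot's actual parser.

```python
import re

# Hypothetical escape syntax: a backslash followed by exactly two hex digits,
# mirroring "\ab" in the post. Escaped backslashes are not handled here.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{2})")


def parse_literal(source_bytes, source_encoding):
    """Hypothetical literal parser: decode the literal's bytes with the source
    encoding first, then expand each escape to the code point it names."""
    text = source_bytes.decode(source_encoding)
    # The escape's value is read as a code point, independent of source_encoding.
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


for enc in ("utf-8", "iso-8859-1", "big5"):
    result = parse_literal("\\ab".encode(enc), enc)
    print(enc, f"U+{ord(result):04X}")
```

Each line prints U+00AB. Decoding before scanning for escapes also matters in practice. In Big5, the byte 0x5C (backslash) can occur as the second byte of a two-byte character, so a byte-level scan for escapes could misfire.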

-- 
Mark J. Reed <[EMAIL PROTECTED]>
