"Dreiheller, Albrecht" <[email protected]> wrote:
 |In this context, it might be useful to know that there are some codepoints
 |in some Chinese multi-byte encodings, which contain a byte looking like
 |a Backslash "\" 0x5C as trail byte.
 |This can cause problems in C-like string literals where \ acts as a
 |meta-character.
 |
 |Examples:
 |
 |in BIG5 (Win CP 950) Traditional Chinese 
 |U+03B1 maps to A3 5C
 |U+4E48 maps to A4 5C
 |U+4FDF maps to AB 5C
 |
 |in GBK  (Win CP 936) Simplified Chinese
 |U+2010 maps to A9 5C
 |U+2558 maps to A8 5C
 |U+4E57 maps to 81 5C

Thank you – well of course it is, for every very hungry caterpillar.

--steffen
--- Begin Message ---
From: Steffen Daode Nurpmeso, Saturday, August 31, 2013 4:37 PM

>  Likewise, the byte values used to encode <period>, <slash>,
>  <newline> and <carriage-return> shall not occur as part of any
>  other character in any locale.

In this context, it might be useful to know that there are some codepoints
in some Chinese multi-byte encodings, which contain a byte looking like
a Backslash "\" 0x5C as trail byte.
This can cause problems in C-like string literals where \ acts as a 
meta-character.

Examples:

in BIG5 (Win CP 950) Traditional Chinese 
U+03B1 maps to A3 5C
U+4E48 maps to A4 5C
U+4FDF maps to AB 5C

in GBK  (Win CP 936) Simplified Chinese
U+2010 maps to A9 5C
U+2558 maps to A8 5C
U+4E57 maps to 81 5C
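
The mappings above can be checked with a short script. This is only a
sketch: it assumes that Python's built-in "big5" and "gbk" codecs agree
with CP 950 and CP 936 for these characters, and the helper names
trail_is_backslash and naive_split_on_backslash are made up for
illustration. The second helper mimics what an encoding-unaware, byte-wise
scanner for C-like escapes would do when it meets such a trail byte.

```python
# Sketch: detect characters whose multi-byte encoding ends in 0x5C ("\").
# Assumption: Python's "big5" and "gbk" codecs match CP 950 / CP 936 here.
# Both helpers are hypothetical names, not part of any library.

EXAMPLES = {
    "big5": ["\u03b1", "\u4e48", "\u4fdf"],
    "gbk": ["\u2010", "\u2558", "\u4e57"],
}


def trail_is_backslash(char, encoding):
    """Return True if char encodes to several bytes, the last being 0x5C."""
    raw = char.encode(encoding)
    return len(raw) > 1 and raw[-1] == 0x5C


def naive_split_on_backslash(raw):
    """Split raw bytes on 0x5C, as an encoding-unaware scanner might."""
    return raw.split(b"\\")


if __name__ == "__main__":
    for encoding, chars in EXAMPLES.items():
        for char in chars:
            raw = char.encode(encoding)
            hex_bytes = " ".join(f"{b:02X}" for b in raw)
            flag = trail_is_backslash(char, encoding)
            print(f"U+{ord(char):04X} in {encoding}: {hex_bytes} "
                  f"(trail byte is backslash: {flag})")
```

A scanner that works on bytes rather than on decoded characters would cut
such a character in half, which is the problem described above. Decoding
to Unicode first, then looking for "\", avoids it.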



--- End Message ---
