Re: Code pages and Unicode

Richard Wordingham Tue, 23 Aug 2011 12:13:38 -0700

On Mon, 22 Aug 2011 16:18:56 -0700
Ken Whistler <k...@sybase.com> wrote:


> How about Clause 12.5 of ISO/IEC 10646:
> 
> <001B, 0025, 0040>
> 
> You "escape" out of UTF-16 to ISO 2022, and then you can do whatever
> the heck you want, including exchange and processing of complete
> 4-byte forms, with all the billions of characters folks seem to think
> they need.

> Of course you would have to convince implementers to honor the ISO
> 2022 escape sequence...

Which they only need to if the text is in an ISO 2022 or similar
context.  Your idea does suggest that a pattern of
<high><high><SO><low> would be reasonable.  The shift-out code U+000E
has no meaning as a Unicode character so it wouldn't be unreasonable to
require a special check that one finds a full character if looking for
a one-character string consisting only of U+000E.  We could also have
<high><high><SI><low> to give the full *two* thousand million odd
characters that would be resupported by UTF-32.
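
To make the arithmetic concrete, here is a rough Python sketch of one
way such a sequence could carry its bits.  The layout is purely an
assumption for illustration (neither the thread nor any standard
defines it): each high surrogate carries 10 bits, the low surrogate
carries 10 bits, and the choice of SO (U+000E) or SI (U+000F) carries
one more, for 31 bits in all.  The function names are hypothetical.

```python
# Hypothetical layout, assumed for illustration only:
#   <high><high><SO or SI><low>  =  10 + 10 + 1 + 10  =  31 bits
HIGH_BASE, HIGH_LAST = 0xD800, 0xDBFF
LOW_BASE, LOW_LAST = 0xDC00, 0xDFFF
SO, SI = 0x000E, 0x000F


def encode_extended(value):
    """Pack a 31-bit value into four 16-bit code units."""
    if not 0 <= value < 2**31:
        raise ValueError("value must fit in 31 bits")
    shift = SI if value >> 30 else SO           # top bit selects SO or SI
    high1 = HIGH_BASE + ((value >> 20) & 0x3FF)  # bits 29..20
    high2 = HIGH_BASE + ((value >> 10) & 0x3FF)  # bits 19..10
    low = LOW_BASE + (value & 0x3FF)             # bits 9..0
    return [high1, high2, shift, low]


def decode_extended(units):
    """Inverse of encode_extended; rejects malformed sequences."""
    if len(units) != 4:
        raise ValueError("expected exactly four code units")
    high1, high2, shift, low = units
    if not (HIGH_BASE <= high1 <= HIGH_LAST and HIGH_BASE <= high2 <= HIGH_LAST):
        raise ValueError("first two units must be high surrogates")
    if shift not in (SO, SI):
        raise ValueError("third unit must be SO or SI")
    if not LOW_BASE <= low <= LOW_LAST:
        raise ValueError("fourth unit must be a low surrogate")
    top = 1 if shift == SI else 0
    return ((top << 30) | ((high1 - HIGH_BASE) << 20)
            | ((high2 - HIGH_BASE) << 10) | (low - LOW_BASE))


# Capacity under this assumed layout.
SO_ONLY_CAPACITY = 2**30   # 1,073,741,824 values with SO alone
SO_SI_CAPACITY = 2**31     # 2,147,483,648 values with SO and SI
```

Under that assumption SO alone gives a little over one thousand
million values, and adding SI doubles it to 2^31, which is where the
"two thousand million odd" figure comes from.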

Richard.
