On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler <k...@sybase.com> wrote:
> How about Clause 12.5 of ISO/IEC 10646:
>
> <001B, 0025, 0040>
>
> You "escape" out of UTF-16 to ISO 2022, and then you can do whatever
> the heck you want, including exchange and processing of complete
> 4-byte forms, with all the billions of characters folks seem to think
> they need.
> Of course you would have to convince implementers to honor the ISO
> 2022 escape sequence...

Which they only need to if the text is in an ISO 2022 or similar context.

Your idea does suggest that a pattern of <high><high><SO><low> would be
reasonable. The shift-out code U+000E has no meaning as a Unicode
character, so it wouldn't be unreasonable to require a special check
that one finds a full character when looking for a one-character string
consisting only of U+000E.

We could also have <high><high><SI><low> to give the full *two* thousand
million odd characters that would be resupported by UTF-32.

Richard.
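
To make the arithmetic behind the <high><high><SO><low> idea concrete, here is a minimal sketch in Python. The post does not specify a bit layout, so everything below is an assumption for illustration: each surrogate is taken to carry 10 payload bits (3 x 10 = 30 bits, about one thousand million values), and the choice of SO (U+000E) versus SI (U+000F) supplies a 31st bit, giving 2^31, the "two thousand million odd" figure. The function names are hypothetical, and the sketch does not address how these payload values would map onto actual code points.

```python
# Hypothetical layout, not specified in the thread:
#   <high><high><SO|SI><low>, 10 payload bits per surrogate,
#   SO selects top bit 0 and SI selects top bit 1 (31 bits total).
SO, SI = 0x000E, 0x000F
HIGH_BASE, LOW_BASE = 0xD800, 0xDC00


def is_high(unit):
    """True if the 16-bit code unit is a high (leading) surrogate."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True if the 16-bit code unit is a low (trailing) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF


def encode_extended(value):
    """Encode a 31-bit payload as four 16-bit code units."""
    if not 0 <= value < (1 << 31):
        raise ValueError("payload must fit in 31 bits")
    shift = SI if (value >> 30) else SO
    h1 = HIGH_BASE + ((value >> 20) & 0x3FF)
    h2 = HIGH_BASE + ((value >> 10) & 0x3FF)
    lo = LOW_BASE + (value & 0x3FF)
    return [h1, h2, shift, lo]


def decode_extended(units):
    """Decode four code units back into the 31-bit payload."""
    if len(units) != 4:
        raise ValueError("expected exactly four code units")
    h1, h2, shift, lo = units
    if not (is_high(h1) and is_high(h2) and shift in (SO, SI) and is_low(lo)):
        raise ValueError("not a <high><high><SO|SI><low> sequence")
    top = 1 if shift == SI else 0
    return ((top << 30) | ((h1 - HIGH_BASE) << 20)
            | ((h2 - HIGH_BASE) << 10) | (lo - LOW_BASE))


def is_standalone_shift(units, i):
    """The 'special check': is the SO/SI at index i a full character on
    its own, rather than the third unit of an extended sequence?"""
    if units[i] not in (SO, SI):
        return False
    inside = (i >= 2 and i + 1 < len(units)
              and is_high(units[i - 2]) and is_high(units[i - 1])
              and is_low(units[i + 1]))
    return not inside
```

Under these assumptions, a search for a lone U+000E would call something like `is_standalone_shift` on each match and skip hits that fall inside an extended sequence.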