On Sat, 21 Mar 2020 13:33:18 -0600 Doug Ewell via Unicode <unicode@unicode.org> wrote:
> Eli Zaretskii wrote:
>
> > Emacs uses some of that for supporting charsets that cannot be
> > mapped into Unicode. GB18030 is one example of such charsets. The
> > internal representation of characters in Emacs is UTF-8, so it uses
> > 5-byte UTF-8 like sequences to represent such characters.
>
> When 137,468 private-use characters aren't enough?

But they aren't private use! I haven't made any agreement with anyone
about using them.

Additionally, just as some people seem to think that stray UTF-16 code
units should be supported (and occasionally declare UTF-8
implementations of Unicode standard algorithms to be automatically
non-compliant), there is a case for supporting stray UTF-8 code units.
Emacs supports the full range of 8-bit byte values - 128 unified with
ASCII and the other 128 with the high bit set.

> What characters exist in GB18030 that don't
> exist in Unicode, and have they been proposed for Unicode yet, and
> why was none of the PUA space considered appropriate for that in the
> meantime?

Doesn't GB18030 appropriate some of the PUA for Tibetan (and quite
possibly other complex scripts)? I haven't looked up how Emacs handles
this.

Richard.
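
For reference, the two figures quoted above can be checked with a rough sketch. The first function sums the three Unicode private-use ranges to get 137,468. The second applies the original (pre-RFC 3629) UTF-8 bit layout to values beyond U+10FFFF, which is where 5-byte "UTF-8 like" sequences come from. The function names are made up for illustration, and this is not Emacs's actual code, only the generic bit pattern under that assumption.

```python
def pua_count() -> int:
    """Count the code points in the three Unicode private-use ranges."""
    ranges = [
        (0xE000, 0xF8FF),      # BMP Private Use Area
        (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
        (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
    ]
    return sum(hi - lo + 1 for lo, hi in ranges)


def encode_extended_utf8(cp: int) -> bytes:
    """Encode an integer with the original UTF-8 bit layout (hypothetical helper).

    Values below 0x200000 take 1 to 4 bytes, as in standard UTF-8; values
    from 0x200000 up to 0x3FFFFFF take 5 bytes. No surrogate check is made.
    """
    if cp < 0:
        raise ValueError("negative value")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    forms = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
    )
    for limit, lead, ncont in forms:
        if cp < limit:
            out = [lead | (cp >> (6 * ncont))]
            # Emit continuation bytes, most significant 6-bit group first.
            for shift in range(6 * (ncont - 1), -1, -6):
                out.append(0x80 | ((cp >> shift) & 0x3F))
            return bytes(out)
    raise ValueError("value too large for a 5-byte sequence")


if __name__ == "__main__":
    print(pua_count())
    print(encode_extended_utf8(0x10FFFF).hex())  # last Unicode code point
    print(encode_extended_utf8(0x200000).hex())  # first value needing 5 bytes
```

Anything up to U+10FFFF comes out identical to ordinary UTF-8, so only the values outside Unicode are affected by the longer forms.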