I wrote:

> High and low surrogates carry 10 bits of payload each, so I’m curious
> where the 31st bit in a triple comes from.

I see it now:

> In addition to the current HS/LS (high surrogate/low surrogate) pairs,
> allow triples: HS/HS/LS and HS/LS/LS.

HS/HS/LS gives you 2³⁰ code points, and HS/LS/LS gives you another 2³⁰, for a total of 2³¹.

Of course, there would be two different ways to encode U+0000 through U+FFFF (one 16-bit code unit or three), and two different ways to encode U+10000 through U+10FFFF (two code units or three), so some invalid sequences would need to be defined, similar to those in UTF-8.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
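
A minimal sketch of how the triple scheme quoted above might work, in Python. The bit layout is an assumption, not part of the proposal: the three 10-bit payloads are concatenated in order, HS/HS/LS covers 0 through 2³⁰−1, and HS/LS/LS covers 2³⁰ through 2³¹−1, so the HS-or-LS choice for the middle unit supplies the 31st bit. The names `encode_triple` and `decode_triple` are hypothetical. Values at or below U+10FFFF are rejected as non-shortest forms, by analogy with UTF-8 overlong sequences.

```python
HS_BASE = 0xD800  # high surrogates: U+D800..U+DBFF
LS_BASE = 0xDC00  # low surrogates:  U+DC00..U+DFFF


def is_hs(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_ls(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def encode_triple(value: int) -> tuple[int, int, int]:
    """Encode a value above U+10FFFF as a hypothetical surrogate triple.

    Assumed layout: bit 30 selects HS/HS/LS (0) or HS/LS/LS (1);
    the low 30 bits are split into three 10-bit payloads.
    """
    if not 0x110000 <= value < (1 << 31):
        raise ValueError("triples only carry values from 0x110000 to 2**31 - 1")
    middle_is_ls = value >> 30            # the 31st bit
    payload = value & ((1 << 30) - 1)     # the 30 payload bits
    first = HS_BASE | (payload >> 20)
    middle = (LS_BASE if middle_is_ls else HS_BASE) | ((payload >> 10) & 0x3FF)
    last = LS_BASE | (payload & 0x3FF)
    return (first, middle, last)


def decode_triple(first: int, middle: int, last: int) -> int:
    """Decode a hypothetical surrogate triple, rejecting non-shortest forms."""
    if not (is_hs(first) and is_ls(last)):
        raise ValueError("a triple must start with HS and end with LS")
    payload = ((first & 0x3FF) << 20) | ((middle & 0x3FF) << 10) | (last & 0x3FF)
    if is_hs(middle):
        value = payload                   # HS/HS/LS: first 2**30 values
    elif is_ls(middle):
        value = (1 << 30) | payload       # HS/LS/LS: second 2**30 values
    else:
        raise ValueError("middle code unit must be a surrogate")
    # Anything up to U+10FFFF already has a 1- or 2-unit encoding,
    # so a triple for it is treated as invalid (like UTF-8 overlongs).
    if value <= 0x10FFFF:
        raise ValueError("non-shortest form: use the 1- or 2-unit encoding")
    return value
```

For example, `encode_triple(0x110000)` gives `(0xD801, 0xD840, 0xDC00)`, and `decode_triple(0xDBFF, 0xDFFF, 0xDFFF)` gives 2³¹−1. The sketch handles an isolated triple only; a stream decoder would also need lookahead, since HS/LS/LS begins with what is already a valid HS/LS pair.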
