I wrote:

> High and low surrogates carry 10 bits of payload each, so I’m curious
> where the 31st bit in a triple comes from.

I see it now:

> In addition to the current HS/LS (high surrogate/low surrogate) pairs,
> allow triples: HS/HS/LS and HS/LS/LS.

HS/HS/LS gives you 2³⁰ code points, and HS/LS/LS gives you another 2³⁰, for a 
total of 2³¹.
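
As a quick check of that arithmetic, here is a small Python sketch. The names (HIGH_SURROGATES, LOW_SURROGATES, count_sequences) are mine for illustration and are not part of the proposal:

```python
# Each surrogate range holds 0x400 (1024) code units, i.e. 10 bits of payload.
HIGH_SURROGATES = range(0xD800, 0xDC00)
LOW_SURROGATES = range(0xDC00, 0xE000)


def count_sequences(*ranges):
    """Number of distinct code-unit sequences drawn from the given ranges."""
    total = 1
    for r in ranges:
        total *= len(r)
    return total


hs_hs_ls = count_sequences(HIGH_SURROGATES, HIGH_SURROGATES, LOW_SURROGATES)
hs_ls_ls = count_sequences(HIGH_SURROGATES, LOW_SURROGATES, LOW_SURROGATES)

# Each triple form contributes 2**30 sequences; together they give 2**31.
assert hs_hs_ls == 2**30
assert hs_ls_ls == 2**30
assert hs_hs_ls + hs_ls_ls == 2**31
```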

Of course, there would be two different ways to encode U+0000 through U+FFFF 
(one 16-bit code unit or three), and two different ways to encode U+10000 
through U+10FFFF (two code units or three), so the longer forms would need 
to be defined as invalid sequences, similar to the overlong forms in UTF-8.
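
To make the redundancy concrete, here is a rough Python sketch of a triple decoder that rejects such sequences. The proposal as quoted does not say how the 30 payload bits map to code points, so the mapping below (a form bit on top, no offset) is purely my assumption, and decode_triple is a hypothetical name:

```python
MAX_UNICODE = 0x10FFFF  # highest code point reachable with today's UTF-16


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def decode_triple(u1, u2, u3):
    """Decode an HS/HS/LS or HS/LS/LS triple under an ASSUMED mapping.

    Assumption (not from the proposal): the value is
    (form << 30) | three 10-bit payloads, where form is 0 for HS/HS/LS
    and 1 for HS/LS/LS, with no offset applied.
    """
    if is_high(u1) and is_high(u2) and is_low(u3):
        form = 0
    elif is_high(u1) and is_low(u2) and is_low(u3):
        form = 1
    else:
        raise ValueError("not an HS/HS/LS or HS/LS/LS triple")

    value = (
        (form << 30)
        | ((u1 & 0x3FF) << 20)
        | ((u2 & 0x3FF) << 10)
        | (u3 & 0x3FF)
    )

    # Anything at or below U+10FFFF already has a one- or two-unit
    # encoding, so the triple is redundant and treated as invalid here.
    if value <= MAX_UNICODE:
        raise ValueError(f"redundant triple for U+{value:04X}")
    return value
```

Under this assumed mapping, decode_triple(0xD800, 0xD800, 0xDC41) would be rejected as a three-unit spelling of U+0041, while decode_triple(0xD802, 0xD800, 0xDC00) yields 0x200000, which has no shorter form.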

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

