RE: Thoughts on upsizing Unicode (was: Re: Are there [start] emoji [end] style codes?)

Doug Ewell via Unicode Mon, 30 Mar 2026 20:52:25 -0700

Sławomir Osipiuk wrote:

> For UTF-16, there has always been (to me) an obvious method as well.
> In addition to the current HS/LS (high surrogate/low surrogate) pairs,
> allow triples: HS/HS/LS and HS/LS/LS. Each triple starts with a HS and
> ends on a LS. A stream of XTF-16 triples is self-synchronizing, though
> an interrupted stream might look like it ends or begins with a valid
> UTF-16 pair resulting in a single-character error.
>
> This has the advantages of not needing any new surrogate code points
> and there being exactly 31 free bits in a triple which means the same
> code space can be accessed.


High and low surrogates carry 10 bits of payload each, so I’m curious where the 
31st bit in a triple comes from.
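
For reference, here is a minimal sketch of the counting, assuming one reading of the quoted proposal: the first unit is always a high surrogate, the last is always a low surrogate, and the middle unit may be either. The names HIGH, LOW and count_triples are illustrative only and come from neither the proposal nor any standard. The sketch only counts combinations and does not define a mapping to code points.

```python
# Surrogate ranges as defined for UTF-16: 1024 values each, i.e. 10 payload bits.
HIGH = range(0xD800, 0xDC00)  # high surrogates
LOW = range(0xDC00, 0xE000)   # low surrogates


def count_triples():
    """Count the distinct HS/HS/LS and HS/LS/LS sequences (hypothetical helper)."""
    hs, ls = len(HIGH), len(LOW)
    hhl = hs * hs * ls  # HS/HS/LS form: 10 + 10 + 10 bits
    hll = hs * ls * ls  # HS/LS/LS form: 10 + 10 + 10 bits
    return hhl, hll


hhl, hll = count_triples()
total = hhl + hll

# Each form alone gives 2**30 combinations; the choice between the two
# forms (is the middle unit high or low?) acts as one additional bit.
print(f"HS/HS/LS: {hhl:#x}")
print(f"HS/LS/LS: {hll:#x}")
print(f"total:    {total:#x} (2**{total.bit_length() - 1})")
```

Under that reading, each surrogate still carries 10 bits, and any extra bit would have to come from the type of the middle unit rather than from the payload of any single surrogate.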

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
