Sławomir Osipiuk wrote:

> For UTF-16, there has always been (to me) an obvious method as well.
> In addition to the current HS/LS (high surrogate/low surrogate) pairs,
> allow triples: HS/HS/LS and HS/LS/LS. Each triple starts with a HS and
> ends on a LS. A stream of XTF-16 triples is self-synchronizing, though
> an interrupted stream might look like it ends or begins with a valid
> UTF-16 pair resulting in a single-character error.
>
> This has the advantages of not needing any new surrogate code points
> and there being exactly 31 free bits in a triple which means the same
> code space can be accessed.

High and low surrogates carry 10 bits of payload each, so I’m curious where the 
31st bit in a triple comes from.
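
The arithmetic behind that question can be sketched as below. This is
only an illustration: the surrogate ranges are the standard UTF-16
ones, but the names (payload_bits, decode_pair, and so on) are
hypothetical, and the triple forms are taken from the quoted proposal,
not from any existing encoding.

```python
# Standard UTF-16 surrogate ranges.
HS_FIRST, HS_LAST = 0xD800, 0xDBFF   # high (lead) surrogates
LS_FIRST, LS_LAST = 0xDC00, 0xDFFF   # low (trail) surrogates


def payload_bits(first, last):
    """Bits of payload carried by one code unit drawn from [first, last]."""
    count = last - first + 1          # 1024 values in each surrogate range
    return count.bit_length() - 1     # log2 of a power of two


hs_bits = payload_bits(HS_FIRST, HS_LAST)
ls_bits = payload_bits(LS_FIRST, LS_LAST)

# An ordinary HS/LS pair carries hs_bits + ls_bits of payload.
pair_bits = hs_bits + ls_bits

# A proposed triple (HS/HS/LS or HS/LS/LS) carries three payloads.
triple_bits_hhl = hs_bits + hs_bits + ls_bits
triple_bits_hll = hs_bits + ls_bits + ls_bits

# Counting both triple shapes together gives 2 * 2**30 == 2**31 distinct
# sequences. Whether that is the intended source of the 31st bit is an
# assumption here; the quoted proposal does not say so.
total_triples = 2 ** triple_bits_hhl + 2 ** triple_bits_hll


def decode_pair(hs, ls):
    """Decode a standard UTF-16 surrogate pair to a code point."""
    if not (HS_FIRST <= hs <= HS_LAST and LS_FIRST <= ls <= LS_LAST):
        raise ValueError("not a valid HS/LS pair")
    return 0x10000 + ((hs - HS_FIRST) << 10) + (ls - LS_FIRST)
```

Each surrogate contributes 10 bits, so a pair has 20 bits and either
triple shape on its own has 30.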

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

