Sławomir Osipiuk wrote:

> For UTF-16, there has always been (to me) an obvious method as well.
> In addition to the current HS/LS (high surrogate/low surrogate) pairs,
> allow triples: HS/HS/LS and HS/LS/LS. Each triple starts with a HS and
> ends on a LS. A stream of XTF-16 triples is self-synchronizing, though
> an interrupted stream might look like it ends or begins with a valid
> UTF-16 pair resulting in a single-character error.
>
> This has the advantages of not needing any new surrogate code points
> and there being exactly 31 free bits in a triple which means the same
> code space can be accessed.

High and low surrogates carry 10 bits of payload each, so I’m curious where the 31st bit in a triple comes from.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
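
The arithmetic behind the question can be checked by counting code units. The sketch below uses the standard UTF-16 surrogate ranges (U+D800..U+DBFF and U+DC00..U+DFFF); the variable names are illustrative only. It also counts the two triple shapes together, which is one possible reading of the quoted "31 free bits" and is an assumption, since the thread does not confirm it.

```python
import math

# Standard UTF-16 surrogate ranges: 1024 code units each.
HIGH_SURROGATES = range(0xD800, 0xDC00)
LOW_SURROGATES = range(0xDC00, 0xE000)

# Payload carried by a single surrogate code unit.
bits_per_high = int(math.log2(len(HIGH_SURROGATES)))  # 10
bits_per_low = int(math.log2(len(LOW_SURROGATES)))    # 10

# An ordinary HS/LS pair carries 20 bits.
pair_bits = bits_per_high + bits_per_low

# Each proposed triple shape, taken alone, carries 30 payload bits.
hs_hs_ls = len(HIGH_SURROGATES) ** 2 * len(LOW_SURROGATES)
hs_ls_ls = len(HIGH_SURROGATES) * len(LOW_SURROGATES) ** 2
triple_payload_bits = int(math.log2(hs_hs_ls))

# Assumption (not stated in the thread): the choice between the two
# shapes, HS/HS/LS versus HS/LS/LS, acts as one additional bit.
total_triples = hs_hs_ls + hs_ls_ls
total_bits = math.log2(total_triples)

print(pair_bits, triple_payload_bits, total_bits)
```

Under that reading, the three code units supply 30 bits of payload and the middle unit's type (high or low) supplies the remaining one.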
