On Thu, 18 May 2017 02:04:55 +0200 Philippe Verdy via Unicode <unicode@unicode.org> wrote:
> I find intriguating that the update intends to enforce the decoding
> of the **shortest** sequences, but now wants to treat **maximal
> sequences** as a single unit with arbitrary length. UTF-8 was
> designed to work only with some state machines that would NEVER need
> to parse more than 4 bytes.

If you look at the sample code in
http://www.unicode.org/versions/Unicode2.0.0/appA.pdf, you'll see that
it works with 6-byte sequences. It's the Unicode version, as opposed to
the ISO 10646 version, that has always been restricted to 4 bytes.

Richard.
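For illustration, here is a minimal sketch of the difference in question, assuming the lead-byte ranges of the original UTF-8 definition (as in RFC 2279, up to 6 bytes and 31-bit values) versus the restricted form of RFC 3629 and current Unicode (at most 4 bytes, values up to U+10FFFF). The function names are hypothetical, and the decoder deliberately omits overlong and surrogate checks:

```python
from typing import Optional


def seq_len_original(lead: int) -> Optional[int]:
    """Sequence length implied by a lead byte in the original 6-byte UTF-8."""
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return None  # continuation byte, not a lead byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    return None  # 0xFE and 0xFF were never used


def seq_len_restricted(lead: int) -> Optional[int]:
    """Sequence length implied by a lead byte under the 4-byte restriction."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return None  # C0, C1, F5..FF and continuation bytes cannot start a sequence


def decode_original(seq: bytes) -> int:
    """Decode one complete sequence under the original scheme.

    Sketch only: does not reject overlong forms or surrogates.
    """
    n = seq_len_original(seq[0])
    if n is None or len(seq) != n:
        raise ValueError("bad lead byte or wrong sequence length")
    if n == 1:
        return seq[0]
    value = seq[0] & (0x7F >> n)  # payload bits left in the lead byte
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("expected continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value
```

Under the original scheme, `decode_original(b"\xfc\x84\x80\x80\x80\x80")` yields 0x4000000, the smallest 6-byte value, whereas `seq_len_restricted(0xFC)` rejects that lead byte outright.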