> On 11 May 2015, at 19:44, Doug Ewell <[email protected]> wrote:
>
> Hans Aberg <haberg dash 1 at telia dot com> wrote:
>
>>>> However I wonder what would be the effect of D80 in UTF-32: is
>>>> <0xFFFFFFFF> a valid "32-bit string" ?
>>>
>>> The value 0xFFFFFFFF cannot appear in a UTF-32 string. Therefore it
>>> cannot represent a unit of encoded text in a UTF-32 string.
>>
>> Even though the values with highest bit set are not a part of original
>> UTF-32, it can easily be extended also to original UTF-8, which may be
>> simpler to implement.
>
> "Original UTF-8," regardless of where defined, only ever encoded scalar
> values up to 0x7FFFFFFF. See, for example, RFC 2279.

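What I meant is that the original UTF-8 can also be extended to the full 32-bit range, by using 6-byte sequences whose leading byte has the bit pattern 111111xx. The RFC 2279 six-byte form has a leading byte 1111110x and five continuation bytes carrying 6 bits each, which gives 31 bits; letting the last bit of the leading byte vary as well gives 32.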
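
A rough sketch of what I have in mind, in Python. This is only an illustration and not part of any standard: the names encode6 and decode6 are made up, and it handles just the six-byte range 0x04000000..0xFFFFFFFF, on the assumption that shorter sequences stay exactly as in RFC 2279.

```python
def encode6(value: int) -> bytes:
    """Encode a value as a 6-byte sequence with lead byte 111111xx.

    Hypothetical extension: lead bytes 0xFC/0xFD are the RFC 2279
    six-byte forms, 0xFE/0xFF are the assumed extra ones.
    """
    if not 0x04000000 <= value <= 0xFFFFFFFF:
        raise ValueError("value is outside the six-byte range")
    lead = 0xFC | (value >> 30)  # top 2 bits of the value go into xx
    # Remaining 30 bits, 6 per continuation byte (10xxxxxx).
    tail = [0x80 | ((value >> shift) & 0x3F) for shift in (24, 18, 12, 6, 0)]
    return bytes([lead, *tail])


def decode6(data: bytes) -> int:
    """Decode a 6-byte sequence produced by encode6."""
    if len(data) != 6 or data[0] & 0xFC != 0xFC:
        raise ValueError("not a 6-byte sequence with lead byte 111111xx")
    value = data[0] & 0x03
    for byte in data[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
        value = (value << 6) | (byte & 0x3F)
    if value < 0x04000000:
        raise ValueError("overlong encoding")
    return value
```

With this, 0x7FFFFFFF still encodes as FD BF BF BF BF BF, as in RFC 2279, while 0x80000000 becomes FE 80 80 80 80 80 and 0xFFFFFFFF becomes FF BF BF BF BF BF. The cost is that the bytes 0xFE and 0xFF, which never occur in RFC 2279 UTF-8, now do.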

