On Wed, 2 Jul 2014 21:19:16 +0200 Philippe Verdy <[email protected]> wrote:
> 2014-07-02 20:19 GMT+02:00 David Starner <[email protected]>:
>
> > I might argue 11111111b for 0x00 in UTF-8 would be technically
> > legal

> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they will be undistinctable from
> the NULL character that could be stored everywhere in the stream.

A 0xFF byte in a narrow character stream is converted to 0x00FF (int is
at least 16 bits wide) in the interfaces while the narrow character
end-of-stream value EOF is required to be negative.  Unfortunately, the
wide character end-of-stream marker WEOF is not required to be negative,
but it is not allowed to be a representable character.  C appears to
prohibit U+FFFF as well as supplementary characters if wchar_t is only
16 bits wide.

Richard.
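To make the narrow-stream point concrete, here is a minimal sketch in
standard C.  The helper names (read_back_byte, check_narrow_stream,
is_distinct_from_weof) are made up for illustration and are not part of
any library; the sketch also assumes that tmpfile() succeeds on the
system where it is run.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write one byte to a temporary file, read it back
 * with getc(), and return that result as an int.  The result of a second
 * getc() call (which hits the real end of stream) is stored in *next.
 * If tmpfile() fails, both results are reported as EOF. */
int read_back_byte(unsigned char byte, int *next)
{
    FILE *fp = tmpfile();
    if (fp == NULL) {
        *next = EOF;
        return EOF;
    }
    fputc(byte, fp);
    rewind(fp);              /* switch from writing to reading */
    int first = getc(fp);    /* byte is returned as unsigned char -> int */
    *next = getc(fp);        /* nothing left, so this is EOF */
    fclose(fp);
    return first;
}

/* Hypothetical check: a 0xFF byte and end-of-stream stay distinguishable
 * as long as the result of getc() is kept in an int. */
void check_narrow_stream(void)
{
    int next = 0;
    int c = read_back_byte(0xFF, &next);
    assert(c == 0xFF);       /* 0x00FF, a non-negative value */
    assert(next == EOF);     /* the genuine end-of-stream marker */
    assert(EOF < 0);         /* EOF is required to be negative */
}

/* Hypothetical check for the wide case: returns 1 if wc, converted to
 * wint_t, differs from WEOF.  Whether a value such as 0xFFFF collides
 * with WEOF depends on how the implementation defines wchar_t, wint_t
 * and WEOF; this function only reports the comparison. */
int is_distinct_from_weof(wchar_t wc)
{
    return (wint_t)wc != WEOF;
}
```

Calling check_narrow_stream() from main() should pass silently.  The
ambiguity only appears if the getc() result is narrowed to a char before
the comparison with EOF, and whether it does so in practice depends on
whether plain char is signed on the platform.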

