Asmus Freytag <[EMAIL PROTECTED]> wrote:

> There are 0x10FFFF - 34 possible characters!
>
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to
> characters. They are not just temporarily unassigned, but permanently
> reserved as non-characters.

Right, but we should start with 0x110000, not 0x10FFFF (since U+0000
NULL is a perfectly legitimate character), then subtract 34 (U+??FFFE
and U+??FFFF for each of 17 planes), then subtract another 2,048 for
the surrogate codepoints (U+D800 through U+DFFF). That leaves us with
1,112,030 possible characters. There will be a test next period.

Then Robert Lozyniak <[EMAIL PROTECTED]> wrote:

> Okay, 0x10FFDE different characters. But what of planes 15 and 16?

Planes 15 and 16 are for private-use characters, just like the range
from U+E000 to U+F8FF. These still count as "possible characters."

And then "john" <[EMAIL PROTECTED]> wrote:

> Clarification request: Does that mean
> None of the code values ending in 0xFFFE and 0xFFFF refer to
> characters?
>
> or
>
> Not all of the code values ending in 0xFFFE and 0xFFFF refer to
> characters (i.e. some do and some do not)?

The first one. For all x where ((x & 0x00FFFE) == 0x00FFFE), x is not
a valid character.

BTW, it's interesting that the FAQ claims this is "for no good reason,"
when in fact I can think of a good reason to exclude at least the
code values ending in FFFE: expressed in UTF-32 little-endian, such a
value begins with the bytes FE FF, so at the beginning of a file it
could fool an auto-detection scheme into thinking the file is UTF-16
big-endian.

-Doug Ewell
 Fullerton, California
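
The arithmetic and the bit test above can be checked with a short
Python sketch. The function and variable names are made up for
illustration and do not come from any library. It counts only the two
exclusions discussed in this thread (code values ending in FFFE/FFFF,
and the surrogates), then shows the byte-order coincidence for U+FFFE.

```python
import codecs

# Illustrative names only; nothing here is a standard API except codecs.
CODE_SPACE = 0x110000  # U+0000 through U+10FFFF inclusive, so 0x110000 values


def ends_in_fffe_or_ffff(cp):
    """True when the low 16 bits of cp are FFFE or FFFF, in any plane."""
    return (cp & 0x00FFFE) == 0x00FFFE


def is_surrogate(cp):
    """True for the surrogate code points U+D800 through U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


# Brute-force counts over the whole code space.
noncharacters = sum(1 for cp in range(CODE_SPACE) if ends_in_fffe_or_ffff(cp))
surrogates = sum(1 for cp in range(CODE_SPACE) if is_surrogate(cp))
possible = CODE_SPACE - noncharacters - surrogates

# U+FFFE written as a 4-byte little-endian integer (the UTF-32LE layout).
utf32le_fffe = (0xFFFE).to_bytes(4, "little")

print(noncharacters, surrogates, possible)
print(utf32le_fffe[:2] == codecs.BOM_UTF16_BE)
```

The first two bytes of `utf32le_fffe` are FE FF, the same bytes as the
UTF-16 big-endian byte order mark, which is the confusion described
above.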

