How many possible characters? (was: Re: Names of planes...)

Doug Ewell Tue, 11 Jul 2000 23:17:33 -0700

Asmus Freytag <[EMAIL PROTECTED]> wrote:

> There are 0x10FFFF - 34 possible characters!
>
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to
> characters. They are not just temporarily unassigned, but permanently
> reserved as non-characters.

Right, but we should start with 0x110000, not 0x10FFFF (since U+0000
NULL is a perfectly legitimate character), then subtract 34 (U+??FFFE
and U+??FFFF for each of 17 planes), then subtract another 2,048 for
the surrogate codepoints (U+D800 through U+DFFF).  That leaves us with
1,112,030 possible characters.  There will be a test next period.
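As a quick check of the arithmetic above, here is a minimal Python sketch. The constant names are illustrative only, not taken from any library, and the count follows the definition used in this thread: only the 34 plane-final noncharacters and the 2,048 surrogates are excluded.

```python
# Code points run from U+0000 through U+10FFFF inclusive, so start at 0x110000.
TOTAL_CODE_POINTS = 0x110000        # 1,114,112
NONCHARACTERS = 2 * 17              # U+??FFFE and U+??FFFF in each of 17 planes
SURROGATES = 0xDFFF - 0xD800 + 1    # U+D800 through U+DFFF: 2,048 code points

possible = TOTAL_CODE_POINTS - NONCHARACTERS - SURROGATES
print(f"{possible:,}")              # 1,112,030
```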

Then Robert Lozyniak <[EMAIL PROTECTED]> wrote:

> Okay, 0x10FFDE different characters. But what of planes 15 and 16?

Planes 15 and 16 are for private-use characters, just like the range
from U+E000 to U+F8FF.  These still count as "possible characters."

and then "john" <[EMAIL PROTECTED]> wrote:

> Clarification request: Does that mean
> None of the code values ending in 0xFFFE and 0xFFFF refer to
> characters?
>
> or
>
> Not all of the code values ending in 0xFFFE and 0xFFFF refer to
> characters (i.e., some do and some do not)?

The first one.  For all x where ((x & 0x00FFFE) == 0x00FFFE), x is not
a valid character.
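The bit test above can be tried directly. The sketch below (the function name is hypothetical, chosen only for illustration) scans every code point from U+0000 to U+10FFFF and confirms that the mask catches exactly the 34 code points ending in FFFE or FFFF. ANDing with 0x00FFFE drops the plane bits above bit 15 and clears bit 0, so both ...FFFE and ...FFFF reduce to 0xFFFE.

```python
def is_plane_final_noncharacter(x: int) -> bool:
    # Hypothetical helper. The mask keeps bits 1-15 only: the plane number
    # (bits 16-20) is ignored and bit 0 is cleared, so U+??FFFE and
    # U+??FFFF both reduce to 0xFFFE.
    return (x & 0x00FFFE) == 0x00FFFE

# Scan the whole code space, U+0000 through U+10FFFF.
nonchars = [x for x in range(0x110000) if is_plane_final_noncharacter(x)]

print(len(nonchars))  # 34: two per plane, 17 planes
print(", ".join(f"U+{x:04X}" for x in nonchars[:4]))  # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF
```

Neighbouring values such as U+FFFD or U+1FFFC fail the test, because their low 16 bits differ from FFFE in more than bit 0.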

BTW, it's interesting that the FAQ claims this is "for no good reason,"
when in fact I can think of a good reason to at least exclude the
code points ending in FFFE:  expressed in UTF-32 little-endian, each one
begins with the bytes FE FF, the UTF-16 big-endian byte order mark.  So
if one appeared at the beginning of a file, it could fool an
auto-detection scheme into thinking the file is UTF-16 big-endian.
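To see the byte-level effect, the Python sketch below encodes a few of these code points as UTF-32LE and feeds them to a deliberately naive BOM sniffer. The detector function is hypothetical, written only for illustration; codecs.BOM_UTF16_BE is the standard-library constant for the bytes FE FF.

```python
import codecs

def looks_like_utf16be_bom(data: bytes) -> bool:
    # Hypothetical naive detector: any stream that starts with FE FF
    # is assumed to be UTF-16 big-endian.
    return data.startswith(codecs.BOM_UTF16_BE)

for cp in (0xFFFE, 0x1FFFE, 0x10FFFE, 0xFFFF):
    encoded = chr(cp).encode("utf-32-le")  # 4 bytes, least significant first
    print(f"U+{cp:04X} -> {encoded.hex(' ')}  looks like UTF-16BE: "
          f"{looks_like_utf16be_bom(encoded)}")
```

The three ...FFFE code points all trip the detector. U+FFFF does not, since its UTF-32LE form starts with FF FF, which fits the "at least FFFE" qualification.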

-Doug Ewell
 Fullerton, California
