RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

Asmus Freytag Tue, 18 Dec 2001 17:25:11 -0800

At 03:38 PM 12/18/01 -0800, Rick Cameron wrote:
>Are you planning to add an explicit statement to the Unicode standard that
>the valid range for scalar values is 0..10FFFF? (Or is such a statement
>there, and I've just missed it?)


see below:


>In particular, as the use of 32-bit variables to hold Unicode characters
>becomes more common (apparently most unices make wchar_t 32 bits wide), many
>will imagine that such a variable represents a 32-bit encoding of Unicode,
>with range 0..FFFFFFFF, where it just happens that every value above 10FFFF
>is unassigned.
>
>Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit encoding -
>but that's not stopping uniphiles from storing Unicode data in their
>wchar_t's!

The only way such use is conformant is if it follows UTF-32. The latter is 
clearly specified in http://www.unicode.org/unicode/reports/tr19/ as:

"The following lists the important features of this encoding form:

UTF-32 is restricted in values to the range 0..10FFFF, which precisely 
matches the range of characters defined in the Unicode Standard (and other 
standards such as XML), and those representable by UTF-8 and UTF-16.
"
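
As a minimal sketch of what that restriction means for code that keeps
characters in a 32-bit variable, the check below rejects anything above
10FFFF. The function and macro names are hypothetical, not taken from any
library. The second function also excludes the surrogate range D800..DFFF;
that exclusion is an assumption added here for illustration, since the
passage quoted above states only the upper bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound quoted from UTR #19: UTF-32 values are 0..10FFFF. */
#define UNICODE_MAX 0x10FFFFu

/* Hypothetical helper: true if a 32-bit value lies in 0..10FFFF. */
bool utf32_in_range(uint32_t c)
{
    return c <= UNICODE_MAX;
}

/* Hypothetical helper: additionally rejects surrogate code points
 * (D800..DFFF). This extra rule is an assumption of the sketch and is
 * not stated in the quoted passage. */
bool utf32_is_scalar(uint32_t c)
{
    return c <= UNICODE_MAX && !(c >= 0xD800u && c <= 0xDFFFu);
}
```

With these, a value such as 0x110000 or 0xFFFFFFFF stored in a 32-bit
wchar_t is not merely "unassigned"; it fails the range check outright.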

And Unicode 3.1 (in http://www.unicode.org/unicode/reports/tr27/) states:

"Status of UTF-32
Unicode Technical Report #19, UTF-32, has been elevated to the status of a 
Unicode Standard Annex, making UTF-32 officially a part of the Unicode 
Standard.

...

Because UTF-32 is a fixed-width, 32-bit encoding form, the numerical value 
of a Unicode character in UTF-32 is always precisely identical to the 
Unicode scalar value.

"
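
To illustrate the quoted point, here is a rough sketch contrasting the two
encoding forms: in UTF-32 the code unit already is the scalar value, while
a supplementary character in UTF-16 has to be reassembled from a surrogate
pair. The function names are hypothetical, and the code assumes its inputs
are well-formed (it does no validation).

```c
#include <stdint.h>

/* UTF-32: the 32-bit code unit is the scalar value itself.
 * Hypothetical helper, shown only to make the identity explicit. */
uint32_t utf32_to_scalar(uint32_t unit)
{
    return unit;
}

/* UTF-16: a supplementary character is carried by a high surrogate
 * (D800..DBFF) followed by a low surrogate (DC00..DFFF).
 * Assumes hi and lo are a valid pair; no checking is done here. */
uint32_t utf16_pair_to_scalar(uint16_t hi, uint16_t lo)
{
    return 0x10000u
         + (((uint32_t)hi - 0xD800u) << 10)
         + ((uint32_t)lo - 0xDC00u);
}
```

For example, the pair D801 DC00 yields 10400, and the largest pair,
DBFF DFFF, yields 10FFFF, which is why the UTF-32 range stops there.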

When Unicode 4.0 is published, we'll further clean up the language, among 
other changes, so that it no longer depends on a reference to a separate 
UTF-32 document. I'm confident that with all the revisions applied to the 
text of chapter three, plus our usual editorial tweaks, readers will be 
much less likely to arrive at the misunderstanding that you had.

A./

Technical Vice President
The Unicode Consortium
Liaison to ISO/IEC JTC1/SC2/WG2
