At 03:38 PM 12/18/01 -0800, Rick Cameron wrote:
>Are you planning to add an explicit statement to the Unicode standard that
>the valid range for scalar values is 0..10FFFF? (Or is such a statement
>there, and I've just missed it?)

see below:

>In particular, as the use of 32-bit variables to hold Unicode characters
>becomes more common (apparently most unices make wchar_t 32 bits wide), many
>will imagine that such a variable represents a 32-bit encoding of Unicode,
>with range 0..FFFFFFFF, where it just happens that every value above 10FFFF
>is unassigned.
>
>Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit encoding -
>but that's not stopping uniphiles from storing Unicode data in their
>wchar_t's!

The only way such use is conformant is if it follows UTF-32. The latter is
clearly specified in http://www.unicode.org/unicode/reports/tr19/ as:

"The following lists the important features of this encoding form:

UTF-32 is restricted in values to the range 0..10FFFF, which precisely
matches the range of characters defined in the Unicode Standard (and other
standards such as XML), and those representable by UTF-8 and UTF-16."

And Unicode 3.1 (in http://www.unicode.org/unicode/reports/tr27/) states:

"Status of UTF-32

Unicode Technical Report #19, UTF-32, has been elevated to the status of a
Unicode Standard Annex, making UTF-32 officially a part of the Unicode
Standard.
...
Because UTF-32 is a fixed-width, 32-bit encoding form, the numerical value
of a Unicode character in UTF-32 is always precisely identical to the
Unicode scalar value."

When Unicode 4.0 is published, we'll further clean up the language so that
it no longer requires a reference to an external UTF-32 document, among
other changes. I'm confident that seeing all the revisions applied to the
text of chapter three, plus our usual editorial tweaks, will make it much
less likely to arrive at the misunderstanding that you were having.

A./

Technical Vice President
The Unicode Consortium
Liaison to ISO/IEC JTC1/SC2/WG2
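To make the range restriction above concrete, here is a minimal sketch of how code holding Unicode data in 32-bit variables might check its values. It is only an illustration, not text from any of the reports cited. The function and constant names are hypothetical. The range check against 0..10FFFF comes straight from the TR19 wording quoted above; the separate exclusion of the surrogate range D800..DFFF is an added assumption based on how later versions of the standard define a scalar value, and is not stated in this message.

```python
# Upper bound quoted from TR19: UTF-32 values are restricted to 0..10FFFF.
MAX_CODE_POINT = 0x10FFFF

# Assumption (not from this message): surrogate code points are not
# scalar values, following the definition in later Unicode versions.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_in_unicode_range(value: int) -> bool:
    """Return True if value lies in 0..10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


def is_scalar_value(value: int) -> bool:
    """Return True if value is in range and is not a surrogate code point."""
    if not is_in_unicode_range(value):
        return False
    return not (SURROGATE_FIRST <= value <= SURROGATE_LAST)


def first_invalid_unit(units):
    """Return the index of the first 32-bit unit that is not a scalar
    value, or None if every unit passes the check."""
    for index, value in enumerate(units):
        if not is_scalar_value(value):
            return index
    return None
```

A 32-bit variable can physically hold 0xFFFFFFFF, but under this check such a value is rejected rather than treated as an unassigned character, which is the distinction the quoted text draws.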

