Bernard Miller scripsit:

> I’m afraid I have a little bit of a beef about the
> Unicode documentation here, forgive me if this has
> already been brought up. How come UAX #27 says that
> Unicode 3.0 had 34 non characters, 32 of which are in
> supplementary planes? First of all, there are no
> characters defined in supplementary planes in Unicode
> 3.0.

Correct. However, the codepoints FFFE and FFFF in *every* plane have been
non-characters since Unicode 2.0 or even earlier. They were mentioned in
ISO 10646 if not in Unicode itself.

> How many planes are defined in Unicode 3.1? UAX #27
> seems to indicate that it depends on what
> transformation format is used (“A process shall
> interpret the Unicode code units in accordance with
> the Unicode Transformation Format used.”). UTF-8 seems
> to only define 17 planes but UTF-32 seems to have 128
> groups of 256 planes.

There are only 17 planes, period. Code units in UTF-32 greater than
0x10FFFF are not valid codepoints.

> UAX #27 says that Unicode 3.1
> defines 3 new supplementary planes... including plane
> 14. I have difficulty with that statement.. does that
> mean that there are only 3 new planes, or that there
> are (at least) 14 new planes, but only 3 of which have
> plane names and characters in them? At least 17 planes
> must be defined in order to define the 32 non
> characters in 16 supplementary planes, that’s what
> common sense would say anyway.

Unicode 3.1 defined characters in three of the existing 16 supplementary
planes. The planes themselves have been here since 2.0.

> This whole “plane” business suffers from a lack of
> documentation. UAX #27 talks about planes as if it’s
> ancient history but the Unicode 3.0 book does not
> mention planes once (it’s not in the index anyway). I
> would like the Unicode documentation to explain
> exactly what a plane is without requiring the 10646
> documentation which is only available for a fee. In
> fact, according to UAX #27 the planes are defined in
> terms of what WILL be in 10646-2.

A plane is a sequence of 65536 consecutive Unicode scalar values (in the
terminology of Unicode 2.0) that starts on a boundary divisible by 65536.
Plane 1, for example, runs from 0x10000 to 0x1FFFF.

> I’m trying to get a grasp on exactly how many planes
> are defined in Unicode in part because it seems to
> affect the number of non characters that are defined.
> I also want to know the maximum number of characters
> that Unicode can encode. So far I reckon there are
> 1114112 (assuming 17 planes) minus 2048 (half
> surrogates) minus 2 (special non characters) minus 32
> (“hidden” non characters) minus 32 (non characters due
> to some arbitrary association between 16 higher planes
> code values and the special non characters code
> values) = 1111998 code positions available for
> characters.

Your reasoning is sound.

> What’s with this 1114111 number I’ve seen
> on this list?

I have no clue.

> BTW, it doesn’t make sense for every code position
> ending in FFFF or FFFE to be a non character.

It doesn't make much sense, but it is the rule anyway.

> Why isn’t the same rule applied to the “hidden” non
> characters, so that every code value ending in FDD0 to
> FDEF is also a non character? Is it to contribute to
> their “hidden” nature?

No. There is simply no reason to reserve them on the other planes.

-- 
John Cowan   http://www.ccil.org/~cowan   [EMAIL PROTECTED]
Please leave your values        |       Check your assumptions.  In fact,
   at the front desk.           |          check your assumptions at the door.
     --sign in Paris hotel      |            --Miles Vorkosigan
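
For anyone who wants to check the arithmetic in this thread, here is a minimal sketch in Python. It assumes the rules exactly as stated in the message: 17 planes of 65536 code points each, 2048 surrogates, FFFE and FFFF reserved in every plane, and FDD0 to FDEF reserved in plane 0 only. The helper names (`plane_of`, `is_surrogate`, `is_noncharacter`) are made up for this illustration and do not come from any Unicode library.

```python
# Sketch only: helper names are hypothetical, chosen for this example.
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16
PLANE_SIZE = 0x10000        # 65536 code points per plane
PLANE_COUNT = (MAX_CODE_POINT + 1) // PLANE_SIZE


def plane_of(cp: int) -> int:
    """Return the plane number (0..16) that a code point falls in."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a code point: {cp:#x}")
    return cp >> 16


def is_surrogate(cp: int) -> bool:
    """True for the 2048 surrogate code points D800..DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """FFFE/FFFF in every plane, plus FDD0..FDEF in plane 0 only."""
    return (cp & 0xFFFE) == 0xFFFE or 0xFDD0 <= cp <= 0xFDEF


code_points = range(MAX_CODE_POINT + 1)
surrogates = sum(is_surrogate(cp) for cp in code_points)
noncharacters = sum(is_noncharacter(cp) for cp in code_points)
available = len(code_points) - surrogates - noncharacters

print(PLANE_COUNT, len(code_points), surrogates, noncharacters, available)
# prints: 17 1114112 2048 66 1111998
```

The 66 noncharacters break down as 34 (FFFE and FFFF in each of 17 planes) plus the 32 in FDD0 to FDEF, which reproduces the 1111998 figure quoted above. For what it is worth, `MAX_CODE_POINT` written in decimal is 1114111, one less than the 1114112 total because code points are numbered from zero.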