> Surely Unicode didn't waste two planes for something that
> no one can practically use.

Plane 15 and Plane 16 private use characters weren't the invention of the UTC, by the way. They derive from the original specification of ISO/IEC 10646-1.

From ISO/IEC 10646-1:1993:

  "The code positions of 32 planes from Plane E0 to Plane FF of Group 00 shall be for Private Use."

  "The code positions of the 32 groups from Group 60 to Group 7F shall be for Private Use."

That would have been:

  U-00E00000..U-00FFFFFD
  U-60000000..U-7FFFFFFD

That was 8224 *planes* of private use code positions.

Amendment 1 (the one that defined UTF-16) amended that to read:

  "The code positions of the 32 groups from Group 60 to Group 7F shall be for private use."

  "The code positions of Plane 0F and Plane 10, and of the 32 planes from Plane E0 to Plane FF, of Group 00 shall be for private use."

  "The 6400 code positions E000 to F8FF of the Basic Multilingual Plane shall be for private use."

That was 8226 *planes* of private use code positions, besides the 6400 code positions on the BMP (which had been defined earlier, but not spelled out in the same clause with the rest of the private use allocation). The addition of Plane 0F and Plane 10 was so there were some private use planes accessible via UTF-16.

In that grand proliferation of "wastage", 10646 allowed for 539,089,084 private use code positions. That was a wee tad more than anyone actually needed to use, by the way.

More recent amendments to 10646 have simply settled on the principle that *all* code positions beyond U-0010FFFF are reserved, leaving the 6400 private use code positions on the BMP, plus Plane 0F and Plane 10. In the grand scheme of things, that seems to be the Goldilocks solution -- not too small (6400) and not too big (539,089,084) -- but juuuust right (137,468).

There are people who have valid reasons for making use of Plane 0F or Plane 10 private use characters, by the way, but most of those reasons have to do with CJK.
And the reason for that should be pretty obvious -- only the CJK script deals with the kind of entity numbers (multiple tens of thousands) that make the 6400 code points of the BMP PUA seem cramped. *Any* other unencoded script, with the possible exceptions of Egyptian hieroglyphs or Tangut ideographs, would fit into the BMP PUA with plenty of room to spare.

--Ken
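
For anyone who wants to check the counts quoted above, here is a minimal Python sketch of the arithmetic. It assumes that the last two code positions of every plane (xxFFFE and xxFFFF) are excluded from the private use count, which is my reading of the ..FFFD range endpoints rather than something stated outright in the text; the variable names are made up for illustration.

```python
# Assumption: each plane has 65,536 code positions, of which the last two
# (xxFFFE and xxFFFF) are not counted as private use.
USABLE_PER_PLANE = 0x10000 - 2        # 65,534
BMP_PUA = 0xF8FF - 0xE000 + 1         # 6,400 positions, E000..F8FF

PLANES_PER_GROUP = 256
planes_in_group_00 = 0xFF - 0xE0 + 1  # Plane E0..Plane FF: 32 planes
private_groups = 0x7F - 0x60 + 1      # Group 60..Group 7F: 32 groups

# 10646-1:1993 as originally published
planes_1993 = planes_in_group_00 + private_groups * PLANES_PER_GROUP

# Amendment 1 added Plane 0F and Plane 10 of Group 00
planes_amd1 = planes_1993 + 2

# Totals, counting the BMP private use area separately
positions_amd1 = planes_amd1 * USABLE_PER_PLANE + BMP_PUA
positions_now = 2 * USABLE_PER_PLANE + BMP_PUA

print(planes_1993, planes_amd1, positions_amd1, positions_now)
```

Under that assumption the four values come out to 8224, 8226, 539,089,084 and 137,468, matching the figures in the message.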

