Michael Everson <everson at evertype dot com> wrote:

> At 08:51 -0700 2002-05-21, Doug Ewell wrote:
>> (Deseret and Shavian were encoded in ConScript; whether that helped
>> get them into Unicode or not, I don't know.)
>
> Certainly not. They were examined on their merits just like anything
> else.
Of course they were. By "helped" I didn't mean that the characters
wouldn't otherwise have been worthy of encoding, but that the CSUR
assignments might have resulted in additional usage, which in turn got
the attention of UTC and/or WG2.

I'm trying to examine the passage in TUS 3.0, Section 13.5 (p. 323),
which seems to have caught Mr. Overington's fancy:

<quote>
Promotion of Private-Use Characters. In future versions of the Unicode
Standard, some characters that have been defined by one vendor or
another in the Corporate Use subarea may be encoded elsewhere as
regular Unicode characters if their usage is widespread enough that
they become candidates for general use. The code positions in the
Private Use Area are permanently reserved for private use -- no
assignment to a particular set of characters will ever be endorsed by
the Unicode Consortium.
</quote>

Ignoring the last sentence, because we all seem to be on board with
that, I think the image of the PUA that may have emerged from this is
that of a test bed for proposed characters. In this scenario,
characters are encoded in the PUA *so that* they will gain increased
usage, *so that* the UTC will take note of the increased usage and
respond by "promoting" the character to Unicode. (I think the use of
the word "promotion" in the 13.5 subhead is turning out to be a bad
idea, as it implies a simple and straightforward progression.)

As I mentioned earlier, as far as I know no script or character has
followed this path deliberately -- that is, been encoded in the PUA
for the express purpose of satisfying Unicode's "widespread usage"
requirement.

Of course, we all know (don't we?) that a script or character must
satisfy many other criteria as well. Deseret and Shavian obviously did
satisfy those criteria, as well as being judged to have sufficiently
"widespread usage." Those additional criteria -- not frequency of
usage -- are what will prevent additional Latin ligatures from being
"promoted" to Unicode.
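Incidentally, the "permanently reserved" ranges quoted above are fixed in the standard, so a program can test for private-use characters directly. A minimal sketch in Python (the function name is my own, purely illustrative); it cross-checks the hard-coded ranges against the general category `Co` (Other, private use) reported by the Unicode Character Database:

```python
import unicodedata

# The three Private Use ranges defined by Unicode (permanently reserved):
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Basic Multilingual Plane PUA
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]

def is_private_use(ch: str) -> bool:
    """True if the character is permanently reserved for private use."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Cross-check against the general category: PUA characters are 'Co'.
assert is_private_use("\uE000")
assert unicodedata.category("\uE000") == "Co"
assert not is_private_use("A")
```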
To answer (I hope) some of William's other points:

> Well, the ideas are not intended to be quasi-official. Just one end
> user of the Unicode system seeking to use the Private Use Area to
> good effect and putting forward ideas to other end users who might
> like to consider using some of the facilities suggested.

Hooray for that. The PUA is there for just that purpose. However, in
the spirit of using Unicode, please also respect the character-glyph
model, which says (among other things) that a ligature is a glyph
requiring a font rendering, not a character requiring a code point.

> Now, the fact is that Michael suggested a feature named ZERO WIDTH
> LIGATOR specifically for the purpose of ligation and it appears that
> that suggestion has not been accepted, but that a shared solution
> with a code point that can also mean something else has been decided
> upon. Now, I do not know the details of all of this and I certainly
> hope to study the matter more, yet, as someone who is not a linguist
> as such but an inventor and programmer, I have a concern that using
> one code point for two types of meaning rather than one code point
> for each type of meaning is what I call a software unicorn. The
> concept of a software unicorn can be read about on
> http://www.users.globalnet.co.uk/~ngo/euto0008.htm if anyone is
> interested.

I gather from the article that a software unicorn is an unlikely,
perhaps impossible, situation that nevertheless must be handled
because it cannot be completely ruled out. Lots of "defensive" code
gets written to handle such situations, often with a comment like:

    default:    // this can't happen, but...

In this context, I think William is saying that it's risky to overload
ZWJ to handle Latin ligation because we can't completely rule out the
possibility that we might need ZWJ to "join" Latin characters the way
it currently joins Arabic characters.
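To make the "defensive" pattern concrete, here is a small Python sketch (the function name and return strings are mine, purely illustrative): the final branch handles the case that "can't happen," which is exactly the software-unicorn situation William describes.

```python
def rendition_request(control: str) -> str:
    """Map a joiner control character to the rendition it requests.

    Illustrative only: real rendering engines do far more than this.
    """
    if control == "\u200D":    # ZERO WIDTH JOINER
        return "ligated/connected"
    elif control == "\u200C":  # ZERO WIDTH NON-JOINER
        return "unligated"
    else:
        # default: this can't happen, but...
        raise ValueError(f"unexpected control character: {control!r}")
```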
This concern can probably be put to rest by reading the description in
Section 13.2 of the Unicode 3.1 Technical Report (UAX #27). The
description carefully spells out the relationship between "cursively
connected" and "ligated" renditions and the roles ZWJ and ZWNJ play in
determining the rendition to be used.

> As to strong opposition to encoding additional presentation forms
> for alphabetic characters, well, we live in a democratic society and
> if some people who would like to produce quality printing feel that
> using a TrueType fount with some ligature characters does what they
> want and harms no one else, what exactly is the objection?

Ah, but it *isn't* harmless. It causes problems for normalization. For
homework tonight, read UAX #15, "Unicode Normalization Forms." The key
point for our discussion is that creation of additional canonical or
compatibility equivalents -- such as a new ligature for two existing
characters -- would destabilize the normalization process, because
normalization engines based on different versions of Unicode might
produce different results. Beyond a certain point in time (defined as
Unicode 3.1), no new canonical or compatibility equivalences can be
defined. Because of this, a new "ft" ligature could not carry the
obvious compatibility mapping to 0066 0074; but that would destroy
most of the benefit of encoding it in the first place.

Fortunately, it is not necessary to assign new Unicode characters in
order to put your favorite Latin ligatures in a font. Just create the
"ft" ligature glyph and teach your font to substitute it in place of
the unconnected 0066 and 0074 glyphs. You can assign the ligature to a
PUA code point if you like, but if the internal mapping is done right
it isn't necessary for you to publicize the PUA code point, or for
users to use it directly. (I'm not a font designer, but the font
designers on this list say this is easy.)
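The normalization point can be seen with an existing compatibility ligature. A short Python sketch: U+FB01 LATIN SMALL LIGATURE FI carries a compatibility mapping to 0066 0069, so NFKC folds it away; a hypothetical new "ft" ligature character could not be given such a mapping, whereas a ZWJ-based ligation request introduces no new character at all and survives normalization untouched.

```python
import unicodedata

# U+FB01 has a compatibility decomposition to "fi", so NFKC removes it:
assert unicodedata.normalize("NFKC", "\uFB01") == "fi"

# A ZWJ between "f" and "t" requests ligation at the glyph level
# without introducing a new character; normalization leaves it alone:
ft_request = "f\u200Dt"
assert unicodedata.normalize("NFC", ft_request) == ft_request
assert unicodedata.normalize("NFKC", ft_request) == ft_request
```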
> As to whether a Private Use Area implementation has nothing to do
> with formal proposals is not, I feel, so clear cut. Certainly, I do
> not expect the fact that I have suggested four particular code points
> for various padlocks in the Private Use Area to influence a formal
> decision. Yet, by suggesting those four code points, if, at various
> organizations various people are, without making any public
> announcement, trying out a fount with two or four padlock symbols in
> them, then maybe, just maybe, they will use the code points that I
> suggested in my posting. If they do, this would then mean that if
> they try making test applications that make use of the padlock symbols
> expressed as Unicode code points then those test applications may be
> interoperable with test applications made by other researchers, which
> might be of benefit at some stage in the future, if perhaps various
> people make test founts with padlock symbols in them available for
> trials.

In other words, a grass-roots de-facto standard for encoding padlock
symbols in the PUA would sort of emerge from the PUA code point
allocations you have suggested. Personally, I am skeptical it would
work out that way.

BTW, regarding the question of two vs. four padlock symbols: You have
described a common and vexing problem involving the use of symbols as
simultaneous status indicators and prompts. Does that "lock" symbol
mean the object is currently locked, or does it mean I should press
here to lock it (implying that it is currently unlocked)? However, I
don't feel the encoding of padlocks with arrows indicating locking or
unlocking action would reduce the confusion, so if and when I write up
a proposal, it will be for only two characters.

-Doug Ewell
 Fullerton, California

