In a message dated 2001-10-02 4:50:03 Pacific Daylight Time, [EMAIL PROTECTED] writes:
> Is there an official Unicode Consortium statement that states, for the
> record, that the Unicode Consortium refuses to encode more ligatures and
> precomposed characters please?

I'm pretty sure there is, since it has been brought up so often by UTC
members on this list.  If there is no such statement, then one should be
drafted.

> I feel that this is a matter that needs to be formally resolved one way
> or the other, so that, if such a refusal has been declared then people
> who wish to have these characters encoded may act knowing that the
> Unicode Consortium will have legally estopped itself from making any
> future complaint that it has some right to set the standards in such a
> matter and that those people who would like to see the problem solved
> and ligatured characters encoded as single characters so that a font
> can be produced may proceed accordingly, perhaps approaching the
> international standards body directly if the Unicode Consortium refuses
> to do so without a process of even considering individual submissions
> on their individual merits.  On the other hand, if no such formal
> statement has been issued, then those people who would like to see the
> problem solved and ligatured characters encoded as single characters so
> that a font can be produced for use with software such as Microsoft
> Word may proceed to define characters in the private use area in a
> manner compatible with their possible promotion to being regular
> Unicode characters in the presentation forms section.

Was that only two sentences?  Wow....

Regarding the "refusal" to encode more ligatures and precomposed
presentation forms:  it is not arbitrary.  There is a reason why Unicode
will not encode these things: they would interfere with the established
standard for decomposition.  Now that Unicode has reached its present
level of popularity, some vendors, implementations, and standards require
a stable set of decomposable code points.  That set is Unicode 3.0.  If
new precomposed characters were added, engines and standards built to the
new version would decompose them differently from those built to the old
version, and this is not acceptable to those who need decomposition to
work at all.

Precomposed characters and ligatures won't be considered "on their
individual merits," and they won't be "promoted" from a private standard
to true Unicode character status, because the decomposition problem is
bigger than the individual merits.
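To see why this is a stability problem rather than a matter of taste,
here is a minimal Python sketch (the code points shown are just the
existing presentation forms discussed in this thread):

    import unicodedata

    # The precomposed ligatures already in the standard carry fixed
    # compatibility decompositions:
    print(unicodedata.normalize("NFKD", "\uFB00"))  # U+FB00 ff ligature -> "ff"
    print(unicodedata.normalize("NFKD", "\uFB05"))  # U+FB05 long-s-t    -> "st"

    # A ligature newly encoded after Unicode 3.0 would decompose on new
    # implementations but pass through untouched on old ones, so the two
    # would disagree about whether the same two strings are equivalent.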
Note that I personally like the ct ligature and think it would be a great
thing to have in a font.  If this were 1993, perhaps it might have been
encoded.

Regarding fonts:  Nothing is stopping you or anyone else from making a
font with these precomposed glyphs and associating them with Unicode PUA
(Private Use Area) code points.  That is an excellent illustration of a
possible use of the PUA, and many, many font vendors do just that.

> I feel that it would be quite wrong to pull up the ladder on the
> possibility of adding characters such as the ct ligature as U+FB07
> without the possibility of consideration of each case on its merits at
> the time that a possibility arises.  A situation would then exist that
> several ligatures have been defined as U+FB00 through to U+FB06
> including one long s ligature, yet that U+FB07 through to U+FB12 must
> remain unused even though they could be quite reasonably used for ct
> and various long s ligatures so as to produce a set of characters that
> could be used, if desired, for transcribing the typography of an 18th
> Century printed book.  Yet, if the ladder has been pulled up, perhaps
> U+FB07 can be defined as the ct ligature directly by the international
> standards organization and the international standards organization
> could decide directly about including the long s ligatures.

The organization you are talking about is ISO/IEC JTC1/SC2/WG2.  They are
firmly committed to maintaining compatibility between Unicode and ISO/IEC
10646.  Sorry, but this is a good thing.

> If the possibility of fair consideration is, however, still open, then
> the ct ligature could be defined as U+E707 within the private use area
> and published as part of an independent private initiative amongst
> those members of the Unicode user community that would like to be able
> to use that character in a document by the character being encoded as a
> character in an ordinary font file.  That would enable font makers to
> add in the ct character if they so choose.

You might start by checking existing fonts, especially those shipped with
major operating systems, to see what PUA code points are commonly used
internally for glyphs not associated with a standard Unicode character.
I know that several Windows fonts have privately assigned glyphs, and I
assume the same is true for Macintosh fonts.  Also, maybe the various
font makers who haunt this list could contribute any guidelines they know
of for quasi-standardizing these code points.

Obviously, you are hoping that standardizing the code points could lead
to some measure of interoperability; otherwise there would be no
discussion.  If all you want is to encode the ct ligature in a font, you
can use any old PUA character you wish, conformantly.  OTOH, private
creation of quasi-standards on the part of vendors is not necessarily a
good thing.  It is the sort of thing that the public tends to vilify
Microsoft for doing.

If you want to interchange the ct ligature and the long-s ligatures, you
can do that right now.  Just encode <c, ZWJ, t> or <long-s, ZWJ,
whatever>.  Then, rendering engines that have a glyph for the desired
ligature can render it, and those that don't will fall back to the
individual characters (assuming they are conformant).

This approach has at least three major advantages:

(1) It is already supported by the Unicode Standard.

(2) It provides a standard interchange mechanism without requiring font
    vendors to agree on the code point used for the precomposed glyph.

(3) It provides a sensible fallback mechanism for the great majority of
    fonts that, let's admit it, will not have these specialized glyphs.

Think about it.
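To make the two alternatives concrete, here is a minimal Python sketch;
the U+E707 assignment is the hypothetical private one proposed in the
quoted message, not anything standardized:

    # Private-use encoding: meaningful only by prior private agreement.
    ct_pua = "\uE707"                 # hypothetical PUA code point for "ct"

    # ZWJ encoding: standard interchange with graceful fallback.
    ZWJ = "\u200D"                    # U+200D ZERO WIDTH JOINER (ligation hint)
    ct_zwj  = "c" + ZWJ + "t"         # <c, ZWJ, t>
    long_st = "\u017F" + ZWJ + "t"    # <long s, ZWJ, t>

    # A renderer with a ct glyph may ligate ct_zwj; one without it simply
    # shows "ct".  The PUA string, by contrast, is a blank box to anyone
    # who does not share the private convention.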
In a message dated 2001-10-02 6:35:16 Pacific Daylight Time,
[EMAIL PROTECTED] writes:

>> You might want to take a look at the ConScript Unicode Registry, which
>> was originally intended for "constructed" and artificial scripts, but
>> which could also be used for this purpose.
>
> No, it couldn't.  It's for constructed and artificial scripts, not for
> precomposed Latin glyphs.

I stand corrected.  But there is no reason William couldn't initiate his
own registry, along the lines of CSUR, for the purpose of assigning PUA
code points to precomposed Latin glyphs.  Just don't expect the
characters thus added to "graduate" somehow into Unicode.

-Doug Ewell
 Fullerton, California

