Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

> All this discussion shows that there is an extremely large number of
> glyph variations for the ampersand, which is both (at the abstract
> level) a symbol character and a ligature of two lowercase abstract
> characters. But ligatures for the uppercase "ET" and titlecase "Et"
> do exist as well. For Unicode, only the abstract symbol is encoded,
> but not the ligatures, even though they share a common set of glyphs.

That is one of the essential features of Unicode. Abstract characters
are encoded; glyph variants (in general) are not.

> Could the variant selectors be used? I see that Unicode
> does not allow a free use of variant selectors, which are defined
> only for cases where it would be important to preserve the
> precise semantic of the encoded text, but not as a way to
> preserve the glyphic information (so character variants are
> strictly limited).

That's correct. The difference among the Arial-style glyph that looks
a bit like a tilted treble clef (U+1D11E), John's epsilon-with-solidus,
and Philippe's e-with-small-attached-t is one of style only. The
distinction does not need to be encoded in plain text, any more than
the distinction between a lowercase g with one bowl and one with two.

Apparently the math experts really, really needed to make a distinction
in plain text between (e.g.) a less-than-or-equal sign with a horizontal
bottom stroke and one with a slanted bottom stroke. We can take it on
faith that that distinction is important in plain text, but we don't
need to add more distinctions that probably aren't.

> I don't see a solution for this "problem" within Unicode itself
> (and neither in ISO/IEC 10646), unless a separate standard
> is started to encode glyphs mapped to characters
> (in the UCS-4 space, out of its 17 first planes?). For now the
> safest way is to use specific fonts encoding these glyphs
> in PUA positions, and bind these fonts to the abstract text
> using stylesheets, meta information, or markup languages.
> But with such a technique, the abstract text would be modified.
>
> A way to avoid it is to surround the text with markup that
> specifies an explicit substitution, like this in XML:
>
> <typo as="">et</typo>,

You probably don't want to start down the slippery slope of encoding
Latin glyph variants as PUA characters.
Check the archives of this mailing list; you will find that proposals
to use the PUA to turn Unicode into a glyph registry are generally not
well received.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

