Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

Doug Ewell Sun, 13 Jul 2003 17:08:19 -0700

Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

> All this discussion shows that there is an extremely large number of
> glyph variations for the ampersand, which is, at the abstract
> level, both a symbol character and a ligature of two lowercase abstract
> characters. But ligatures for the uppercase "ET" and titlecase "Et"
> do exist as well. For Unicode, only the abstract symbol is encoded,
> not the ligatures, even though they share a common set of glyphs.


That is one of the essential features of Unicode.  Abstract characters
are encoded; glyph variants (in general) are not.
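A quick illustration of that principle with Python's standard unicodedata module. The helper name describe() is my own, for illustration only. The ampersand is a single code point however it is drawn, and the few Latin ligatures that did get encoded (U+FB00..U+FB06) are compatibility characters that NFKC folds back to their letters:

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# One abstract character, whatever glyph a font uses for it.
print(describe("&"))            # ['U+0026 AMPERSAND']

# An encoded ligature is a compatibility character; NFKC maps it to plain letters.
fi_ligature = "\uFB01"
print(describe(fi_ligature))    # ['U+FB01 LATIN SMALL LIGATURE FI']
print(unicodedata.normalize("NFKC", fi_ligature))  # fi
```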

> Could variation selectors be used? I see that Unicode
> does not allow free use of variation selectors, which are defined
> only for cases where it is important to preserve the
> precise semantics of the encoded text, not as a way to
> preserve glyph information (so character variants are
> strictly limited).

That's correct.  The difference between the Arial-style glyph that looks
a bit like a tilted treble clef (U+1D11E) and John's
epsilon-with-solidus and Philippe's e-with-small-attached-t is one of
style only.  The distinction does not need to be encoded in plain text,
any more than the distinction between a lowercase g with one bowl versus
two.
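For readers unfamiliar with the mechanism being discussed: a variation selector is a separate code point (VS1..VS16 at U+FE00..U+FE0F, VS17..VS256 at U+E0100..U+E01EF) that follows its base character, so the underlying text can always be recovered by dropping it. A minimal Python sketch; the helper names are hypothetical, and the ampersand-plus-VS1 sequence is used only to show the mechanics, with no claim that any font or renderer honours it:

```python
import unicodedata

# Code point ranges of the variation selectors.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]

def is_variation_selector(ch: str) -> bool:
    """True if ch is one of VS1..VS256 (hypothetical helper)."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors, leaving the base characters untouched."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

# Illustrative sequence only: ampersand followed by VS1.
seq = "&\uFE00"
print([unicodedata.name(ch) for ch in seq])  # ['AMPERSAND', 'VARIATION SELECTOR-1']
print(strip_variation_selectors(seq))        # &
```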

Apparently the math experts really, really needed to make a distinction
in plain text between (e.g.) a less-than-or-equal sign with a horizontal
bottom stroke and one with a slanted bottom stroke.  We can take it on
faith that that distinction is important in plain text, but we don't
need to add more distinctions that probably aren't.
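As far as I can tell, the pair in question is U+2264 and U+2A7D, which the math additions encoded as two separate characters rather than as glyph variants of one. A short Python check (my own illustration, not from the standard) shows that plain text keeps them apart, and that NFKC does not merge them either:

```python
import unicodedata

horizontal = "\u2264"  # the common less-than-or-equal sign
slanted = "\u2A7D"     # the slanted-bottom-stroke form, a separate code point

for ch in (horizontal, slanted):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2264 LESS-THAN OR EQUAL TO
# U+2A7D LESS-THAN OR SLANTED EQUAL TO

# Distinct code points, and U+2A7D has no compatibility decomposition.
assert horizontal != slanted
assert unicodedata.normalize("NFKC", slanted) == slanted
```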

> I don't see a solution for this "problem" within Unicode itself
> (nor in ISO/IEC 10646), unless a separate standard
> is started to encode glyphs mapped to characters
> (in the UCS-4 space, outside its first 17 planes?). For now the
> safest way is to use specific fonts that encode these glyphs
> in PUA positions, and bind these fonts to the abstract text
> using stylesheets, meta information, or markup languages.
> But with such a technique, the abstract text would be modified.
>
> One way to avoid that is to surround the text with markup that
> specifies an explicit substitution, like this in XML:
>
>     <typo as="&#xF001;">et</typo>,

You probably don't want to start down the slippery slope of encoding
Latin glyph variants as PUA characters.  Check the archives of this
mailing list; you will find that proposals to use the PUA to turn
Unicode into a glyph registry are generally not well received.
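For what it's worth, the markup in Philippe's example does leave the abstract text intact; the PUA code point lives only in the attribute. A quick check with Python's standard ElementTree parser (the element name "typo" and attribute "as" are his ad hoc choices, not any real schema), which says nothing about whether the approach is advisable:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Philippe's example: PUA glyph in an attribute, ordinary text as content.
elem = ET.fromstring('<typo as="&#xF001;">et</typo>')

glyph = elem.get("as")                 # the character reference is resolved
print(f"U+{ord(glyph):04X}")           # U+F001
print(unicodedata.category(glyph))     # Co (private use)

# A consumer that ignores the markup sees only the abstract text.
plain = "".join(elem.itertext())
print(plain)                           # et
```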

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
