Let me take a few comparable examples:

1. Some people (font makers, I think) argued a few years
   ago that the Lithuanian i-dot-circumflex was just a
   Lithuanian-specific glyph variant of i-circumflex,
   and similarly for a few other characters.

   Still, the Unicode standard now does not regard those as
   glyph variants (anymore, if it ever did): its casing rules
   (see SpecialCasing.txt) treat the Lithuanian
   i-dot-circumflex as a different character.  There are
   special rules for inserting (when lowercasing) or removing
   (when uppercasing) the dot above on i-s and I-s for
   Lithuanian.  I can only conclude that it would be wrong
   even for a Lithuanian-specific font to display an
   i-circumflex character as an i-dot-circumflex glyph,
   even though an i-circumflex glyph is never used for
   Lithuanian.

2. The Khmer script got allocated a "KHMER SIGN BEYYAL".
   It stands (stood...) for "any abbreviation of the Khmer
   equivalent of 'etc.'"; there are at least four different
   abbreviations, much like "etc", "etc.", "&c", "et c.", ...
   It would be up to the font maker to decide exactly which
   abbreviation to show, and that could vary from font to font.

   However, it is now targeted for deprecation for precisely
   that reason: it is *not* the font (maker) that should
   decide which abbreviation convention to use in a document,
   it is the *"author"* of the document who should decide.
   Just as for the Latin script, the author decides how to
   abbreviate "et cetera". The way of abbreviating should stay
   the same *regardless of font*.  Note that the font may be
   chosen at a much later time, and not because one wants to
   change abbreviation convention.  One may also want that
   convention to be the same throughout a document even when
   several different fonts are used in it, without having to
   carefully consider abbreviation conventions when choosing
   fonts.

3. Marco would even allow (by default; I cannot get away
   from that caveat since some (not all) font technologies
   do what they do) displaying the ROMAN NUMERAL ONE THOUSAND
   C D (U+2180) as an M, leaving that up to the font
   designer.  While the glyphs are informative, this glyphic
   substitution definitely goes too far.  If the author
   chose to use U+2180, a glyph having at least some
   similarity to the sample glyph should be shown, unless
   and until someone makes a (permanent or transient)
   explicit character change.

4. Some people write è instead of é (I claim they cannot
   spell...).  So is it up to a font designer to display
   é as è if the font is made for a context where many
   people do not make the distinction?  Can a correctly
   spelled name (say) be turned into an apparent misspelling
   just by choosing such a font?  And would that still be a
   Unicode font?

5. I can't leave out the ö vs. ø case; these are just
   different ways of writing "the same" letter, and it is
   not the case that ø is used instead of ö for any
   7-bit reasons.  It is conventional in Norway and Denmark
   to use ø for ö in any Swedish name (or word) containing
   it.  The same goes for ä vs. æ.  Why shouldn't this one
   be up to the font makers too?  If the font is made purely
   for Norwegian, why not display ö as ø, as is the
   convention?  This is *exactly* the same situation as
   with ä vs. a^e.
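The Lithuanian casing rules of example 1 can be made concrete.
Below is a minimal Python sketch using the three single-character
"lt" lowercase mappings copied from SpecialCasing.txt; the full
rules are context-dependent (a plain I followed by any combining
mark above also gets a dot above inserted when lowercasing, and
the dot is removed again when uppercasing), which this sketch
deliberately ignores:

```python
# Minimal sketch of the Lithuanian ("lt") lowercasing entries in
# SpecialCasing.txt: lowercasing an accented capital I inserts a
# COMBINING DOT ABOVE (U+0307) so the dot stays visible under the
# accent -- a character-to-character mapping, not a glyph choice.

LT_LOWER = {
    "\u00CC": "i\u0307\u0300",  # I-grave -> i, dot above, grave
    "\u00CD": "i\u0307\u0301",  # I-acute -> i, dot above, acute
    "\u0128": "i\u0307\u0303",  # I-tilde -> i, dot above, tilde
}

def lt_lower(text: str) -> str:
    """Lowercase text with the Lithuanian special cases above,
    falling back to the default lowercase mapping otherwise."""
    return "".join(LT_LOWER.get(ch, ch.lower()) for ch in text)

# The result differs from the default (non-Lithuanian) mapping:
assert lt_lower("\u0128") == "i\u0307\u0303"  # three characters
assert "\u0128".lower() == "\u0129"           # default: one character
```

Because the distinction lives in the character stream, it
survives any font change; a font that collapsed it would be
making the author's decision for him.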

I say, let the *"author"* decide in all these cases, and
let that decision stand, *regardless of font changes*.
[There is an implicit qualification there, but I'm
tired of writing it.]


> Kent Karlsson wrote:
> > > I insist that you can talk about character-to-character 
> > > mappings only when
> > > the so-called "backing store" is affected in some way.
> > 
> > No, why?  It is perfectly permissible to do the equivalent
> > of "print(to_upper(mystring))" without changing the backing
> > store ("mystring" in the pseudocode); to_upper here would
> > return a NEW string without changing the argument.
> 
> And that, conceptually, is a character-to-glyph mapping.

Now I have lost you.  How can it be that?  The "print"
part, yes; but not the to_upper part.  That is a
character-to-character mapping, inserted between the
"backing store" and the mapping of characters to glyphs.
It is still an (apparent) character-to-character mapping
even if its result is not stored in the "backing store".
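The pseudocode can be made concrete in Python (where strings
happen to be immutable, so a to_upper-style operation
necessarily returns a new string):

```python
# The character-to-character mapping happens "in flight":
# upper() returns a NEW string, and the backing store
# (mystring) is never modified.

mystring = "resumé"
print(mystring.upper())      # RESUMÉ
assert mystring == "resumé"  # the backing store is unchanged
```

So a case mapping is perfectly meaningful without ever touching
the backing store; that does not turn it into a glyph mapping.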

> In my mind, you are so much into the OpenType architecture, 
> and so much used
> to the concept that glyphization is what a font "does", that 
> you can't view the big picture.

Now I have lost you again.  Some fonts (in some font
technologies) do more than "pure" glyphization.  This is
why I have been putting in caveats: many people seem to
think that all fonts *only* do glyphization, which is not
the case.

But to be general, I was referring to such mappings regardless
of whether they are built into some font (using character code
points or, as in OT/AAT, glyph indices) or (better) are external
to the font.

I was trying to use general formulations, but I cannot
avoid having caveats for certain mappings that certain
technologies do (since those are so popular).  But I would
agree that those particular forms of mappings *should not*
be done by fonts (but they are), and should instead be done
externally to the fonts (even when transient, as part
of the "rendering").  An advantage would be that if
a particular (named) mapping was asked for (to_upper, say),
it would be done the same way regardless of which font
is chosen.  But alas...

                Kind regards
                /kent k
