RE: Character identities

Kent Karlsson Wed, 30 Oct 2002 09:42:20 -0800

> I insist that you can talk about character-to-character mappings only
> when the so-called "backing store" is affected in some way.


No, why?  It is perfectly permissible to do the equivalent
of "print(to_upper(mystring))" without changing the backing
store ("mystring" in the pseudocode); to_upper here would
return a NEW string without changing the argument.
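A minimal Python rendering of that pseudocode, assuming `to_upper` is just a thin wrapper over the built-in `str.upper()` (the wrapper name comes from the pseudocode; the sample word is mine). Because Python strings are immutable, the mapping necessarily yields a new string and "mystring" stays as it was. It also shows that a character-to-character mapping need not be one-to-one: "ß" uppercases to "SS".

```python
def to_upper(s: str) -> str:
    # str.upper() applies full Unicode case mapping and returns a NEW string;
    # the argument (the "backing store") is not modified.
    return s.upper()

mystring = "Größe"
shown = to_upper(mystring)
print(shown)      # GRÖSSE  (what gets displayed)
print(mystring)   # Größe   (backing store unchanged)
```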

> If the backing store is not changed, it is only a character-to-glyph
> mapping, however complicated and indirect it may be.

Yes.  But with several font technologies "the user" can affect
the mapping in some ways, via "features", including what
*amounts to* mapping to uppercase (an x-height "A" glyph is
an "A", not an "a", even if you have an "a" in the backing
store), or various other changes, like changing diaeresis
to e-above (they are still not glyph variants of each other,
even in German, which is why DIN asked for e-above, etc.).

My claim is that it is a bad idea for fonts (I don't dare
say "Unicode font" at this point) to do what *amounts to*
such in-effect character mappings *without explicit request*
from whoever is "in charge of" the text in some way (author,
editor, graphic designer, reader who likes to make changes to
the text, ...).  Such changes should NOT be the result of
JUST changing font.

(I still think it is a bad idea to build such *in effect*
transient character-to-character mappings into fonts;
but people are doing that anyway, so...)


> I totally agree with Doug's careful definition, and I am glad that you
> agree as well.
> 
> Doug indicates two key points that a font must respect to be suitable
> for Unicode:
> 
> « [...] calling a font a "Unicode font" implies two things:
> 1. It must be based on Unicode code points. [...]
> 2. The glyphs must reflect the "essential characteristics" of the
> Unicode character to which they are mapped. [...] »
> 
> If we agree that the only requirement for a glyph representing a
> certain Unicode character is to respect the "essential
> characteristics" which make it recognizable, then all our discussion
> is simply about determining which "essential characteristics" a
> particular character is supposed to have.

So far we agree completely re. that definition.

> To me, a glyph floating atop of letters "a", "o" and "u" is
> recognizably a German umlaut if (a) the text is written in German, and
> (b) the glyph has one of the following shapes:
> 
> 1. Two small "blobs" (e.g. circles, squares, acute accents) placed
> side by side;

I'm going to opt for staying on the restrictive side here.

Except for the last example, that is a diaeresis, yes.  That is the
modern standard way of writing "umlaut" in typeset German.  Two
acute accents side by side make a double acute, which is normally
not used for this in German, and it is stretching things a bit too
far to consider it a glyph variant of diaeresis.

> 2. A straight horizontal line;

That's a macron.  Not used in *standard orthography* for German.
Using that as a glyph variant for diaeresis is stretching things
quite a lot, even if it occurs in particular forms of handwriting
or on some signs.  (In handwriting, some people use I-dot-above, or
even I-ring-above.  Does that make them glyph variants of "I"
in a (non-Turkish) font that mimics handwriting?  I hope not.
If you want I-ring-above, then do what *in effect* amounts to a
(permanent or transient) mapping to <I, combining-ring-above>.)
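To make the transient/permanent distinction concrete, here is a rough Python sketch. The function `ring_capital_i` and the sample word are hypothetical; the only real pieces are U+030A COMBINING RING ABOVE and the standard `unicodedata` module. The transient case hands a new string to the display side; the permanent case overwrites what is stored.

```python
import unicodedata

COMBINING_RING_ABOVE = "\u030A"

def ring_capital_i(s: str) -> str:
    # Hypothetical mapping: every "I" becomes <I, COMBINING RING ABOVE>.
    # str.replace returns a new string; the argument is left untouched.
    return s.replace("I", "I" + COMBINING_RING_ABOVE)

stored = "IRIS"

# Transient: only the copy given to the renderer carries the ring.
display = ring_capital_i(stored)
print(stored, [unicodedata.name(c) for c in display[:2]])

# Permanent: the backing store itself is rewritten.
stored = ring_capital_i(stored)
```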

> 3. A wavy horizontal line;

That's a tilde.  Not used in *standard orthography* for German.
Using that as a glyph variant for diaeresis is stretching things
quite a lot, though it is quite common to use tilde instead of
diaeresis in handwriting.  (If there were a "handwriting" font
feature, what amounts to a transient mapping from diaeresis to
tilde would be expected under that feature.  For some fonts I
might even agree that the feature could be on by default, as
long as it can be turned off.)

> 4. a small lowercase "e", or something recalling it.

Our major point of disagreement (along with M vs. Roman Numeral
One Thousand C D ;-). Historically that is the origin of
the "umlaut".  It is definitely distinct from diaeresis,
just as much as æ is distinct from ä, even in a German context. 
This is not just stretching it very far, I'd say it's plain wrong,
also in a purely German context.  That does not at all prevent a
"hist" feature (or whatever; but never on by default) to do
what amounts to a transient mapping from diaeresis to e-above.
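For anyone who wants to check that the shapes discussed above are separate characters rather than variants of one, Python's standard `unicodedata` module lists each under its own code point and name. The selection of marks (diaeresis, double acute, macron, tilde, e-above) simply follows the list above. The same module shows that "ä" canonically decomposes to <a, combining diaeresis>, while "æ" has no decomposition at all.

```python
import unicodedata

# Marks from the discussion above: diaeresis, double acute, macron,
# tilde, and e-above (COMBINING LATIN SMALL LETTER E).
MARKS = ["\u0308", "\u030B", "\u0304", "\u0303", "\u0364"]
names = {f"U+{ord(m):04X}": unicodedata.name(m) for m in MARKS}
for cp, name in names.items():
    print(cp, name)

# "ä" decomposes to <a, U+0308>; "æ" does not decompose.
print(repr(unicodedata.decomposition("\u00E4")))
print(repr(unicodedata.decomposition("\u00E6")))
```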

> I don't argue this for caprice or provocation, but because these
> particular shapes are commonly attested in one context or another: be
> it modern typography, traditional typography, handwriting, fancy
> graphics, etc.

Yes.

...
> >    If (and only if!) the author/editor of the text asks for an
> > overscript e should the font produce one. It is not up to
> > the font maker to make such substitutions without request,
> 
> Yes. But a font which displays U+0308 with a glyph resembling the
> typical glyph for U+0364 is not "producing" anything; it is not
> "substituting" anything with anything else: it is just faithfully
> reproducing the text, according to the content decided by the author
> *and* according to the typographical style decided by the font
> designer.

This is not a typographic decision, it is a spelling decision,
and not up to the font designer, I'd say.  It is a typographic
decision whether the diaeresis "digs into" the glyph below, or if
an e-above looks like a capital e inside.  But spelling changes,
whether transient or permanent, should be the "author's" call.
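The position argued here, that such spelling changes happen only on explicit request, can be modelled roughly as a renderer that applies transient substitutions only when a feature is named. This is a sketch under assumptions: `render_text` and the "handwriting" feature are hypothetical, "hist" just echoes the feature name used earlier, and real fonts would do this at the glyph level (e.g. with OpenType substitution lookups) rather than by rewriting characters. The stored string is never modified, and with no feature requested the output is exactly the stored text.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical feature -> transient mark substitutions.
FEATURE_MAPS: Dict[str, Dict[str, str]] = {
    "hist": {"\u0308": "\u0364"},         # diaeresis -> e-above
    "handwriting": {"\u0308": "\u0303"},  # diaeresis -> tilde
}

def render_text(stored: str, feature: Optional[str] = None) -> str:
    """Return the string to display; `stored` (the backing store) is never changed."""
    if feature is None:
        # Default: no spelling change, whichever font is in use.
        return stored
    # Decompose so precomposed letters such as "ü" expose their U+0308.
    shown = unicodedata.normalize("NFD", stored)
    for src, dst in FEATURE_MAPS[feature].items():
        shown = shown.replace(src, dst)
    return shown

stored = "Bücher"
print(render_text(stored) == stored)  # True
print([hex(ord(c)) for c in render_text(stored, "hist")][:3])
```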

                /Kent K
