Re: RE: Character identities

Jim Allan Tue, 29 Oct 2002 18:47:37 -0800

The Old Icelandic character ǫ (Unicode U+01EB: LATIN SMALL LETTER O WITH OGONEK) is replaced in modern Icelandic by ö.

Would it therefore be proper to use U+00F6, the code point which Marco Cimarosti wants to use for o with superscript e, for o with ogonek as well?

In Icelandic they could be called the same character. Of course that only works for Icelandic. We could not use this font for German or English or French, unless we build some kind of recognition of language tags into it.

In French the circumflex accent indicates an earlier superscript s over the vowel. So should we allow combining superscript s as a variant glyph for the circumflex? But what of French text containing transliterated Arabic names or Welsh names or transliterated classical Greek names which use a circumflex which never had such a meaning? Again we would need language tagging.

The Old English and Middle English letter thorn (þ) is replaced in Modern English by the combination th. Would it make sense then for a modern font to represent U+00FE by a glyph showing th? Would it also make sense to replace the kinds of glyphs used for U+204A TIRONIAN SIGN ET with an ampersand? The meaning is exactly the same. But what if we want to use this font for Icelandic or Old English? Do we again need an intelligent font that understands language tagging?

Do we now have different flavors of Unicode, one for English, one for Icelandic, one for French, one for German ... ? What of other languages?

A diaeresis used in the transliterated Classical names Peirithoüs and Menelaüs is not the same as a superscript e, though in German (and some other languages) sounds once indicated by superscript e over a vowel are now indicated by a diaeresis over the vowel. A font which rendered any diaeresis over u or o or a as a superscript e would therefore be incorrect for classical names cited and also possibly for other foreign names. How would J.R.R. Tolkien's name Eärendil be rendered by such a font, where the diaeresis indicates separate pronunciation of the a, not an umlauted a?
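To make the problem concrete, here is a rough Python sketch of what such a font would in effect be doing to the text. The function name and the sample word Müller are my own illustrative choices, not anything Marco proposed; the only assumption is that the superscript e in question is U+0364 COMBINING LATIN SMALL LETTER E.

```python
import unicodedata

DIAERESIS = "\u0308"       # COMBINING DIAERESIS
SUPERSCRIPT_E = "\u0364"   # COMBINING LATIN SMALL LETTER E

def diaeresis_to_superscript_e(text: str) -> str:
    """Hypothetical filter: blindly turn every diaeresis into superscript e."""
    # Decompose first so that precomposed letters such as U+00FC are covered too.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.replace(DIAERESIS, SUPERSCRIPT_E)

# A German umlaut, a classical name and Tolkien's name all get the same
# treatment, because the filter cannot tell umlaut from diaeresis.
for word in ["Müller", "Menelaüs", "Eärendil"]:
    print(word, "->", diaeresis_to_superscript_e(word))
```

Only the first of the three results is arguably what a writer of German might want; the other two are simply misspelled.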

Surely it makes more sense for an author or advertising designer who wishes to use u with superscript e to use the Unicode method of u followed by a combining superscript e, so that it will appear as desired in any font, rather than to use a font change? Font changes should not change the orthography or spelling of the original but should represent transparently what the writer intended, and Unicode gives us a clear way to distinguish combining superscript e from combining diaeresis and combining superscript s from combining circumflex.
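As a small check of that distinction, the following Python sketch (helper names are mine, and it assumes the superscript e meant here is U+0364) lists the code points involved and confirms that no Unicode normalization form folds u with superscript e into u with diaeresis.

```python
import unicodedata

u_umlaut = "\u00fc"           # precomposed u with diaeresis
u_superscript_e = "u\u0364"   # u + COMBINING LATIN SMALL LETTER E

def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def same_under_any_normalization(a, b):
    """True if a and b become identical under any of the four normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )

print(describe(unicodedata.normalize("NFD", u_umlaut)))
print(describe(u_superscript_e))
print(same_under_any_normalization(u_umlaut, u_superscript_e))
```

The two spellings stay distinct at the character level, so the choice between them remains with the writer and not with the font.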

Using the Unicode method makes far more sense than creating fonts that work for particular languages only, provided no foreign words or names appear, or which require language tagging.

In most European languages æ and œ are ligatures at one time commonly used in names and technical words of Latin origin. Modern stylistic preference is to avoid these ligatures. However French uses œ for a particular sound, though the use of that ligature instead of oe was not considered important enough for œ to be generally available on French typewriters. Also both digraphs were separate letters in Old English, whence the use of æ still in modern Danish and Icelandic. Should this modern convention be properly indicated in an intelligent font by using unconnected ae and oe for these digraphs except where language tagging indicates Danish, Icelandic, or older Scandinavian use or Old English? Should we have to language tag Encyclopædia Britannica to be sure that æ appears in the name properly connected?

In fact, the stylistic conventions are indicated not by font changes or tagging but by typing the appropriate characters.
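A quick Python sketch of the same point, using only the standard unicodedata module (the helper name is my own): æ is U+00E6, a character in its own right with no decomposition to a followed by e, so the two spellings of the name remain different strings under every normalization form.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalized_forms(text):
    """Return the text under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

ligature = "Encyclop\u00e6dia"   # typed with U+00E6 LATIN SMALL LETTER AE
digraph = "Encyclopaedia"        # typed with separate a and e

for form, value in normalized_forms(ligature).items():
    # Normalization leaves the ligature spelling alone in every form.
    print(form, value, value == digraph)
```

Whichever spelling the writer typed is the one that survives, with no tagging needed.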

Should an English language font render ö as oe, so that Göthe appears automatically in the more normal English form Goethe?

Marco's desire to use a font to indicate combining superscript e, instead of the way Unicode wants it done, seems prompted by the fact that most Unicode fonts do not currently support the combining superscript characters, and he wishes for a fallback to normal diaeresis instead of to an undefined-character indicator.

This is a reasonable wish.

In light of current Unicode support, the hack of identifying diaeresis with combining superscript e makes sense.

There has never been anything wrong with using a hack when required for a task at hand. But hacks of this kind, if followed up widely in many fonts in many languages, would produce a chaos of interpretations and numerous fonts suited only to particular languages, filtering the text and not presenting what is there unless complex and otherwise unnecessary tagging is added.

Surely this is not what Unicode should be?

If a writer uses a long s in modern writing, whether quoting text of an earlier era or purposely being archaic, normal fonts should display a long s, not a short s on the grounds that long s happens not to be normally used in modern writing in Antiqua fonts.

If a writer decides between using ü, ue, or uͤ (u with combining superscript e), the font should leave the text alone.

If you have a newer version of the Code 2000 font on your machine which contains the combining superscripts, then the superscript e appears correctly in newer browsers, even if you are using a different font for the base character. A diacritic from one font is placed over the base character of another.

I can understand Marco not wishing to bother viewers with the demand to load a particular font and also knowing that dynamic downloading of a font will not work with every system or browser or with user settings of browsers. So use the hack for now. In two or three years, hopefully, it will not be necessary.

Generally a font should not be correcting the text.

The use of macron for diaeresis is a somewhat different matter. If a particular style of German script uses a line for a diaeresis, then indeed the diaeresis in that script has fallen together in appearance with the macron. This would be especially so if a diaeresis was used over e and i (in foreign words and names). Representing diaeresis by a glyph of macron form would be no more of a hack than would be the use in an English script font of a p with an ascender, though presumably an Icelander would identify that as the letter þ, not p. (How þ itself should be presented in such a script font is problematical!)

The main difficulty with identification of diaeresis and combining superscript e is that the identification does not work universally, even within German, if foreign names or words appear. Even in German text, combining superscript e may not always correctly replace diaeresis.

Jim Allan
