RE: Case mapping of dotless lowercase letters

Arcane Jill Wed, 17 Dec 2003 07:23:29 -0800

Far be it from me to stir things up even further, but...

QUESTION - Is the rendering of {U+0069} {U+0302} (that's <i, combining circumflex accent>) locale-dependent?

I may have got this totally wrong, but it occurs to me that in non-Turkic fonts, U+0069 is "soft-dotted". That is, the dot disappears in the presence of any COMBINING....ABOVE modifier. But in Turkic, U+0069 is "hard-dotted", so the dot must not be removed if a circumflex is added. I freely admit I don't know whether Turkic uses circumflex or not, but the question will work just as well with any COMBINING....ABOVE modifier.

If this is so, how can a character be considered "soft-dotted" in one locale and "hard-dotted" in another?
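
For anyone who wants to poke at this, here is a minimal sketch of what the existing, locale-independent machinery says about the characters involved. It assumes Python 3 and its standard unicodedata module, and the constant and function names are just mine for illustration. It says nothing about rendering: whether the dot actually gets drawn is up to the font, which is the whole question.

```python
import unicodedata

# The code points under discussion.
I_SMALL = "\u0069"       # LATIN SMALL LETTER I
DOTLESS = "\u0131"       # LATIN SMALL LETTER DOTLESS I
I_DOTTED_CAP = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
CIRCUMFLEX = "\u0302"    # COMBINING CIRCUMFLEX ACCENT


def describe(s):
    """Return the Unicode character name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


# <i, circumflex> composes to a single precomposed character under NFC.
composed = unicodedata.normalize("NFC", I_SMALL + CIRCUMFLEX)

print(describe(I_SMALL + CIRCUMFLEX))
print(describe(composed))

# Python's str.upper()/str.lower() take no locale, so these are the
# default mappings, not the Turkic-tailored ones.
print(describe(DOTLESS.upper()))
print(describe(I_DOTTED_CAP.lower()))
```

Note that the default lowercase of the dotted capital comes back as two code points, <i, combining dot above>, which is the existing way of keeping the dot without a separate "hard-dotted" letter.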

Would it not make more sense to have not two, but three different kinds of lowercase i: <non-dotted i>, <soft-dotted i> and <hard-dotted i>? (And similarly for uppercase). Of course, then you might as well invent COMBINING SOFT DOT ABOVE so we can use it elsewhere.

It gets better. (You're gonna hate me). If we then make the set { soft-dotted-i, soft-dotted-I, non-dotted-i, non-dotted-I } a casefold equivalence class which lowercases to <soft-dotted-i> (except in the Turkic locale, where it lowercases to non-dotted-i), and uppercases to <non-dotted-I> in all locales; and if we similarly make { hard-dotted-i, hard-dotted-I } a separate casefold equivalence class lowercasing to <hard-dotted-i> and uppercasing to <hard-dotted-I> (in all locales), then all of the problems outlined by Philippe would go away. And we could do the same with j too.
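
To make the scheme above concrete, here is a rough sketch of it as a toy model in Python. Everything in it is hypothetical: the six "characters" are plain strings standing in for letters that do not exist in Unicode, the function names are invented, and picking the soft-dotted and hard-dotted lowercase forms as the casefold representatives is my own assumption, since all that matters is that each class folds to one value.

```python
# Hypothetical characters from the proposal; none of these exist in Unicode.
SOFT_i, SOFT_I = "soft-dotted-i", "soft-dotted-I"
NON_i, NON_I = "non-dotted-i", "non-dotted-I"
HARD_i, HARD_I = "hard-dotted-i", "hard-dotted-I"

# The two proposed casefold equivalence classes.
SOFT_CLASS = {SOFT_i, SOFT_I, NON_i, NON_I}
HARD_CLASS = {HARD_i, HARD_I}


def lower(ch, turkic=False):
    """Lowercase under the proposal; only the soft class is locale-sensitive."""
    if ch in SOFT_CLASS:
        return NON_i if turkic else SOFT_i
    if ch in HARD_CLASS:
        return HARD_i
    return ch


def upper(ch):
    """Uppercase under the proposal; the same result in every locale."""
    if ch in SOFT_CLASS:
        return NON_I
    if ch in HARD_CLASS:
        return HARD_I
    return ch


def casefold_key(ch):
    """Fold each class to one representative (the choice is an assumption)."""
    if ch in SOFT_CLASS:
        return SOFT_i
    if ch in HARD_CLASS:
        return HARD_i
    return ch


if __name__ == "__main__":
    for ch in sorted(SOFT_CLASS | HARD_CLASS):
        print(ch, lower(ch), lower(ch, turkic=True), upper(ch), casefold_key(ch))
```

In this model the only locale switch left is the one inside lower() for the soft class; uppercasing and casefolding never need to know the locale.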

Of course - it would have one nasty side-effect. The Turks would then have to use <hard-dotted-i> instead of <soft-dotted-i>, but since the characters (in this new scheme) now have completely different meanings, that's fair enough. Hey ho.

Just musing....
Jill
