RE: Case mapping of dotless lowercase letters

Michael Everson Tue, 16 Dec 2003 10:13:06 -0800

At 16:48 +0100 2003-12-16, Philippe Verdy wrote:

Michael Everson wrote:

 At 11:03 +0100 2003-12-16, Philippe Verdy wrote:
 >Doug Ewell <[EMAIL PROTECTED]> writes:
 > > > Wrong here: I have found occurrences of dotless lowercase i, used
 > > > instead of soft-dotted lowercase i, as base letters for diacritics
 > > > added above it (it was an acute accent...)
 > >
 > > Don't do that.
 >
 >What? This is VALID UNICODE to have texts coded like this.

 In Irish, it is INCORRECT to spell "físeán"
 'video' with a DOTLESS I + COMBINING ACUTE. It is
 a spelling error, and will fail in
 spell-checking. The correct spelling is either I
 + COMBINING ACUTE or precomposed I WITH ACUTE.

Spelling was not the issue there. Only Unicode validity.

Apparently you should look up the word "valid".

Any character can follow any other character and be "valid". Any combining character can be applied to any base character, regardless of script.
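
A small Python sketch of the distinction being argued here, using only the standard unicodedata module. Both sequences are well-formed, but only the one built on the ordinary i composes to the precomposed letter under NFC, which is one reason a spell-checker would not treat them as the same word. The variable names are illustrative only.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

dotted = "i" + ACUTE        # LATIN SMALL LETTER I + combining acute
dotless = "\u0131" + ACUTE  # LATIN SMALL LETTER DOTLESS I + combining acute

# NFC composes i + acute into the precomposed U+00ED (i with acute).
composed_dotted = unicodedata.normalize("NFC", dotted)

# There is no precomposed "dotless i with acute", so NFC leaves it as is.
composed_dotless = unicodedata.normalize("NFC", dotless)

print(composed_dotted == "\u00ed")          # True
print(composed_dotless == dotless)          # True
print(composed_dotted == composed_dotless)  # False: distinct strings
```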

 Texts which contain spelling errors. Or old IPA
 texts using any number of ad-hoc IPA font
 solutions. Those texts have to be transcoded to
 proper Unicode at some stage. What you suggest is
 Not Recommended.

Not recommended but still valid (and actually used in Turkish as well!)

Case folding in Turkish and Azeri is DIFFERENT from everywhere else and you have to have a local tailoring for it.
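
To make the tailoring concrete, here is a minimal Python sketch. The helpers turkish_upper and turkish_lower are hypothetical names, not part of any library; they only remap the dotted/dotless pairs before applying the default mappings, and they deliberately ignore the decomposed sequence I + COMBINING DOT ABOVE, which a full implementation would also have to handle.

```python
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase with the Turkish/Azeri i pairs."""
    # i -> dotted capital, dotless small -> I, then default uppercasing.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase with the Turkish/Azeri i pairs."""
    # dotted capital -> i, I -> dotless small, then default lowercasing.
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


# Default (untailored) mappings lose the distinction:
print("istanbul".upper())  # ISTANBUL
print("ISPARTA".lower())   # isparta

# Tailored mappings keep it:
print(turkish_upper("istanbul"))  # İSTANBUL
print(turkish_lower("ISPARTA"))   # ısparta
```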

used in some occasions because of defects in fonts that don't have a
precomposed glyph for letter i with the diacritic but have a separate glyph
for the combining diacritic and for the dotted and dotless letters i, or
that use renderers unable to remove the soft dot.

What defects there are in FONTS without UNICODE CMAPS is of no concern to us.

The IPA-93 font is one such font: it allows good typesetting, but needs glyph processing to select the appropriate base letter.

It isn't a Unicode font, and so it doesn't matter. Data represented in it has to be transcoded to Unicode, and the font has to have the right thing in it.

My main issue is, however with Turkish names found in environments where
language identification is not possible (for example a simple filename or a
locale-neutral database field or an international HTML form which requests
user names and uses them as case insensitive identifiers); lowercase dotless
i do not work appropriately there.

Oh well.

I think it is completely illogical to match together with case-insensitive
compares, the three letters:
        LATIN SMALL LETTER I (dotted)
        LATIN CAPITAL LETTER I (dotless)
        LATIN CAPITAL LETTER I WITH DOT ABOVE
but not:
        LATIN SMALL LETTER DOTLESS I
when using locale-neutral compares, given that the normative uppercase mapping
of this fourth letter is the second letter above.


That is not what happens in locale-neutral comparisons, I believe.
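
One way to check this is with Python's str.casefold(), which applies the default (untailored) full case folding. This is a sketch of one implementation's behaviour, not a statement about every comparison routine; implementations that use simple folding, or that compare by uppercasing, may behave differently.

```python
letters = {
    "LATIN SMALL LETTER I": "i",
    "LATIN CAPITAL LETTER I": "I",
    "LATIN CAPITAL LETTER I WITH DOT ABOVE": "\u0130",
    "LATIN SMALL LETTER DOTLESS I": "\u0131",
}

# Default case folding of each letter, with no locale tailoring.
folded = {name: ch.casefold() for name, ch in letters.items()}

for name, value in folded.items():
    print(f"{name:40} ->", " ".join(f"U+{ord(c):04X}" for c in value))

# Only i and I fold to the same string. The dotted capital folds to
# i + COMBINING DOT ABOVE, and the dotless small letter folds to itself.
print(folded["LATIN SMALL LETTER I"] == folded["LATIN CAPITAL LETTER I"])  # True
print(folded["LATIN CAPITAL LETTER I WITH DOT ABOVE"] == "i")              # False
print(folded["LATIN SMALL LETTER DOTLESS I"] == "i")                       # False

# The uppercase mapping of the dotless small letter is still capital I.
print("\u0131".upper() == "I")  # True
```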
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
