RE: Case mapping of dotless lowercase letters

Philippe Verdy Wed, 17 Dec 2003 05:07:13 -0800

Peter Kirk wrote:
> This implies (since there are no decomposition exclusions) that NFD, 
> used on Turkic text, violates the very sensible rule DO NOT USE 
> COMBINING DOTS WITH I's, and leads to all sorts of potential confusion 
> e.g. that both simple and full case folding and lowercasing applied to 
> NFD Turkic text generate the nonsensical <i, dot above>. This could be a 
> serious problem - although one that may not be worth fixing.


Yes, NFD is an issue, but not a critical one, because the decomposition is
canonical and is not excluded from recomposition.
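
As a rough illustration of that round trip, here is a minimal sketch using
Python's standard unicodedata module. The constant name is mine, and the
result depends on the Unicode data bundled with the interpreter:

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# NFD splits the precomposed letter into <I, combining dot above>.
nfd = unicodedata.normalize("NFD", DOTTED_CAPITAL_I)

# The decomposition is canonical and not excluded from composition,
# so NFC restores the original single code point.
nfc = unicodedata.normalize("NFC", nfd)

print([f"U+{ord(c):04X}" for c in nfd])
print(nfc == DOTTED_CAPITAL_I)
```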

However, you're wrong on one point: only Full CaseFolding generates
<i, dot-above> from <dotted-I>. The default lowercase mapping in the UCD
leaves it unchanged, and the locale-specific "tr"/"az" lowercase mapping
maps it to <(soft-dotted-)i>.
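
To make the contrast concrete, here is a sketch that assumes Python's
str.casefold() as the implementation of full case folding. Python has no
locale-aware lowercasing, so turkic_lower() and its table are hypothetical
helpers written only for this example, and they assume NFC input:

```python
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Hypothetical "tr"/"az" tailoring, for illustration only:
# dotted capital I -> i, plain capital I -> dotless i (U+0131).
TURKIC_LOWER = {0x0130: "i", 0x0049: "\u0131"}


def turkic_lower(text: str) -> str:
    """Lowercase NFC text with the Turkic tailoring applied first."""
    # Input in NFD (<I, dot above>) would need extra handling here.
    return text.translate(TURKIC_LOWER).lower()


folded = DOTTED_CAPITAL_I.casefold()        # full case folding
tailored = turkic_lower(DOTTED_CAPITAL_I)   # tailored lowercase

print([f"U+{ord(c):04X}" for c in folded])
print([f"U+{ord(c):04X}" for c in tailored])
```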

Typical Turkish and Azeri texts will not use <dot-above>, except in the NFD
form <I, dot-above> of <dotted-I>; the Full CaseFolding mapping to
<i, dot-above> exists only so that folding respects canonical equivalence
with that form.
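
The canonical-equivalence argument can be checked with a short sketch.
It again assumes Python's str.casefold() for full case folding;
plain_i_fold() is a hypothetical alternative folding, invented here, that
maps <dotted-I> to a plain i:

```python
import unicodedata


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


def canonically_equivalent(a: str, b: str) -> bool:
    # Two strings are canonically equivalent if their NFD forms match.
    return nfd(a) == nfd(b)


def plain_i_fold(text: str) -> str:
    # Hypothetical folding: map U+0130 to plain i, then fold the rest.
    return text.replace("\u0130", "i").casefold()


precomposed = "\u0130"          # <dotted-I>
decomposed = nfd(precomposed)   # <I, dot-above>

# Full case folding gives equivalent results for both input forms.
full_ok = canonically_equivalent(precomposed.casefold(),
                                 decomposed.casefold())

# The hypothetical folding does not: <i> versus <i, dot-above>.
plain_ok = canonically_equivalent(plain_i_fold(precomposed),
                                  plain_i_fold(decomposed))

print(full_ok, plain_ok)
```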

I do hope that dotless-j and dotted-J will avoid these confusions, by not
decomposing dotted-J in NFD, and by having Full CaseFolding of <dotted-J>
generate just <(soft-dotted-)j> rather than <j, dot-above>. Or will it add
more confusion if j is treated differently from i?

