Peter Kirk wrote:
> This implies (since there are no decomposition exclusions) that NFD,
> used on Turkic text, violates the very sensible rule DO NOT USE
> COMBINING DOTS WITH I's, and leads to all sorts of potential confusion
> e.g. that both simple and full case folding and lowercasing applied to
> NFD Turkic text generate the nonsensical <i, dot above>. This could be a
> serious problem - although one that may not be worth fixing.

Yes, NFD is an issue, but not a critical one, because the decomposition is canonical and is not excluded from recomposition.

However, you're wrong here: only Full CaseFolding generates <i, dot-above> from <dotted-I>. The default lowercase mapping in the UCD does not, since it leaves the character unchanged, and neither does the locale-specific "tr"/"az" lowercase mapping, which maps it to <(soft-dotted-)i>.

Typical Turkish and Azeri texts will not use <dot-above>, except in the NFD form <I, dot-above> of <dotted-I>. That form is needed only because of the Full CaseFolding mapping, so that the mapping respects canonical equivalence.

I do hope that dotless-j and dotted-J will avoid these confusions, by not trying to decompose dotted-J in NFD, and by generating just <(soft-dotted-)j>, not <j, dot-above>, in the Full CaseFolding of <dotted-J>. Or will it add more confusion there, if j is treated differently than i?
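For anyone who wants to check the <dotted-I> mappings discussed above, here is a minimal sketch in Python 3. It assumes that `str.casefold()` applies full case folding and that the `unicodedata` module follows the UCD bundled with the interpreter. The `turkic_lower` helper is hypothetical: Python has no built-in "tr"/"az" lowercasing, so it only approximates the locale-specific rule for the I letters and is not a complete implementation.

```python
import unicodedata

DOTTED_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def turkic_lower(text: str) -> str:
    """Hypothetical helper: approximate "tr"/"az" lowercasing.

    Only the I letters get special treatment; everything else falls
    back to the default str.lower().
    """
    # Recompose first so that <I, dot-above> becomes <dotted-I>.
    text = unicodedata.normalize("NFC", text)
    # Turkic rule: dotted-I -> i, and plain I -> dotless-i.
    text = text.replace(DOTTED_I, "i").replace("I", DOTLESS_I)
    return text.lower()


# NFD splits <dotted-I> into <I, dot-above>; NFC puts it back together.
nfd = unicodedata.normalize("NFD", DOTTED_I)
recomposed = unicodedata.normalize("NFC", nfd)

# Full case folding of the precomposed and the decomposed form.
folded = DOTTED_I.casefold()
folded_nfd = nfd.casefold()

print([hex(ord(c)) for c in nfd])
print([hex(ord(c)) for c in folded])
print(folded == folded_nfd)
print(turkic_lower("D\u0130YARBAKIR"))
```

The two folded strings compare equal, which is the canonical-equivalence point made above: folding the precomposed <dotted-I> and folding its NFD form give the same result.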

