On Thu, 05 Sep 2002 13:06:49 +0200 [EMAIL PROTECTED] (Andreas J. Koenig) wrote:
> Hi, Tomoyuki,
>
> is it a bug in Unicode::Normalize or in my code: I expected that for
> combining a circumflex with a small letter i, I'd have to use the
> dotless i, but to my surprise, NFC refuses to combine with the dotless
> i. Here's a demo program:
>
> % perl -le '
> use Unicode::Normalize;
> use Encode;
> use charnames ":full";
> for my $e (qw(ascii)){
> print Encode::encode($e,
> NFKC("combining with i: i\N{COMBINING CIRCUMFLEX ACCENT}
> combining with dotless i: \N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING CIRCUMFLEX ACCENT}"),
> Encode::FB_PERLQQ);
> }
> '
> combining with i: \x{00ee}
> combining with dotless i: \x{0131}\x{0302}
>
>
> What do you think?

Hello.

I have a short answer and a long answer.

(1) <LATIN SMALL LETTER I WITH CIRCUMFLEX> is not
<LATIN SMALL LETTER DOTLESS I WITH CIRCUMFLEX>.

(2) Suppose, for the sake of argument, that the NFC of
<dotless-i, circumflex> were <i-circumflex>.

If the NFC of one string is equal to the NFC of another string, the two
strings are called canonically equivalent. Similarly, if the NFKC forms
of two strings are equal to each other, the strings are called
compatibility equivalent.

Then <dotless-i> would have to be either canonically or compatibility
equivalent to <i>, since <i-circumflex> would be the NFC (or NFKC) of
<dotless-i, circumflex> as well as that of <i, circumflex>.

In that case, users of Turkish and some other languages would no longer
be able to use the two letters in different senses.

Japanese people also use <i-circumflex> in the Latin transliteration of
Japanese, called ROMAJI, as a long "i". (A long "i" is usually
represented by "ii" or <i-macron>, though.) If <i-circumflex> were
<dotless-i> with <circumflex> rather than <i> with <circumflex>, then
<i-circumflex> would be a long <dotless-i>, not a long "i".
That would also be surprising. :)

Regards,
SADAHIRO Tomoyuki
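
P.S. For anyone who wants to check the point above outside Perl, here is a minimal sketch. It uses Python's standard unicodedata module instead of Unicode::Normalize, which is an assumption made only for illustration; the helper names (normalize_pair, codepoints) are hypothetical and not part of any library. It normalizes <i, circumflex> and <dotless-i, circumflex> and shows that only the former composes, because U+0131 has no decomposition mapping that would tie it to U+0069.

```python
import unicodedata

CIRCUMFLEX = "\N{COMBINING CIRCUMFLEX ACCENT}"     # U+0302
DOTTED_I = "i"                                     # U+0069
DOTLESS_I = "\N{LATIN SMALL LETTER DOTLESS I}"     # U+0131


def codepoints(s):
    """Render a string as a list of U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def normalize_pair(form):
    """Normalize both test sequences under the given form ('NFC' or 'NFKC')."""
    dotted = unicodedata.normalize(form, DOTTED_I + CIRCUMFLEX)
    dotless = unicodedata.normalize(form, DOTLESS_I + CIRCUMFLEX)
    return dotted, dotless


for form in ("NFC", "NFKC"):
    dotted, dotless = normalize_pair(form)
    print(form, "i + circumflex         ->", codepoints(dotted))
    print(form, "dotless i + circumflex ->", codepoints(dotless))

# i-circumflex decomposes to <i, circumflex>; dotless i has no decomposition,
# so it is neither canonically nor compatibility equivalent to i.
print("decomposition of U+00EE:", repr(unicodedata.decomposition("\u00ee")))
print("decomposition of U+0131:", repr(unicodedata.decomposition(DOTLESS_I)))
```

Under both forms the dotted sequence should come back as the single code point U+00EE, while the dotless sequence should stay as U+0131 U+0302, matching the output of the Perl demo quoted above.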