Aaron Cannon asked:
> Hi all, from the latest version of the standard, on line 16977 of the
> normalization tests, I am a bit confused by the NFC form. It appears
> incorrect to me. Here's the line, sans comment:
>
> 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;
>
> Just looking at column 2, which according to the comments at the top
> is the NFC form:
>
> 0061 05AE 0305 0300 0315 0062
>
> This, however, does not appear to be in NFC form.
>
> The first character and the second or third characters do not
> compose. However, the first and fourth (0061 and 0300) do, composing
> to 00E0.
>
> Since there are no further compositions, the normalized form should be
> 00E0 05AE 0305 0315 0062
>
> What am I missing?

Input is:

    Code points: 0061 0305 0315 0300 05AE 0062
    ccc:            0  230  232  230  228    0

Output of canonical reordering is:

    Code points: 0061 05AE 0305 0300 0315 0062
    ccc:            0  228  230  230  232    0

The next step is to start from 0061 and test each successive combining mark, looking for composition candidates.

0061 does not compose with 05AE.
0061 does not compose with 0305.
0061 *could* compose with 0300 (00E0 = 0061 + 0300), *but* 0300 is *blocked* from 0061 by the intervening combining mark 0305, which has the *same* ccc value as 0300. (Under D115, a mark is blocked from the starter by any intervening character whose ccc is 0 or greater than or equal to the mark's own ccc.) So the composition does not occur.
0061 does not compose with 0315.
The next character is 0062 (ccc=0), a starter, so we are done.

For the relevant definitions, see:

http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G50628

and scroll down a couple of pages to D115 on p. 139.

Test cases like this are included in NormalizationTest.txt precisely to ensure that implementations correctly detect these sequences where composition is blocked.

--Ken
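For anyone who wants to check this mechanically, here is a minimal Python sketch of the two steps walked through above: canonical reordering, then canonical composition with the D115 blocking test. It is illustrative only, not a conformant normalizer. The ccc values come from the standard library's `unicodedata.combining`; the helper names (`canonical_reorder`, `compose`, `fmt`) and the one-entry `COMPOSE` table are hypothetical, made up for this example. A real implementation would derive composition pairs from UnicodeData.txt minus the composition exclusions, and would also handle decomposition and Hangul.

```python
import unicodedata

def ccc(cp):
    """Canonical_Combining_Class of a code point, from Python's bundled UCD."""
    return unicodedata.combining(chr(cp))

def canonical_reorder(cps):
    """Stable-sort each maximal run of non-starters (ccc != 0) by ccc."""
    out = list(cps)
    i = 0
    while i < len(out):
        if ccc(out[i]) == 0:
            i += 1
            continue
        j = i
        while j < len(out) and ccc(out[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal ccc keep their original order.
        out[i:j] = sorted(out[i:j], key=ccc)
        i = j
    return out

# Hypothetical one-entry table of primary composites, just enough for this
# example. A real implementation builds it from UnicodeData.txt decompositions
# minus CompositionExclusions.txt.
COMPOSE = {(0x0061, 0x0300): 0x00E0}

def compose(cps):
    """Canonical composition of an already canonically ordered sequence."""
    out = []
    starter = None  # index in `out` of the last starter seen, if any
    for cp in cps:
        c = ccc(cp)
        if starter is not None:
            # D115: cp is blocked from the starter if some character left
            # between them has ccc == 0 or ccc >= ccc(cp).
            blocked = any(ccc(b) == 0 or ccc(b) >= c for b in out[starter + 1:])
            pair = (out[starter], cp)
            if not blocked and pair in COMPOSE:
                out[starter] = COMPOSE[pair]  # cp is absorbed, not appended
                continue
        if c == 0:
            starter = len(out)
        out.append(cp)
    return out

def fmt(cps):
    return " ".join("%04X" % cp for cp in cps)

source = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062]
ordered = canonical_reorder(source)
print(fmt(ordered))           # 0061 05AE 0305 0300 0315 0062
print(fmt(compose(ordered)))  # 0061 05AE 0305 0300 0315 0062 (0300 is blocked)

# Contrast: with only 05AE (ccc 228 < 230) in between, 0300 is not blocked.
print(fmt(compose(canonical_reorder([0x0061, 0x05AE, 0x0300]))))  # 00E0 05AE

# Cross-check against Python's own NFC implementation.
s = "".join(map(chr, source))
print(fmt(map(ord, unicodedata.normalize("NFC", s))))  # 0061 05AE 0305 0300 0315 0062
```

The last line compares the sketch with `unicodedata.normalize("NFC", ...)`. Its output depends on the Unicode version bundled with your Python, but the ccc values involved here do not change between versions, because canonical combining classes fall under Unicode's stability policy.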
_______________________________________________
Unicode mailing list
http://unicode.org/mailman/listinfo/unicode

