> Is this actually correct? For example, if I have in my data the string
> <U+0104, U+05B0> (which I know is garbage, but that is irrelevant), that
> will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
> higher combining class (202) than U+05B0 (10). What does this become in
> NFC? Is the reordering reversed and the combination reapplied?

First an attempt is made to compose U+0041 and U+05B0. There is no precomposed character for that pair, so the attempt fails. Then an attempt is made to compose U+0041 and U+0328, which produces U+0104. U+05B0 does not block this, because its combining class (10) is lower than that of U+0328 (202). U+0041 is replaced with U+0104 and U+0328 is removed, resulting in <U+0104, U+05B0>. It's not a reordering per se; the first combining character is simply given the first "opportunity" to combine.

> This is not only a theoretical issue as the same applies to some real
> combinations. There was discussion only last week on the bidi list of a
> form which might be encoded <U+064A, U+0652, U+0654> but which would be
> messed up if composed into <U+0626, U+0652>.

Yes, NFC would perform that composition. Are you sure it would be an issue? Applying the bidi rules doesn't seem to make it one.

<U+064A, U+0652, U+0654>
bidi classes: AL, NSM, NSM
applying rule W1 from UAX #9: AL, NSM, NSM -> AL, AL, NSM -> AL, AL, AL

<U+0626, U+0652>
bidi classes: AL, NSM
applying rule W1: AL, NSM -> AL, AL

Or is the issue with something else, and it merely came up on the bidi list?
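
For anyone who wants to check both cases mechanically, here is a minimal sketch using Python's standard unicodedata module. The helper names show() and w1() are mine, invented for illustration only. Note that w1() is a deliberately simplified reading of rule W1: it just gives each NSM the resolved type of the preceding character and ignores the start-of-sequence and isolate cases that the full rule handles.

```python
import unicodedata


def show(s):
    """Render a string as a list of code points, e.g. 'U+0104 U+05B0'."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def w1(classes):
    """Simplified W1 (assumption: no sos/isolate handling).

    Each NSM takes the resolved type of the character before it.
    An NSM with nothing before it is left unchanged here.
    """
    out = []
    for c in classes:
        out.append(out[-1] if c == "NSM" and out else c)
    return out


# Case 1: <U+0104, U+05B0>
s1 = "\u0104\u05B0"
print(show(unicodedata.normalize("NFD", s1)))  # U+0041 U+05B0 U+0328
print(show(unicodedata.normalize("NFC", s1)))  # U+0104 U+05B0
print(unicodedata.combining("\u05B0"), unicodedata.combining("\u0328"))  # 10 202

# Case 2: <U+064A, U+0652, U+0654>
s2 = "\u064A\u0652\u0654"
composed = unicodedata.normalize("NFC", s2)
print(show(composed))  # U+0626 U+0652

# Bidi classes before and after the simplified W1 step
for s in (s2, composed):
    classes = [unicodedata.bidirectional(c) for c in s]
    print(show(s), classes, "->", w1(classes))
```

Both the composed and the uncomposed Arabic sequences resolve to all AL under this sketch, which matches the hand derivation above. It says nothing about rendering or shaping, which may be where the real concern lies.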

