On 25/09/2003 12:27, [EMAIL PROTECTED] wrote:
Is this actually correct? For example, if I have in my data the string
<U+0104, U+05B0> (which I know is garbage, but that is irrelevant), that
will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
higher combining class (202) than U+05B0 (10). What does this become in
NFC? Is the reordering reversed and the combination reapplied?
First an attempt is made to compose U+0041 and U+05B0. No precomposed character exists for that pair, so the attempt fails. Then an attempt is made to compose U+0041 and U+0328; U+05B0 does not block this because its combining class (10) is lower than that of U+0328 (202). That composition produces U+0104, so U+0041 is replaced with U+0104 and U+0328 is removed, resulting in <U+0104, U+05B0>.
It's not a reordering per se, as the first combining character is given the first "opportunity" to combine.
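
For anyone who wants to check this, here is a minimal sketch using Python's standard unicodedata module. The helper name codepoints is my own, and the results depend on the Unicode data shipped with the interpreter, although these particular characters have been stable for a long time.

```python
import unicodedata

def codepoints(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

original = "\u0104\u05B0"  # A WITH OGONEK, HEBREW POINT SHEVA

# NFD decomposes U+0104 and sorts the marks by combining class.
nfd = unicodedata.normalize("NFD", original)
# NFC composes again; U+05B0 (class 10) does not block U+0328 (class 202).
nfc = unicodedata.normalize("NFC", nfd)

print([unicodedata.combining(c) for c in nfd])  # [0, 10, 202]
print(codepoints(nfd))  # U+0041 U+05B0 U+0328
print(codepoints(nfc))  # U+0104 U+05B0
```

So the round trip through NFD and NFC returns the original string in this case.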
Thanks for the clarification.
This is not only a theoretical issue as the same applies to some real
combinations. There was discussion only last week on the bidi list of a
form which might be encoded <U+064A, U+0652, U+0654> but which would be
messed up if composed into <U+0626, U+0652>.
Yes, NFC would perform that composition. Are you sure it would be an issue?
Applying bidi rules doesn't seem to make this an issue.
<U+064A, U+0652, U+0654>
bidi: AL, NSM, NSM
applying rule W1 from UAX #9:
AL, NSM, NSM -> AL, AL, NSM -> AL, AL, AL.
<U+0626, U+0652>
bidi: AL, NSM
applying rule W1:
AL, NSM -> AL, AL
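
The same check can be sketched in Python. The function apply_w1 is a hypothetical helper of mine, not a library call, and it covers only the NSM part of W1 (newer versions of the rule also deal with isolate characters). The sos argument stands in for the start-of-run type and is assumed to be R here.

```python
import unicodedata

def apply_w1(classes, sos="R"):
    """Simplified rule W1: each NSM takes the type of the previous
    character, or the start-of-run type (sos) if it comes first."""
    resolved = []
    prev = sos
    for cls in classes:
        if cls == "NSM":
            cls = prev
        resolved.append(cls)
        prev = cls
    return resolved

decomposed = "\u064A\u0652\u0654"  # YEH, SUKUN, HAMZA ABOVE
composed = unicodedata.normalize("NFC", decomposed)  # YEH WITH HAMZA ABOVE, SUKUN

for text in (decomposed, composed):
    classes = [unicodedata.bidirectional(c) for c in text]
    print(classes, "->", apply_w1(classes))
# ['AL', 'NSM', 'NSM'] -> ['AL', 'AL', 'AL']
# ['AL', 'NSM'] -> ['AL', 'AL']
```

Both forms resolve to a run of AL, which is why the bidi rules alone don't distinguish them.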
Or is the issue with something else, and it just happened to come up on
the bidi list?
The problem isn't with the bidi rules but with Arabic shaping more
generally. There are two issues. One is the position of the hamza (in
this case it should be to the left of the sukun). The other is that the
medial form of U+064A has dots below, which are required in this
combination, but the medial form of U+0626 does not. But I think we
concluded that U+0654 alone is not suitable for encoding this particular
hamza.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/