In an off-list conversation, John Klensin pointed out to me that there could be confusion about the definition of the HasCompat() category in Section 9.17 of RFC 7564 and of draft-ietf-precis-7564bis-04.

I can't speak for my co-author Marc Blanchet, but I've always considered HasCompat to apply in a "unidirectional" way to the input characters. For instance, if we have three code points P0, P1, and P2 such that NFKC(P1P2) = P0P0, then the HasCompat() category is assigned to P1 and P2 but not to P0. That is, P1 and P2 are decomposed and then recomposed in a lossy way: we can't tell from the output string P0P0 what the input string was, and there is no way to determine all the characters that could be decomposed and recomposed into P0P0. The current text might be a bit confusing on this point (as I understand what John wrote, the phrase "has a compatibility equivalent" could be taken to apply to P0 in this example), so I will try to make it clearer.
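
A minimal Python sketch of this reading, assuming that HasCompat() is evaluated per code point as NFKC(cp) != cp. The concrete characters used for P1, P2, and P0 are hypothetical stand-ins chosen only for illustration: FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) and MATHEMATICAL BOLD CAPITAL A (U+1D400) both normalize to LATIN CAPITAL LETTER A (U+0041) under NFKC.

```python
import unicodedata


def has_compat(cp: str) -> bool:
    """Sketch of a per-code-point HasCompat test (assumed: NFKC(cp) != cp)."""
    if len(cp) != 1:
        raise ValueError("expected exactly one code point")
    return unicodedata.normalize("NFKC", cp) != cp


# Hypothetical stand-ins for the code points in the example above.
P1 = "\uFF21"      # FULLWIDTH LATIN CAPITAL LETTER A
P2 = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A
P0 = "A"           # LATIN CAPITAL LETTER A

# NFKC maps the input string P1P2 to the output string P0P0.
assert unicodedata.normalize("NFKC", P1 + P2) == P0 + P0

# The category is attached to the input code points, not to the output.
print(has_compat(P1), has_compat(P2), has_compat(P0))  # True True False

# Lossy mapping: several distinct inputs collapse to the same output.
for s in (P0 + P0, P1 + P2, P2 + P1, P1 + P1):
    print(repr(s), "->", repr(unicodedata.normalize("NFKC", s)))
```

Because P0P0, P1P2, P2P1, and P1P1 all normalize to the same string, the output alone does not tell us which input produced it, which is why it seems natural to attach the category to P1 and P2 rather than to P0.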

Furthermore, John pointed out that the HasCompat() categorization for a given input string could change across Unicode versions (e.g., if the input string includes a precomposed character that was added in a recent version of Unicode). Although I'm not sure whether this is avoidable, it does seem that we at least need to mention the potential instability of this category.
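
To make the version dependence concrete, here is a minimal sketch, again assuming the per-code-point test NFKC(cp) != cp. It relies on CPython's unicodedata module, which also exposes the older Unicode 3.2.0 tables as unicodedata.ucd_3_2_0 (kept for IDNA2003 and stringprep). The helper name has_compat_with and the choice of U+1F100 DIGIT ZERO FULL STOP (added in Unicode 5.2 with a compatibility decomposition to "0.") are hypothetical and purely illustrative.

```python
import unicodedata

# Hypothetical example: U+1F100 DIGIT ZERO FULL STOP, added in Unicode 5.2
# with the compatibility decomposition <compat> 0030 002E ("0.").
CP = "\U0001F100"


def has_compat_with(db, cp: str) -> bool:
    """Per-code-point NFKC(cp) != cp test against the given Unicode tables.

    `db` is either the unicodedata module itself (current tables) or a
    versioned UCD object such as unicodedata.ucd_3_2_0.
    """
    if len(cp) != 1:
        raise ValueError("expected exactly one code point")
    return db.normalize("NFKC", cp) != cp


# Result with the tables bundled with this Python build.
current = has_compat_with(unicodedata, CP)

# Result with the Unicode 3.2.0 tables; U+1F100 is unassigned there,
# so NFKC leaves it unchanged.
legacy = has_compat_with(unicodedata.ucd_3_2_0, CP)

print(f"Unicode {unicodedata.unidata_version}: HasCompat(U+1F100) = {current}")
print(f"Unicode 3.2.0: HasCompat(U+1F100) = {legacy}")
```

This shows only that the answer depends on which version of the tables an implementation uses; it is not necessarily the precomposed-character case John described.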

Peter

_______________________________________________
precis mailing list
precis@ietf.org
https://www.ietf.org/mailman/listinfo/precis
