Markus Scherer wrote:

> Clark Cox wrote:
> > According to the comment at the beginning of the file, and all that I've
> > read elsewhere, toNFC(U+1025 U+102E) should result in U+1026. However
> > both U+1025 and U+102E have combining classes of zero, so my code does
> > not compose those characters. No information that I've been able to find
> > has been able to explain this discrepancy. Any help would be greatly
> > appreciated.
>
> There is no discrepancy. The starter must have ccc==0 but the second
> character's ccc can be anything. See Hangul.

This little-known fact (along with the better-known fact that not all characters with non-zero ccc take part in existing precomposed characters) has prompted the W3C's Character Model spec to define "composing characters", a concept somewhat distinct from Unicode's combining characters. Appendix C at http://www.w3.org/International/Group/charmod-edit/Overview.html#sec-ComposingChars contains the definition as well as a list of the characters with ccc=0 that do take part in existing compositions; U+102E is there, of course, as well as the above-mentioned Hangul plus some others.

-- 
François
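
For anyone who wants to check the behaviour discussed above, here is a minimal sketch. It assumes Python's standard unicodedata module, which is not what the original poster was using; results depend on the Unicode data bundled with the interpreter, and the variable names are only illustrative.

```python
import unicodedata

# The pair from the original question: both code points have ccc == 0.
starter = "\u1025"  # MYANMAR LETTER U
second = "\u102E"   # MYANMAR VOWEL SIGN II

ccc_starter = unicodedata.combining(starter)
ccc_second = unicodedata.combining(second)

# NFC still composes the pair, because only the first character
# is required to be a starter (ccc == 0).
composed = unicodedata.normalize("NFC", starter + second)

# The Hangul case: a leading consonant jamo plus a vowel jamo,
# where the vowel also has ccc == 0.
jamo_l = "\u1100"   # HANGUL CHOSEONG KIYEOK
jamo_v = "\u1161"   # HANGUL JUNGSEONG A
ccc_vowel = unicodedata.combining(jamo_v)
hangul = unicodedata.normalize("NFC", jamo_l + jamo_v)

print(ccc_starter, ccc_second, [f"U+{ord(c):04X}" for c in composed])
print(ccc_vowel, [f"U+{ord(c):04X}" for c in hangul])
```

A composition routine that skips any second character with ccc == 0 would leave both pairs uncomposed, which seems to be the situation described in the original question.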

