Markus Scherer wrote:
> Clark Cox wrote:
> > According to the comment at the beginning of the file, and all that
> > I've read elsewhere, toNFC(U+1025 U+102E) should result in U+1026.
> > However both U+1025 and U+102E have combining classes of zero, so my
> > code does not compose those characters. No information that I've been
> > able to find has been able to explain this discrepancy. Any help
> > would be greatly appreciated.
> 
> There is no discrepancy. The starter must have ccc==0 but the 
> second character's ccc can be anything. See Hangul.

This little-known fact (along with the better-known fact that not all
characters with non-zero ccc take part in existing precomposed characters)
prompted the W3C's Character Model spec to define "composing characters", a
concept somewhat distinct from Unicode's combining characters.  Appendix C at
http://www.w3.org/International/Group/charmod-edit/Overview.html#sec-ComposingChars
contains the definition as well as a list of the characters with ccc=0 that
do take part in existing compositions; U+102E is there, of course, as well
as the above-mentioned Hangul plus some others.
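
The behaviour is easy to check for yourself. Here is a minimal sketch,
assuming Python 3 and its standard unicodedata module; the helper names
ccc_values and nfc_codepoints are made up for illustration and are not
part of any library:

```python
import unicodedata


def ccc_values(s):
    """Return the canonical combining class of each character in s."""
    return [unicodedata.combining(ch) for ch in s]


def nfc_codepoints(s):
    """Return the code points of NFC(s) in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in unicodedata.normalize("NFC", s)]


# The Myanmar pair from the original question: both characters have ccc=0,
# yet the second one still composes with the preceding starter.
myanmar = "\u1025\u102E"
print(ccc_values(myanmar), nfc_codepoints(myanmar))

# The Hangul case: a leading consonant jamo followed by a vowel jamo,
# again both with ccc=0, composing into a single syllable.
hangul = "\u1100\u1161"
print(ccc_values(hangul), nfc_codepoints(hangul))
```

Both pairs report combining classes of zero and still come out of NFC as
a single code point, so a composer that only considers a following
character when its ccc is non-zero will miss them.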

-- 
François
