António MARTINS-Tuválkin (with no diaeresis !) asked:

> Anyway, I noted once more that many cyrillic letters I'd consider as
> "base letter + diacritical" composites are not decomposable according to
> Unicode. I planned to dwell deeper into this, but is there a short
> answer for it?

The short answer is that the extended Cyrillic characters in question use diacritics that are mostly distortions of the base letterforms (the descender ticks and the various hook forms) or that involve bars across letter strokes.

Long ago it was decided that it would not be a good idea to extend formal character decomposition to such changes in base letterform shape or to bars across letters. (Note that Latin characters with bars, such as barred-b, barred-d, barred-i, barred-u, barred-l, and the like, are also not decomposed formally. The same goes for Latin letters with hooks, and so on.)

So formal canonical decompositions are almost entirely confined to separable, accent-like diacritics (acute, grave, diaeresis, and so on). The only significant exceptions are the cedilla and ogonek, which attach smoothly to letter bottoms without otherwise distorting them, and which often have graphic alternates that are, indeed, separated diacritics (comma-like and reverse-comma-like forms).

--Ken
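
For anyone who wants to check this against the character database, here is a minimal sketch using Python's standard unicodedata module. The helper name canonical_decomposition and the choice of sample characters are illustrative assumptions, not anything defined by the standard, and the results depend on the Unicode data version bundled with the Python build in use.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition mapping of ch as a list of
    code points, or an empty list if the character has none.

    unicodedata.decomposition() returns a string such as "0065 0301";
    compatibility mappings carry a leading "<tag>", which is skipped here
    because only canonical mappings are of interest.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []
    return [int(cp, 16) for cp in mapping.split()]


# Hypothetical sample set, written with escapes so the source stays ASCII.
SAMPLES = [
    "\u00e9",  # Latin e with acute (separable accent)
    "\u00e7",  # Latin c with cedilla
    "\u0105",  # Latin a with ogonek
    "\u0439",  # Cyrillic short i (breve above)
    "\u0451",  # Cyrillic io (diaeresis above)
    "\u0111",  # Latin d with stroke (bar across the letter)
    "\u0253",  # Latin b with hook
    "\u0493",  # Cyrillic ghe with stroke
    "\u049b",  # Cyrillic ka with descender
]

for ch in SAMPLES:
    parts = canonical_decomposition(ch)
    shown = " ".join(f"U+{cp:04X}" for cp in parts) if parts else "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {shown}")
```

With this sample set, the letters carrying accent-like marks, a cedilla, or an ogonek should report a two-code-point mapping, while the barred, hooked, and descender forms should report none.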

