MC> Consonants [j] and [w] have the special status of "semivowels" in
MC> romance languages, which means that they often behave as vowels
MC> do, including in the rules for elision.

One has to differentiate between phonemes and graphemes. Unicode, of
course, operates on the grapheme level, and thus you simply can't be
certain what a "y" actually stands for (vowel or semivowel).

MC> But, of course, I am aware that there are edge cases that will not
MC> be captured in the general case. I have named one of these edge
MC> cases (the Breton trigraph "c'h"), but it's not difficult to come
MC> up with more -- e.g., when the apostrophe is used as a diacritic
MC> applied to consonants (such as the Wade-Giles romanization of
MC> Chinese "K'ang-hsi").

Just to give another example: Uzbek in Latin script uses "o'" and "g'"
as opposed to "o" and "g", such as in the language designation
"O'zbek" where "o'" stands for the sound designated in Cyrillic script
by U+040E and "g'" is equivalent to U+0493.

MC> BTW, notice that I didn't include precomposed accented letters
MC> because I understand UTR#29 works on NFD normalized text.

Does NFD in this instance mean to include U+0080..00FF, i.e. the former
Latin-1 upper block? It would be of interest to us Germans :-)

MC> However, "ItalianFrenchVowel" doesn't include Esperanto, Occitan
MC> and many Italian and French dialects.

"RomanceVowel"? (Not a lot better.)

Philipp
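
P.S. On the NFD question: as far as I can tell, NFD decomposes every
precomposed letter that has a canonical decomposition, whatever block it
sits in, while characters without one (the German sharp s, for instance)
are left alone. A minimal sketch to check this, assuming Python's
standard unicodedata module; the helper name nfd_codepoints is made up
for illustration and is not part of UTR#29 or any library:

```python
import unicodedata


def nfd_codepoints(text):
    """Return the code points of `text` after NFD normalization as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


# Letters from the upper half of Latin-1 that matter for German.
for ch in "äöüß":
    print(ch, nfd_codepoints(ch))
```

The umlauts come out as base letter plus U+0308 (COMBINING DIAERESIS),
so a rule written against the plain vowels would still see them after
normalization. U+00DF stays a single code point.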

