From: "John Cowan" <[EMAIL PROTECTED]>
> First of all, this is an extended joke.
>
> The point of the joke is that Czech sorts "ch" as a single letter after
> "h", so using a COMBINING C BEFORE would make this happen automatically,
> provided the combining character sorted after all letters.
>
> Spanish also sorts "ch" as a single letter, but after "c", so here we
> want a COMBINING H AFTER.

What would Bretons like to see, then, for the "c'h" trigraph? A COMBINING APOSTROPHE AFTER followed by a COMBINING H AFTER, both sharing the same canonical combining class, or with the COMBINING APOSTROPHE AFTER given a lower combining class than your joke-proposed COMBINING H AFTER? Why not, then, a COMBINING APOSTROPHE H AFTER?

> Of course, this is really not the way to do language-sensitive collation.

It's true that Czech and Spanish do not need such a combining character. The question of apostrophes is more difficult, because some languages interpret the apostrophe either as a punctuation mark or as a combining diacritic that is part of a digraph or trigraph. One example is the APOSTROPHE-N that can occur at the beginning of a word (in Czech too? I can't remember that case). It causes headaches when one wants to produce a title-cased word starting with that "sequence" (which really is a digraph, whose title-case folding <'n> is identical to its lowercase folding <'n>), and the same language may also use the apostrophe as a quotation mark before a word that should be titlecased independently.

One could resolve the ambiguity by adding a combining apostrophe before, so that the digraph <'n> is recognized when encoded as <LATIN SMALL LETTER N, COMBINING APOSTROPHE BEFORE>. But this causes problems too when folding a word to titlecase: if the language or this specific digraph is not known or recognized, the folding may simply look at the first letter of the encoded sequence, so that the leading LATIN SMALL LETTER N would be uppercased.

Another solution is to encode a separate apostrophe for use in isolated combining sequences, so that it can be recognized as a plain letter. But then we have to wonder how to do collation, if the apostrophe should be collated with the letter that follows it in the word... So the remaining simple solution is to encode <'n> and <'N> separately as unbreakable digraph characters.

If so, why not also encode the Breton <c'h> and <C'H>? They are trigraphs only if we encode them with the classic Latin alphabet, not if you look at the definition of the Breton alphabet, where they are unbreakable letters. The case is very similar to the <ae>, <AE>, <oe>, and <OE> ligatures, which are considered plain letters in some languages, and in Unicode, but not in French.
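
For what it's worth, the sorting behaviour discussed above needs neither new combining characters nor new precomposed letters if the collation is tailored per language. Here is a rough Python sketch of the idea, not a real collation implementation: each language supplies an ordered alphabet in which "ch" or "c'h" is a single entry, and the word is split into collation elements by longest match. The names collation_key and sort_words, and the three simplified alphabets, are made up for this illustration (the Spanish list leaves out "ll" and "ñ", and the Breton list is my assumption about the traditional letter order).

```python
# Hypothetical, simplified alphabets: multi-character entries are treated
# as single letters for sorting purposes.
CZECH = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
SPANISH_TRADITIONAL = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")
BRETON = ["a", "b", "ch", "c'h"] + list("defghijklmnoprstuvwyz")


def collation_key(word, alphabet):
    """Map a word to a list of ranks, matching the longest letter first."""
    rank = {letter: position for position, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate first so "c'h" wins over "c".
        for size in range(longest, 0, -1):
            chunk = lowered[i:i + size]
            if chunk in rank:
                key.append(rank[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: sort it after every letter of the alphabet.
            key.append(len(alphabet) + ord(lowered[i]))
            i += 1
    return key


def sort_words(words, alphabet):
    """Sort words according to the given language alphabet."""
    return sorted(words, key=lambda w: collation_key(w, alphabet))


if __name__ == "__main__":
    print(sort_words(["chata", "cena", "hrad", "igelit"], CZECH))
    print(sort_words(["chico", "cuna", "dedo", "casa"], SPANISH_TRADITIONAL))
    print(sort_words(["c'hoar", "chom", "dour", "bara"], BRETON))
```

With these lists, "chata" sorts after "hrad" for Czech, "chico" sorts after "cuna" for Spanish, and "c'hoar" sorts after "chom" for Breton, all from the same plain Latin letters and the ordinary apostrophe. The choice of longest match first is what keeps "c'h" from being read as "c" followed by something else.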

