Mark *— Il meglio è l’inimico del bene —*
On Sat, Jul 2, 2011 at 14:58, Karl Williamson <[email protected]> wrote:

> I have two questions about this.
>
> 1) In UAX #44, it says for information about the Grapheme_Base property, to
> see UAX #29, but that document doesn't mention this property.

The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has not been used in the specification of grapheme clusters since (I believe) Unicode 3.2.

> 2) The definition in UAX #29 for both legacy and extended grapheme clusters
> effectively says that any Gc=Cn code points followed by any number of
> grapheme_extend code points is a grapheme cluster. Is that what is meant?
> I notice that Grapheme_Base excludes Cn code points.

It doesn't say that. If you had the sequence <Control Extend>, you'd have a break between them according to the following rule:

GB4. ( Control | CR | LF ) ÷

It would result in two (degenerate) grapheme clusters.

We need to fix the documentation to make this clearer. Could you let me know what led you to think that "any Gc=Cn code points followed by any number of grapheme_extend code points is a grapheme cluster" so that we can clarify that?

Grapheme_Extend, on the other hand, is exactly equivalent to Grapheme_Cluster_Break=Extend.
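To make the GB4 point concrete, here is a minimal Python sketch of how a <Control Extend> sequence splits into two clusters. It is an illustration under stated assumptions, not the UAX #29 algorithm: the helper names `gcb_class` and `clusters` are hypothetical, the property values are approximated from General_Category (Cc/Zl/Zp for Control, Mn/Me for Extend) rather than read from GraphemeBreakProperty.txt, and only the CR/LF, Control, and Extend rules are modeled.

```python
import unicodedata


def gcb_class(ch):
    """Rough stand-in for Grapheme_Cluster_Break (an assumption, not the real data)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    cat = unicodedata.category(ch)
    if cat in ("Cc", "Zl", "Zp"):
        return "Control"
    if cat in ("Mn", "Me"):  # approximation of Grapheme_Extend
        return "Extend"
    return "Other"


def clusters(text):
    """Split text into clusters using only the CR x LF, Control, and x Extend rules."""
    out = []
    for ch in text:
        cls = gcb_class(ch)
        if out:
            prev = gcb_class(out[-1][-1])
            # CR x LF: never break inside a CRLF pair.
            if prev == "CR" and cls == "LF":
                out[-1] += ch
                continue
            # x Extend applies only if GB4 did not already force a break:
            # GB4 says ( Control | CR | LF ) ÷, so nothing attaches to those.
            if cls == "Extend" and prev not in ("Control", "CR", "LF"):
                out[-1] += ch
                continue
        out.append(ch)
    return out


# <Control Extend>: U+0001 followed by U+0301 COMBINING ACUTE ACCENT
print(len(clusters("\u0001\u0301")))  # 2
# An ordinary base followed by the same Extend character stays together
print(len(clusters("a\u0301")))       # 1
```

Because the rules are applied in order, the break after Control wins before the "do not break before Extend" rule is ever considered, which is why the Extend character ends up as its own degenerate cluster.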

