Mark
*— Il meglio è l’inimico del bene —*

On Sat, Jul 2, 2011 at 14:58, Karl Williamson <[email protected]> wrote:

> I have two questions about this.
>
> 1) In UAX #44, it says for information about the Grapheme_Base property, to
> see UAX #29, but that document doesn't mention this property.
>

The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has not
been used in the specification of grapheme clusters since (I believe)
Unicode 3.2.


>
> 2) The definition in UAX #29 for both legacy and extended grapheme clusters
> effectively says that any Gc=Cn code points followed by any number of
> grapheme_extend code points is a grapheme cluster.  Is that what is meant?
>  I notice that Grapheme_Base excludes Cn code points.
>

It doesn't say that. If you had the sequence <Control Extend>, you'd have a
break between them according to the following rule:
GB4. ( Control | CR | LF ) ÷
It would result in two (degenerate) grapheme clusters.
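
To illustrate how that rule plays out, here is a minimal Python sketch. It is not the UAX #29 algorithm: the gcb_class and split_clusters helpers are hypothetical, and the class lookup only approximates Grapheme_Cluster_Break from general categories (Cc as Control, Mn/Me as Extend), which is an assumption rather than the real property data. Only the CR LF rule, the breaks around Control/CR/LF, the "no break before Extend" rule, and the final catch-all break are modeled.

```python
import unicodedata


def gcb_class(ch):
    """Rough stand-in for Grapheme_Cluster_Break.

    Assumption: approximated from General_Category, not read from
    GraphemeBreakProperty.txt.
    """
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    cat = unicodedata.category(ch)
    if cat == "Cc":
        return "Control"
    if cat in ("Mn", "Me"):
        return "Extend"
    return "Other"


def split_clusters(text):
    """Split text into clusters using a small subset of the rules."""
    hard = ("Control", "CR", "LF")
    clusters = []
    for ch in text:
        cls = gcb_class(ch)
        if clusters:
            prev = gcb_class(clusters[-1][-1])
            # GB3: CR x LF (no break inside a CR LF pair)
            if prev == "CR" and cls == "LF":
                clusters[-1] += ch
                continue
            # GB4: ( Control | CR | LF ) ÷   and   GB5: ÷ ( Control | CR | LF )
            if prev in hard or cls in hard:
                clusters.append(ch)
                continue
            # x Extend: an Extend character attaches to what precedes it
            if cls == "Extend":
                clusters[-1] += ch
                continue
        # Start of text, or the catch-all rule: break everywhere else
        clusters.append(ch)
    return clusters


# <Control Extend>: GB4 forces a break, giving two degenerate clusters.
control_then_extend = split_clusters("\x01\u0301")
# <Other Extend>: the combining mark stays with its base.
base_then_extend = split_clusters("a\u0301")
```

Under these assumptions, control_then_extend has two elements and base_then_extend has one, which is the distinction described above.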

We need to fix the documentation to make this clearer. Could you let me know
what led you to think that "any Gc=Cn code points followed by any number of
grapheme_extend code points is a grapheme cluster" so that we can clarify
that?

Grapheme_Extend, on the other hand, is exactly equivalent to
Grapheme_Cluster_Break=Extend.
