On 07/03/2011 05:52 PM, Mark Davis ☕ wrote:


Mark
/— Il meglio è l’inimico del bene —/


On Sat, Jul 2, 2011 at 14:58, Karl Williamson wrote:

    I have two questions about this.

    1) UAX #44 says to see UAX #29 for information about the
    Grapheme_Base property, but that document doesn't mention this
    property.


The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has
not been used in the specification of grapheme clusters since (I
believe) Unicode 3.2.


    2) The definition in UAX #29 for both legacy and extended grapheme
    clusters effectively says that any Gc=Cn code points followed by any
    number of grapheme_extend code points is a grapheme cluster.  Is
    that what is meant?  I notice that Grapheme_Base excludes Cn code
    points.


It doesn't say that. If you had the sequence <Control Extend>, you'd
have a break between them according to the following rule:
GB4.    ( Control | CR | LF )   ÷       

It would result in two (degenerate) grapheme clusters.
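
A minimal sketch of how GB4 splits <Control Extend>. It works on Grapheme_Cluster_Break class names rather than real characters, and it models only a subset of the UAX #29 rules (GB3, GB4, GB5, GB9, and the final "Any ÷ Any"). The function names is_break and clusters are made up for this illustration and come from no library.

```python
def is_break(prev, cur):
    """Return True if a cluster boundary falls between two adjacent classes."""
    hard = {"Control", "CR", "LF"}
    if prev == "CR" and cur == "LF":
        return False  # GB3: CR x LF
    if prev in hard:
        return True   # GB4: ( Control | CR | LF ) ÷
    if cur in hard:
        return True   # GB5: ÷ ( Control | CR | LF )
    if cur == "Extend":
        return False  # GB9: x Extend
    return True       # otherwise: Any ÷ Any


def clusters(classes):
    """Group a sequence of class names into grapheme clusters."""
    out = []
    for c in classes:
        if out and not is_break(out[-1][-1], c):
            out[-1].append(c)   # no boundary: extend the current cluster
        else:
            out.append([c])     # boundary: start a new cluster
    return out


print(clusters(["Control", "Extend"]))
print(clusters(["Other", "Extend", "Extend"]))
```

Because GB4 is tested before GB9, the Extend after a Control starts its own cluster, while an Extend after any other class attaches to it.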

We need to fix the documentation to make this clearer. Could you let me
know what led you to think that "any Gc=Cn code points followed by any
number of grapheme_extend code points is a grapheme cluster" so that we
can clarify that?

It says that an extended grapheme cluster matches this:
( CRLF
| Prepend* ( Hangul-syllable | !Control )
  ( Grapheme_Extend | Spacing_Mark)*
| . )

That tells me that one option for a grapheme cluster is a !Control followed by any number of Grapheme_Extends.

Lower down it defines "Control" as
"General_Category = Line Separator (Zl), or
General_Category = Paragraph Separator (Zp), or
General_Category = Control (Cc), or
General_Category = Format (Cf)
and not U+000D CARRIAGE RETURN (CR)
and not U+000A LINE FEED (LF)
and not U+200C ZERO WIDTH NON-JOINER (ZWNJ)
and not U+200D ZERO WIDTH JOINER (ZWJ)"

By that definition of Control, all Gc=Cn code points are !Control.
Therefore, a grapheme cluster can be a Cn followed by any number of Grapheme_Extends.

Grapheme_Extend, on the other hand, is exactly equivalent to
Grapheme_Cluster_Break=Extend.
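
The reading of "Control" above can be checked with a short sketch. It assumes Python's unicodedata module as the source of General_Category values and encodes only the definition quoted in this message, not the Grapheme_Cluster_Break data files. The name is_control is hypothetical; U+FFFF is used as the Cn example because it is a noncharacter.

```python
import unicodedata

# Code points the quoted definition explicitly carves out of Control.
EXCLUDED = {"\r", "\n", "\u200c", "\u200d"}  # CR, LF, ZWNJ, ZWJ


def is_control(ch):
    """Control as quoted above: Zl, Zp, Cc or Cf, minus the exclusions."""
    return unicodedata.category(ch) in {"Zl", "Zp", "Cc", "Cf"} and ch not in EXCLUDED


noncharacter = "\uffff"  # Gc=Cn
for ch in ("\u0001", "\u2028", "\r", "\u200d", noncharacter):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_control(ch))
```

Under this definition a Cn code point is !Control, so it can start the "!Control ( Grapheme_Extend | Spacing_Mark )*" branch of the expression.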


