On 07/03/2011 05:52 PM, Mark Davis ☕ wrote:
Mark
/— Il meglio è l’inimico del bene —/
On Sat, Jul 2, 2011 at 14:58, Karl Williamson <[email protected]> wrote:
I have two questions about this.
1) UAX #44 says to see UAX #29 for information about the
Grapheme_Base property, but that document doesn't mention this
property.
The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has
not been used in the specification of grapheme clusters since (I
believe) Unicode 3.2.
2) The definition in UAX #29 for both legacy and extended grapheme
clusters effectively says that any Gc=Cn code points followed by any
number of grapheme_extend code points is a grapheme cluster. Is
that what is meant? I notice that Grapheme_Base excludes Cn code
points.
It doesn't say that. If you had the sequence <Control Extend>, you'd
have a break between them according to the following rule:
GB4. ( Control | CR | LF ) ÷
It would result in two (degenerate) grapheme clusters.
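A minimal sketch of how GB4 plays out on <Control Extend>, assuming Python's standard unicodedata module. The helper names (is_control, is_extend, clusters) are made up for illustration. is_control transcribes the UAX #29 "Control" definition quoted further down in this thread, and is_extend approximates Grapheme_Extend by Gc=Mn or Gc=Me, which is not the full property. Only GB3, GB4 and GB9 are modeled; every other position is treated as a break.

```python
import unicodedata

def is_control(ch):
    """Control per the UAX #29 definition quoted in this thread."""
    if ch in "\r\n\u200c\u200d":          # CR, LF, ZWNJ, ZWJ are excluded
        return False
    return unicodedata.category(ch) in ("Zl", "Zp", "Cc", "Cf")

def is_extend(ch):
    """Rough stand-in for Grapheme_Extend (assumption: Gc=Mn or Me only)."""
    return unicodedata.category(ch) in ("Mn", "Me")

def clusters(text):
    """Split text using only GB3, GB4 and GB9; everything else breaks."""
    out = []
    for ch in text:
        if out:
            prev = out[-1][-1]
            if prev == "\r" and ch == "\n":             # GB3: CR x LF
                out[-1] += ch
                continue
            hard = is_control(prev) or prev in "\r\n"   # GB4: (Control|CR|LF) ÷
            if not hard and is_extend(ch):              # GB9: x Extend
                out[-1] += ch
                continue
        out.append(ch)
    return out

# U+0001 is Gc=Cc (Control); U+0301 COMBINING ACUTE ACCENT is Gc=Mn.
print(len(clusters("\u0001\u0301")))  # 2: GB4 forces a break after the control
print(len(clusters("a\u0301")))       # 1: the accent attaches to "a"
```

Under these assumptions the control character and the following extender end up in separate clusters, which is the degenerate case described above.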
We need to fix the documentation to make this clearer. Could you let me
know what led you to think that "any Gc=Cn code points followed by any
number of grapheme_extend code points is a grapheme cluster" so that we
can clarify that?
It says that an extended grapheme cluster matches this:
( CRLF
| Prepend* ( Hangul-syllable | !Control )
( Grapheme_Extend | Spacing_Mark)*
| . )
That tells me that one option for a grapheme cluster is a !Control
followed by any number of Grapheme_Extends.
Lower down it defines "Control" as
"General_Category = Line Separator (Zl), or
General_Category = Paragraph Separator (Zp), or
General_Category = Control (Cc), or
General_Category = Format (Cf)
and not U+000D CARRIAGE RETURN (CR)
and not U+000A LINE FEED (LF)
and not U+200C ZERO WIDTH NON-JOINER (ZWNJ)
and not U+200D ZERO WIDTH JOINER (ZWJ)"
By that definition of Control, all Gc=Cn code points are !Control.
Therefore a grapheme cluster can be a Cn followed by any number of
Grapheme_Extends.
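This reading can be checked mechanically. A rough sketch, assuming Python's standard unicodedata module: is_control and matches_core are hypothetical helpers, the first transcribing the "Control" definition quoted above and the second covering only the "!Control ( Grapheme_Extend | Spacing_Mark )*" alternative of the expression. Grapheme_Extend and Spacing_Mark are approximated by Gc=Mn/Me and Gc=Mc respectively, which is an assumption and not the exact properties. U+FFFF, a noncharacter and therefore permanently Gc=Cn, stands in for "any Cn code point".

```python
import unicodedata

def is_control(ch):
    """Control per the UAX #29 definition quoted above."""
    if ch in "\r\n\u200c\u200d":          # CR, LF, ZWNJ, ZWJ are excluded
        return False
    return unicodedata.category(ch) in ("Zl", "Zp", "Cc", "Cf")

def matches_core(seq):
    """!Control ( Grapheme_Extend | Spacing_Mark )*, properties approximated by Gc."""
    if not seq or is_control(seq[0]):
        return False
    return all(unicodedata.category(c) in ("Mn", "Me", "Mc") for c in seq[1:])

cn = "\uffff"                              # noncharacter, Gc=Cn
print(unicodedata.category(cn))            # Cn
print(is_control(cn))                      # False: Cn is not in the Control list
print(matches_core(cn + "\u0301\u0308"))   # True: Cn followed by two extenders
print(matches_core("\u0001\u0301"))        # False: a real control cannot start it
```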
Grapheme_Extend, on the other hand, is exactly equivalent to
Grapheme_Cluster_Break=Extend.