There seems to be an inconsistency between what the default grapheme cluster specification says and what the expected test results are:
UAX#29 says:

> Another key feature (of default Unicode grapheme clusters) is that <b>default
> Unicode grapheme clusters are atomic units with respect to the process of
> determining the Unicode default line, word, and sentence boundaries</b>.

This is also mentioned in UAX#14:

> Example 6. Some implementations may wish to tailor the line breaking
> algorithm to resolve grapheme clusters according to Unicode Standard Annex
> #29, “Unicode Text Segmentation” [UAX29], as a first stage. <b>Generally, the
> line breaking algorithm does not create line break opportunities within
> default grapheme clusters</b>; therefore such a tailoring would be expected
> to produce results that are close to those defined by the default algorithm.
> However, if such a tailoring is chosen, characters that are members of line
> break class CM but not part of the definition of default grapheme clusters
> must still be handled by rules LB9 and LB10, or by some additional tailoring.

However, in the line breaking algorithm the sequence <U+0020 (SP), U+0308 (CM)> is handled by rules LB10 and LB18, which produce a break opportunity between the two characters, while GB9 prohibits a break between <U+0020 (Other), U+0308 (Extend)>.

Section 9.2, "Legacy Support for Space Character as Base for Combining Marks", in UAX#14 explains why the line break occurs, but the fact remains that the highlighted statements above are false as written and introduce some ambiguity. If the space character is no longer a grapheme base, the grapheme cluster breaking rules need to be updated.

Kind regards,
Konstantin
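
P.S. To make the mismatch concrete, here is a minimal sketch in Python. It is not an implementation of either annex. The helper names are hypothetical, the general categories Mn and Me stand in for both Grapheme_Cluster_Break=Extend and Line_Break=CM (an approximation that happens to hold for U+0308), and only GB9 and the LB9/LB10/LB18 interaction are modeled.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    # Assumption: Mn/Me approximate GCB=Extend and LB=CM.
    # The real properties come from the Unicode data files.
    return unicodedata.category(ch) in ("Mn", "Me")


def grapheme_break_before_mark(prev: str, mark: str) -> bool:
    """GB9: never break before an Extend character, whatever precedes it."""
    assert is_combining_mark(mark)
    return False


def line_break_before_mark(prev: str, mark: str) -> bool:
    """Simplified LB9/LB10/LB18 for a base followed by a combining mark.

    LB9 attaches the mark to the preceding character unless that character
    is BK, CR, LF, NL, SP or ZW. After SP, LB10 treats the stray mark as AL
    and LB18 then allows a break after the space. Only the SP case is
    modeled here; the other exceptions are ignored.
    """
    assert is_combining_mark(mark)
    return prev == " "


for base in ("a", " "):
    mark = "\u0308"  # COMBINING DIAERESIS
    print(
        f"U+{ord(base):04X} U+{ord(mark):04X}:",
        "grapheme break =", grapheme_break_before_mark(base, mark),
        "| line break =", line_break_before_mark(base, mark),
    )
```

For the pair with U+0020 the two functions disagree, which is exactly the case where a line break opportunity falls inside a default grapheme cluster.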

