Indeed. But I was thinking more in terms of grapheme clusters, as defined in UAX #29. Is a defective combining sequence a grapheme cluster? Probably not, according to the definition "what the user thinks of as a character or basic unit of the language". But the boundary rule "/Break at the start and end of text./" implies that the algorithm will count a defective combining sequence at the start of text (and possibly what follows) as a default grapheme cluster. So it is "in some way defective" both as a grapheme cluster and as a character sequence.
Thank you. I was supposing that isolated combining marks were considered in some way defective,
<blockquote cite="http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf">
D17a: Defective combining character sequence: A combining character sequence that does not start with a base character.
[Explanatory Note] Defective combining character sequences occur when a sequence of combining
characters appears at the start of a string or follows a control or format character.
Such sequences are defective from the point of view of handling of combining
marks, but are not ill-formed.
</blockquote>
"in some way defective" is actually a good way to put it, methinks. They aren't illegal, and in some cases you can do things with them that are both reasonable and useful, but in other situations they may be problematic.
I note the following in UAX #29, which backs up my comments on functions to count characters:
"In those rare circumstances where end-users need character counts, the counts should correspond to the grapheme cluster boundaries."
This implies that end users should not require counts of code units or code points.
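To make the three kinds of count concrete, here is a rough Python sketch. It is not the UAX #29 algorithm: it assumes a cluster is simply a non-mark character followed by any run of combining marks (General Category Mn, Mc, Me), plus the "break at the start of text" rule, and it ignores Hangul syllables, CR LF, and everything else the real rules cover. The names `is_mark`, `starts_defective`, and `counts` are made up for this illustration.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General_Category Mn, Mc or Me: treated here as a combining mark.
    return unicodedata.category(ch).startswith("M")


def starts_defective(text: str) -> bool:
    # D17a, checked only at the start of a string: the sequence is
    # defective if its first character is a combining mark.
    return bool(text) and is_mark(text[0])


def counts(text: str) -> dict:
    clusters = 0
    for i, ch in enumerate(text):
        # Assumption: a cluster begins at the start of text or at any
        # character that is not a combining mark.
        if i == 0 or not is_mark(ch):
            clusters += 1
    return {
        "code_units_utf16": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
        "clusters_approx": clusters,
    }


for sample in ("e\u0301", "\u0301e", "\U0001D11E"):
    print(ascii(sample), counts(sample), starts_defective(sample))
```

Under these assumptions, "e" plus COMBINING ACUTE ACCENT is two code points but one cluster, while the same two characters in the opposite order start with a defective sequence and come out as two clusters, because the leading mark is counted on its own. U+1D11E shows the other mismatch: one code point, one cluster, two UTF-16 code units.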
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

