Quoting Peter Kirk <[EMAIL PROTECTED]>:

[snip me quoting D17a]

> > "in some way defective" is actually a good way to put it methinks, they
> > aren't illegal, and in some cases you can do things with them that are
> > both reasonable and useful, but in other situations they may be
> > problematic.
>
> Indeed. But I was thinking more in terms of grapheme clusters, as
> defined in UAX #29. Is a defective combining sequence a grapheme
> cluster? Probably not according to the definition "what the user thinks
> of as a character or basic unit of the language". But the boundary rule
> "/Break at the start and end of text./" implies that the algorithm will
> count a defective combining sequence at the start of text (and possibly
> what follows) as a default grapheme cluster. So it is "in some way
> defective" as a grapheme cluster as well as as a character sequence.

My understanding is that it would be counted, but I agree it doesn't match
"what the user thinks of as a character" very well. So it's a grapheme
cluster, but it's "in some way defective" :)

> I note the following in UAX #29, which backs up my comments on functions
> to count characters:
>
> > In those rare circumstances where end-users need character counts, the
> > counts should correspond to the grapheme cluster boundaries.
>
> This implies that end users should not require counts of code units or
> code points.

I don't think anyone argued against this being what *end* users require.
Certainly for small values of "end" anyway.

-- 
Jon Hanna                   | Toys and books
<http://www.hackcraft.net/> | for hospitals:
                            | <http://santa.boards.ie>
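
As a rough illustration of the three counts discussed above (code units, code points, grapheme clusters), here is a minimal Python sketch. It is not the UAX #29 algorithm: it assumes that a cluster is simply a character followed by any run of marks (general category M*), and it ignores Hangul, CR LF, and the other boundary rules. The function names `approx_grapheme_clusters` and `counts` are made up for this example.

```python
import unicodedata


def approx_grapheme_clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption: only marks (general category Mn, Mc, Me) attach to the
    preceding cluster. This is a simplification, not full UAX #29.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            # A mark extends whatever cluster came before it.
            clusters[-1] += ch
        else:
            # A non-mark starts a new cluster. So does a mark at the very
            # start of text (a defective combining sequence), because
            # there is always a break at the start of text.
            clusters.append(ch)
    return clusters


def counts(text):
    """Return the three different 'lengths' of the same string."""
    return {
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
        "clusters": len(approx_grapheme_clusters(text)),
    }


if __name__ == "__main__":
    samples = {
        "e + combining acute": "e\u0301",
        "defective: combining acute + a": "\u0301a",
        "musical G clef (outside the BMP)": "\U0001D11E",
    }
    for label, sample in samples.items():
        print(label, counts(sample))
```

In this sketch the defective sequence at the start of text is still counted as a cluster, which matches the reading of the "break at the start and end of text" rule given above, even though it is hardly "what the user thinks of as a character".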

