Quoting Peter Kirk <[EMAIL PROTECTED]>:

[snip me quoting D17a]
> >
> > "in some way defective" is actually a good way to put it methinks, they
> > aren't illegal, and in some cases you can do things with them that are
> > both reasonable and useful, but in other situations they may be
> > problematic.
> >
> Indeed. But I was thinking more in terms of grapheme clusters, as 
> defined in UAX #29. Is a defective combining sequence a grapheme 
> cluster? Probably not according to the definition "what the user thinks 
> of as a character or basic unit of the language". But the boundary rule 
> "/Break at the start and end of text./" implies that the algorithm will 
> count a defective combining sequence at the start of text (and possibly 
> what follows) as a default grapheme cluster. So it is "in some way 
> defective" both as a grapheme cluster and as a character sequence.

My understanding is that the algorithm would count it as a default grapheme 
cluster, but I agree it doesn't match "what the user thinks of as a 
character" very well. So it's a grapheme cluster, but it's "in some way 
defective" :)
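
To make the boundary behaviour concrete, here is a rough sketch in Python. It is not the UAX #29 algorithm: the helper name approx_clusters is made up for illustration, and it treats only characters with a non-zero canonical combining class as extending the previous cluster, which is a deliberate simplification (no Hangul, no enclosing marks, no CR LF handling). Under that assumption, a combining mark at the start of text has nothing to attach to and so opens a cluster of its own.

```python
import unicodedata

def approx_clusters(text):
    """Split text into base-plus-combining-mark runs.

    A simplified stand-in for default grapheme clusters, NOT a full
    UAX #29 implementation (hypothetical helper, for illustration only).
    """
    clusters = []
    for ch in text:
        # A combining mark joins the preceding cluster, if there is one.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            # Start of text always opens a cluster, even for a lone
            # combining mark (the "defective" case discussed above).
            clusters.append(ch)
    return clusters

# COMBINING ACUTE ACCENT with no base character, followed by "abc"
print(len(approx_clusters("\u0301abc")))  # 4
# The same mark after a base character stays in that base's cluster
print(len(approx_clusters("e\u0301")))    # 1
```

So the lone accent is counted as one cluster, even though no user would call it a character.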

> I note the following in UAX #29, which backs up my comments on functions 
> to count characters:
> 
> > In those rare circumstances where end-users need character counts, the 
> > counts should correspond to the grapheme cluster boundaries.
> 
> This implies that end users should not require counts of code units or 
> code points.

I don't think anyone argued against this being what *end* users require. 
Certainly for small values of "end" anyway.
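
For anyone wondering how far apart the other counts can drift, a small Python sketch follows. The function name counts and the sample string are just assumptions for illustration; the only library calls are str.encode and len. The sample is something a user would probably see as two characters.

```python
def counts(text):
    """Return code point and code unit counts for a string."""
    return {
        "code_points": len(text),  # Python str is a sequence of code points
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_code_units": len(text.encode("utf-8")),
    }

# "e" + COMBINING ACUTE ACCENT, then MUSICAL SYMBOL G CLEF (U+1D11E)
sample = "e\u0301\U0001D11E"
print(counts(sample))
# {'code_points': 3, 'utf16_code_units': 4, 'utf8_code_units': 7}
```

None of the three numbers is the 2 that an end user would expect, which is the point of the UAX #29 remark quoted above.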

--
Jon Hanna                   | Toys and books
<http://www.hackcraft.net/> | for hospitals:
                            | <http://santa.boards.ie>
