On Sun, 4 Oct 2015 21:48:12 +0200 Philippe Verdy <[email protected]> wrote:
> 2015-10-04 21:30 GMT+02:00 Richard Wordingham <[email protected]>:
>
> > On Sun, 4 Oct 2015 15:44:32 +0200
> > Mark Davis ☕️ <[email protected]> wrote:
> > > When I use http://unicode.org/cldr/utility/breaks.jsp, it does
> > > show the sequence 𑒏�𑒺 as just two grapheme clusters.
>
> > But that's the sequence <U+1148F, U+FFFD, U+114BA>, which has no
> > lone surrogates at all!
>
> Mark just said that it was what was shown, i.e. the lone surrogate got
> treated as U+FFFD.

That's not what the English says, and I'm surprised if that's what a
literal translation into French means. I do half suspect that he
actually tried to post a lone surrogate.

> However my opinion is that 𑒏�𑒺 (using U+FFFD substitution) gives 2
> grapheme clusters, I would prefer a solution that gives 3 grapheme
> clusters, as if the lone surrogate was a line-break control, so that
> the third character (combining, but just after the lone surrogate)
> will not combine with it but will be handled as a defective combining
> sequence with no starter at all before it.

I'd much prefer to be able to delete the first character of a grapheme
cluster. It's annoying to have to retype 4 characters because one's
mistyped the first of the 4 characters in a grapheme cluster. Removing
the restriction would be much more useful.

Richard.
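
For illustration only, here is a minimal Python sketch of the U+FFFD substitution discussed above. It assumes the lone surrogate was the high surrogate D800 in UTF-16 (the thread does not say which surrogate was involved), and the helper name has_surrogate is hypothetical. Decoding with the "replace" error handler should give <U+1148F, U+FFFD, U+114BA>, a sequence that no longer contains any surrogate code point.

```python
import unicodedata

# Assumed input, as UTF-16BE code units:
#   D805 DC8F  -> U+1148F
#   D800       -> lone high surrogate (hypothetical choice)
#   D805 DCBA  -> U+114BA
raw = bytes.fromhex("d805dc8f" "d800" "d805dcba")

# The "replace" error handler substitutes U+FFFD for the ill-formed unit.
text = raw.decode("utf-16-be", errors="replace")


def has_surrogate(s):
    """Return True if any code point in s is a surrogate (category Cs)."""
    return any(unicodedata.category(ch) == "Cs" for ch in s)


print([f"U+{ord(ch):04X}" for ch in text])
print("contains a surrogate:", has_surrogate(text))
```

This only shows what a decoder does with the ill-formed input. It does not perform grapheme cluster segmentation, which would need a UAX #29 implementation such as the breaks.jsp tool quoted above.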

