Re: [Standards] RTT, take 2

Dave Cridland Fri, 24 Jun 2011 02:38:50 -0700

On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote:

> So I'd say that we should refer to characters in a string, and deal with
> Unicode code-points in the abstract.
I'm wondering whether 'code points' are any better than UTF-8 based
positioning. Isn't it possible that a codepoint position also points
inside a character/glyph/...? Peter could probably shed some light on
this.

As in, adding a "C" character at the fifth code-point of "Tronçon" might give you "TroncÇon" or "TronçCon", depending on whether "ç" is a "c-with-cedilla" or a "c" followed by a "combining cedilla"?


Yes, I'm quite sure that's possible.
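A minimal Python sketch of the "Tronçon" case, assuming positions are counted in Unicode code points (which is how Python `str` indexing works). The helper name insert_at_code_point is hypothetical and not part of any RTT proposal; it only illustrates the ambiguity:

```python
import unicodedata

def insert_at_code_point(text: str, index: int, s: str) -> str:
    """Hypothetical helper: insert s before the code point at `index`."""
    # Python str slicing counts code points, not glyphs or octets.
    return text[:index] + s + text[index:]

composed = unicodedata.normalize("NFC", "Tron\u00e7on")    # "ç" is one code point, U+00E7
decomposed = unicodedata.normalize("NFD", "Tron\u00e7on")  # "c" followed by U+0327 COMBINING CEDILLA

a = insert_at_code_point(composed, 5, "C")    # "TronçCon"
b = insert_at_code_point(decomposed, 5, "C")  # "C" lands before the cedilla, which then attaches to it

print(len(composed), len(decomposed))                      # 7 8
print(a == "Tron\u00e7Con")                                # True
print(unicodedata.normalize("NFC", b) == "Tronc\u00c7on")  # True: displays as "TroncÇon"
```

The same visible string yields two different results for the same code-point position, because the two spellings differ in length from the "ç" onwards.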

I don't have a solution, either, except to note that this applies to UTF-8 octets etc. as well, unless you normalize all strings first, but then it's really not clear to me how to translate editing actions in a GUI into that form.
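A similar sketch for raw UTF-8 octet offsets, again using a hypothetical helper (insert_at_utf8_offset). It shows an octet offset splitting a multi-byte character outright. It also shows that normalizing both ends to one form (NFC, chosen arbitrarily here) makes the two spellings identical. It does not address how GUI editing actions would map onto the normalized form:

```python
import unicodedata

def insert_at_utf8_offset(text: str, offset: int, s: str) -> bytes:
    """Hypothetical helper: insert s at a raw UTF-8 octet offset."""
    data = text.encode("utf-8")
    return data[:offset] + s.encode("utf-8") + data[offset:]

composed = unicodedata.normalize("NFC", "Tron\u00e7on")    # "ç" is two octets: C3 A7
decomposed = unicodedata.normalize("NFD", "Tron\u00e7on")  # "c" (63), then U+0327 (CC A7)

# Octet offset 5 falls between C3 and A7, so the result is not valid UTF-8.
broken = insert_at_utf8_offset(composed, 5, "C")
try:
    broken.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)  # False

# In NFD, offset 5 lands just before the combining cedilla, as with code points.
nfd_result = insert_at_utf8_offset(decomposed, 5, "C").decode("utf-8")
print(nfd_result == "TroncC\u0327on")  # True

# Once both sides normalize to NFC, the two spellings are the same sequence.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```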


Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
