On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote:
>> So I'd say that we should refer to characters in a string, and deal with
>> Unicode code-points in the abstract.
>
> I'm wondering whether 'code points' are any better than UTF-8 based
> positioning. Isn't it possible that a codepoint position also points
> inside a character/glyph/...? Peter could probably shed some light on
> this.

As in, inserting a "C" after the fifth code-point of "Tronçon" might give you "TronçCon" or "TroncÇon", depending on whether "ç" is a single precomposed "c-with-cedilla" or a "c" followed by a "combining cedilla"?

Yes, I'm quite sure that's possible.
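
To make that concrete, here is a minimal Python sketch of the case above. The helper insert_at_code_point is a hypothetical name invented for this illustration, and nothing here is meant as a proposal for how a client should do it; it only shows that the same code-point index lands in a different place depending on how "ç" happens to be encoded.

```python
import unicodedata

def insert_at_code_point(text, index, new):
    # Hypothetical helper: Python str indexing counts code points,
    # so this inserts `new` after `index` code points.
    return text[:index] + new + text[index:]

name = "Tron\u00e7on"                               # "Tronçon"
composed = unicodedata.normalize("NFC", name)       # ç is U+00E7 (1 code point)
decomposed = unicodedata.normalize("NFD", name)     # ç is "c" + U+0327 (2 code points)

# Same index, different result depending on the form of the string.
result_composed = insert_at_code_point(composed, 5, "C")
result_decomposed = insert_at_code_point(decomposed, 5, "C")

print(result_composed)                                    # C lands after the ç
print(unicodedata.normalize("NFC", result_decomposed))    # C lands between c and the cedilla
```

In the decomposed case the "C" ends up between the "c" and the combining cedilla, so the cedilla attaches to the new "C" instead.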

I don't have a solution, either, except to note that the same ambiguity applies to UTF-8 octet offsets etc. as well, unless you normalize all strings first - but then it's really not clear to me how to translate editing actions in a GUI into that form.
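
For what it's worth, a rough Python sketch of the octet case, again with a made-up helper (insert_at_octet) and assuming NFC as the normalization form purely for the sake of the example. An octet offset that is harmless in one form of the string can split a character in the other, and the two sides only agree once both have normalized to the same form.

```python
import unicodedata

def insert_at_octet(text, offset, new):
    # Hypothetical helper: insert `new` at a UTF-8 octet offset and
    # return the raw bytes, which may no longer be valid UTF-8.
    data = text.encode("utf-8")
    return data[:offset] + new.encode("utf-8") + data[offset:]

composed = "Tron\u00e7on"                             # ç is 2 octets (C3 A7)
decomposed = unicodedata.normalize("NFD", composed)   # c (1 octet) + U+0327 (2 octets)

# Offset 5 falls in the middle of the precomposed ç ...
try:
    insert_at_octet(composed, 5, "C").decode("utf-8")
    composed_ok = True
except UnicodeDecodeError:
    composed_ok = False

# ... but between "c" and the combining cedilla in the decomposed form.
shifted = insert_at_octet(decomposed, 5, "C").decode("utf-8")

# If both sides normalize first (NFC assumed here), one offset means one thing.
a = insert_at_octet(unicodedata.normalize("NFC", composed), 6, "C").decode("utf-8")
b = insert_at_octet(unicodedata.normalize("NFC", decomposed), 6, "C").decode("utf-8")
```

This doesn't address the GUI question at all; it only shows that normalizing first makes the offsets comparable.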

Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
