On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote:
> So I'd say that we should refer to characters in a string, and deal with
> Unicode code-points in the abstract.
I'm wondering whether 'code points' are any better than UTF-8 based
positioning. Isn't it possible that a codepoint position also points
inside a character/glyph/...? Peter could probably shed some light on
this.
As in, adding a "C" character at the fifth code-point of "Tronçon"
might give you "TroncÇon", or "TronçCon", depending on whether "ç" is
a "c-with-cedilla" or a "c" followed by a "combining cedilla"?
Yes, I'm quite sure that's possible.
I don't have a solution, either, except to note that this applies to
UTF-8 octets etc. as well, unless you normalize all strings first. But
then it's really not clear to me how to translate editing actions in a
GUI into that normalized form.
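
The "Tronçon" case above can be reproduced with a short Python sketch. It
assumes Python 3 strings (which index by code point) and the standard
unicodedata module; insert_at_code_point is a hypothetical helper named
here for illustration only, not part of any proposal in this thread.

```python
import unicodedata


def insert_at_code_point(text, index, new):
    """Insert `new` after the first `index` code points of `text`.

    Hypothetical helper: Python 3 str slicing counts code points,
    not user-perceived characters.
    """
    return text[:index] + new + text[index:]


name = "Tron\u00e7on"  # "Tronçon" with a precomposed c-with-cedilla

composed = unicodedata.normalize("NFC", name)    # ç is one code point
decomposed = unicodedata.normalize("NFD", name)  # c + combining cedilla (U+0327)

# Same visible text, different code-point counts.
print(len(composed), len(decomposed))  # 7 8

# Same edit ("insert C at code point 5"), different results.
after_composed = insert_at_code_point(composed, 5, "C")
after_decomposed = insert_at_code_point(decomposed, 5, "C")

print(after_composed)  # TronçCon
# The inserted "C" lands between "c" and the combining cedilla,
# so the cedilla now attaches to the "C".
print(unicodedata.normalize("NFC", after_decomposed))  # TroncÇon
```

Normalizing both sides to the same form before applying the offset makes
the two results agree, but that still leaves open how a GUI would map its
own editing positions onto the normalized string.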
Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade