On 24.06.2011 11:24, Remko Tronçon wrote:
>> So I'd say that we should refer to characters in a string, and deal with
>> Unicode code-points in the abstract.
>
> I'm wondering whether 'code points' are any better than UTF-8 based
> positioning. Isn't it possible that a codepoint position also points
> inside a character/glyph/...? Peter could probably shed some light on
> this.
>
FWIW, I think using codepoints solves a somewhat different problem.
If we count codepoints, we can still delete "half a character", e.g. remove the combining cedilla (U+0327) from a decomposed ç. But if we count UTF-8 or UTF-16 code units, we can delete "half a codepoint", which leaves the result undecodable, and that is far worse.
