On May 18, 2009, at 09:21, Mark J. Reed wrote:
If you're doing arithmetic with the code points or scalar values of
characters, then the specific numbers would seem to matter.  I'm


I would argue that if you are working with a grapheme cluster ("grapheme"), arithmetic on individual grapheme values is undefined. What is the meaning of ord(\c[LATIN LETTER T WITH DOT ABOVE, COMBINING DOT BELOW]) + 1? If you say it increments the base character (a reasonable-looking initial stance), what happens if I add an amount which changes the base character to a combining character? And what happens if the original grapheme doesn't have a base character?
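To make the problem concrete, here is a rough sketch in Python rather than Perl 6, so it only illustrates the Unicode side and says nothing about how Perl 6's ord() does or should behave. I am assuming the small-letter form U+1E6B for the T WITH DOT ABOVE, and the variable names are made up for illustration.

```python
import unicodedata

# One grapheme cluster built from two code points (assumed example):
# U+1E6B LATIN SMALL LETTER T WITH DOT ABOVE + U+0323 COMBINING DOT BELOW
grapheme = "\u1e6b\u0323"
codepoints = [ord(ch) for ch in grapheme]

# Python's built-in ord() is defined per code point, so it rejects the cluster.
try:
    ord(grapheme)
    ord_accepts_cluster = True
except TypeError:
    ord_accepts_cluster = False

# "Incrementing the base character" just picks the next code point,
# which has no particular relationship to the original grapheme.
incremented = chr(codepoints[0] + 1) + grapheme[1:]
incremented_base_name = unicodedata.name(incremented[0])

# Adding to a base character can also land on a combining character:
# U+02FF is a spacing modifier letter, U+0300 is COMBINING GRAVE ACCENT.
landed = chr(ord("\u02ff") + 1)
landed_is_combining = unicodedata.combining(landed) != 0
```

In this sketch the "incremented" cluster starts with LATIN CAPITAL LETTER T WITH DOT BELOW, and the second addition turns a base character into a combining mark, which is exactly the case with no sensible answer.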

In short, I think the only remotely sane result of ord() on a grapheme is an opaque value meaningful to chr() but to very little, if anything, else. If you want to represent it as an integer, fine, but it should be obscured such that math isn't possible on it. Conversely, if you want ord() values you can manipulate, you must work at the codepoint level.
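As a minimal sketch of what "opaque" could mean, again in Python and purely hypothetical (GraphemeHandle and GraphemeTable are names I invented, not anything in the Perl 6 spec): the value returned by the grapheme-level ord() round-trips through chr() and can be compared, but defines no arithmetic.

```python
class GraphemeHandle:
    """Hypothetical opaque token: comparable and hashable, no arithmetic."""
    __slots__ = ("_index",)

    def __init__(self, index):
        self._index = index

    def __eq__(self, other):
        return isinstance(other, GraphemeHandle) and self._index == other._index

    def __hash__(self):
        return hash(self._index)


class GraphemeTable:
    """Hypothetical registry that hands out one handle per distinct grapheme."""

    def __init__(self):
        self._handles = {}   # grapheme text -> handle
        self._texts = []     # handle index -> grapheme text

    def ord(self, grapheme):
        # Allocate a new handle the first time a grapheme is seen.
        if grapheme not in self._handles:
            self._handles[grapheme] = GraphemeHandle(len(self._texts))
            self._texts.append(grapheme)
        return self._handles[grapheme]

    def chr(self, handle):
        # Only chr() knows how to turn a handle back into text.
        return self._texts[handle._index]


table = GraphemeTable()
handle = table.ord("\u1e6b\u0323")
round_trip = table.chr(handle)

# No __add__ is defined, so arithmetic on the handle is rejected.
try:
    handle + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
```

Anyone who wants numbers they can add to would drop down to the individual code points instead of using the handle.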

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH

