On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: > Maybe Python should have used UTF-8 as its internal unicode > representation. Then people who were foolish enough to assume > one character per string item would have their programs break > rather soon under only light unicode testing. :-)
You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). James _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com