Re: [Python-Dev] len(chr(i)) = 2?

James Y Knight Tue, 23 Nov 2010 16:24:35 -0800

On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
> Maybe Python should have used UTF-8 as its internal unicode
> representation. Then people who were foolish enough to assume
> one character per string item would have their programs break
> rather soon under only light unicode testing. :-)


You put a smiley, but, in all seriousness, I think that's actually the right 
thing to do if anyone writes a new programming language. It is clearly the 
right thing if you don't have to be concerned with backwards-compatibility: 
nobody really needs to be able to access the Nth codepoint in a string in 
constant time, so there's not really any point in storing a vector of 
codepoints.
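
To make the trade-off concrete, here is a minimal sketch (not anything from CPython) of what finding the Nth codepoint costs when a string is stored as UTF-8. The function name nth_codepoint is hypothetical, and the input is assumed to be valid UTF-8. The lookup is a linear scan, which is exactly the cost the argument above says is acceptable.

```python
def nth_codepoint(data: bytes, n: int) -> str:
    """Return the n-th codepoint (0-based) of valid UTF-8 `data`.

    Hypothetical helper for illustration: it scans from the start,
    so the cost grows with the position rather than being constant.
    """
    if n < 0:
        raise IndexError(n)
    seen = -1      # index of the most recent codepoint whose first byte we passed
    start = None   # byte offset where the requested codepoint begins
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a codepoint.
        if (b & 0xC0) != 0x80:
            seen += 1
            if seen == n:
                start = i
            elif seen == n + 1:
                # The next codepoint begins here, so the requested one ends here.
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    # The requested codepoint is the last one in the buffer.
    return data[start:].decode("utf-8")
```

For example, with data = "a\u00e9\U0001F600z".encode("utf-8"), the buffer is 8 bytes long but holds only 4 codepoints, and nth_codepoint(data, 2) has to walk past the first three bytes before it reaches the 4-byte sequence.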

Instead, provide bidirectional iterators which can traverse the string by byte, 
by codepoint, or by grapheme (that is: a base character plus the combining 
characters that go with it, making up one thing which a human would think of as 
a character).
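
A rough sketch of such iterators over a UTF-8 byte buffer, in Python, might look like the following. All function names here are hypothetical, the input is assumed to be valid UTF-8, and the grapheme rule is a deliberate simplification: it only attaches codepoints with a nonzero canonical combining class (via unicodedata.combining) to the preceding base character, which is far short of the full Unicode segmentation rules. Iterating by byte needs no helper, since a bytes object already iterates that way.

```python
import unicodedata


def codepoints(data: bytes):
    """Yield (start, end) byte offsets of each codepoint, front to back."""
    i, n = 0, len(data)
    while i < n:
        j = i + 1
        # Skip continuation bytes (bit pattern 10xxxxxx).
        while j < n and (data[j] & 0xC0) == 0x80:
            j += 1
        yield i, j
        i = j


def codepoints_reversed(data: bytes):
    """Yield (start, end) byte offsets of each codepoint, back to front."""
    j = len(data)
    while j > 0:
        i = j - 1
        # Walk backwards over continuation bytes to the lead byte.
        while i > 0 and (data[i] & 0xC0) == 0x80:
            i -= 1
        yield i, j
        j = i


def graphemes(data: bytes):
    """Yield (start, end) byte offsets of approximate graphemes.

    Simplifying assumption: a grapheme is a base codepoint followed by
    any codepoints whose canonical combining class is nonzero.
    """
    start = end = None
    for i, j in codepoints(data):
        ch = data[i:j].decode("utf-8")
        if start is not None and unicodedata.combining(ch) != 0:
            end = j  # extend the current grapheme with this combining mark
        else:
            if start is not None:
                yield start, end
            start, end = i, j
    if start is not None:
        yield start, end
```

With data = "e\u0301a\U0001F600".encode("utf-8"), the three views disagree on length: 8 bytes, 4 codepoints, and 3 graphemes under the simplified rule, since the combining acute accent joins the preceding "e".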

James
