Re: [Python-Dev] len(chr(i)) = 2?

Victor Stinner Thu, 25 Nov 2010 13:41:01 -0800

On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: narrow build was called
> > "ucs2" for long time, even if it is UTF-16 (each character is encoded to
> > one or two UTF-16 words).
> 
> No, no, no :-)
> 
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".


OK, that is true for Python 2:

$ ./python 
Python 2.7.0+ (release27-maint:84618M, Sep  8 2010, 12:43:49) 
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010ffff'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found
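On a narrow build, a code point above U+FFFF is stored as two 16-bit units (a surrogate pair), which is why len() reports 2. The sketch below computes that pair by hand using the standard UTF-16 arithmetic. The helper names to_surrogate_pair and narrow_len are hypothetical. Since Python 3.3 (PEP 393) there are no narrow builds, so on a modern interpreter the code simulates the narrow-build view by counting UTF-16 code units via str.encode().

```python
import struct

def to_surrogate_pair(cp):
    """Hypothetical helper: split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %#x" % cp)
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low

def narrow_len(s):
    """Hypothetical helper: the length a narrow (UTF-16) build would report."""
    return len(s.encode("utf-16-le")) // 2   # 2 bytes per UTF-16 code unit

high, low = to_surrogate_pair(0x10FFFF)
print(hex(high), hex(low))          # 0xdbff 0xdfff
print(narrow_len(chr(0x10FFFF)))    # 2

# Cross-check against the codec: two big-endian 16-bit code units.
assert struct.unpack(">2H", chr(0x10FFFF).encode("utf-16-be")) == (high, low)
```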


But Python 3 does use UTF-16 on narrow builds: chr() produces a surrogate pair and ord() decodes it back:

$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10ffff); len(c)
2
>>> ord(c)
1114111
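The result 1114111 (0x10FFFF) shows that ord() on the 3.2 narrow build recombines the two surrogates into one code point. A minimal sketch of that inverse computation follows; from_surrogate_pair is a hypothetical name, and this illustrates the UTF-16 decoding arithmetic rather than CPython's actual C implementation.

```python
import struct

def from_surrogate_pair(high, low):
    """Hypothetical helper: combine a UTF-16 surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair: %#x %#x" % (high, low))
    # 10 bits from each surrogate, plus the 0x10000 offset of the supplementary planes.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(from_surrogate_pair(0xDBFF, 0xDFFF))  # 1114111

# Cross-check: decoding the same two code units with the UTF-16 codec.
assert struct.pack(">2H", 0xDBFF, 0xDFFF).decode("utf-16-be") == chr(0x10FFFF)
```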

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev