On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: narrow build was called
> > "ucs2" for long time, even if it is UTF-16 (each character is encoded to
> > one or two UTF-16 words).
>
> No, no, no :-)
>
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".

OK for Python 2:

   $ ./python
   Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49)
   >>> import sys; sys.maxunicode
   65535
   >>> x=u'\U0010ffff'; len(x)
   2
   >>> ord(x)
   ...
   TypeError: ord() expected a character, but string of length 2 found

But Python 3 does use UTF-16 for narrow builds:

   $ ./python
   Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
   >>> import sys; sys.maxunicode
   65535
   >>> c=chr(0x10ffff); len(c)
   2
   >>> ord(c)
   1114111

Victor
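
PS: to make "one or two UTF-16 words" concrete, here is a minimal sketch of how a non-BMP code point splits into a surrogate pair. The helper names to_surrogate_pair and utf16_units are hypothetical, made up for this illustration; they are not part of Python. The sketch works on the encoded bytes, so the result should not depend on whether the interpreter is a narrow or a wide build.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper: encodes to big-endian UTF-16 (no BOM) and
    unpacks the result as unsigned 16-bit words.
    """
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


if __name__ == "__main__":
    pair = to_surrogate_pair(0x10FFFF)
    print([hex(unit) for unit in pair])
    print([hex(unit) for unit in utf16_units(chr(0x10FFFF))])
```

Both lines should print the same two code units, which is the pair that a narrow build stores and that len() counts as 2 in the sessions above.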