2009/8/29 <ru...@yahoo.com>:
> On 08/28/2009 02:12 AM, "Martin v. Löwis" wrote:
>
> So far, it seems not and that unichr/ord
> is a poster child for "purity beats practicality".
> --
> http://mail.python.org/mailman/listinfo/python-list
>
As Mark Tolonen pointed out earlier in this thread, in Python 3 practicality apparently beat purity in this respect:

Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> goth_urus_1 = '\U0001033f'
>>> list(goth_urus_1)
['\ud800', '\udf3f']
>>> len(goth_urus_1)
2
>>> ord(goth_urus_1)
66367
>>> goth_urus_2 = chr(66367)
>>> len(goth_urus_2)
2
>>> import unicodedata
>>> unicodedata.name(goth_urus_1)
'GOTHIC LETTER URUS'
>>> goth_urus_3 = unicodedata.lookup("GOTHIC LETTER URUS")
>>> goth_urus_4 = "\N{GOTHIC LETTER URUS}"
>>> goth_urus_1 == goth_urus_2 == goth_urus_3 == goth_urus_4
True
>>>

So on this narrow build the string is still stored as two surrogate code units (len() is 2), but ord() and chr() handle the full code point.

As for the behaviour in Python 2.x, it is probably good enough that surrogates aren't prohibited, and any behaviour that turns out to be needed can easily be added via custom functions.

vbr
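
For what it's worth, the "custom functions" mentioned above for Python 2.x could look roughly like the sketch below. The names to_surrogate_pair and from_surrogate_pair are made up for illustration and do not come from the thread or the standard library. The sketch deliberately uses plain integer arithmetic only, so it should behave the same on narrow and wide builds.

```python
# Hypothetical helpers (names are illustrative, not from the thread or stdlib).
# They only do UTF-16 surrogate arithmetic on integers, so they do not depend
# on whether the interpreter is a narrow or a wide build.

def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %r" % (code_point,))
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits of the offset
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair: %r, %r" % (high, low))
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# GOTHIC LETTER URUS, the character used in the session above.
high, low = to_surrogate_pair(0x1033F)
code_point = from_surrogate_pair(high, low)
```

For U+1033F this gives the same pair that list(goth_urus_1) shows in the session (0xD800, 0xDF3F). On a narrow 2.x build, a "wide unichr" could presumably be built on top of it as unichr(high) + unichr(low), and a "wide ord" by feeding the ord() of each of the two code units to from_surrogate_pair.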