On 1/7/2013 8:12 AM, Terry Reedy wrote:
On 1/7/2013 7:57 AM, Franck Ditter wrote:
>>> print('\U0001d11e')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e' in position 0: Non-BMP character not supported in Tk
The message comes from printing to a Tk text widget (the IDLE shell),
not from creating the one-character string: c = '\U0001d11e' works fine.
When you have problems with creating and printing Unicode, *separate*
creating from printing to see where the problem is. (I do not know
whether the brand-new Tcl/Tk 8.6 is any better.)
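A minimal sketch of that advice, assuming Python 3.3 or later: create the string and inspect it with ASCII-only diagnostics first, and only then worry about the output device. The `bmp_only` helper is hypothetical, not part of IDLE or tkinter; it assumes the Tk in use accepts only BMP characters (as the error above says), and substituting U+FFFD is just one possible choice of replacement.

```python
# Step 1: create the string. This does not touch Tk or the console at all.
c = '\U0001d11e'  # MUSICAL SYMBOL G CLEF, a non-BMP code point

# On 3.3+ this is a single code point above U+FFFF. Printing these
# ASCII-only diagnostics is safe on any output device.
print(len(c), hex(ord(c)))  # 1 0x1d11e

# Step 2 (hypothetical workaround): before sending text to a Tk widget
# that only handles the BMP, substitute every character above U+FFFF.
def bmp_only(s, replacement='\ufffd'):
    """Return s with every character above U+FFFF replaced by `replacement`."""
    return ''.join(ch if ord(ch) <= 0xFFFF else replacement for ch in s)

print(ascii(bmp_only('a' + c + 'b')))  # 'a\ufffdb'
```

If creation succeeds and only the print fails, the problem is the output channel, not the string.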
The Windows console also chokes, but with a different message.
>>> c='\U0001d11e'
>>> print(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e' in position 0: character maps to <undefined>
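The console error can be reproduced without a console by encoding to cp437 directly, which is what the traceback shows the console stream doing. The sketch below assumes cp437 as the console encoding (taken from the traceback; a real script would look at `sys.stdout.encoding`). `console_safe` is a hypothetical helper; `backslashreplace` is a standard codec error handler that spells unencodable characters as escapes instead of raising.

```python
def console_safe(s, encoding='cp437'):
    """Return s converted so it encodes cleanly in `encoding`.

    Characters the codec cannot represent become backslash escapes
    (e.g. '\\U0001d11e') rather than raising UnicodeEncodeError.
    """
    return s.encode(encoding, errors='backslashreplace').decode(encoding)

c = '\U0001d11e'
try:
    c.encode('cp437')  # same failure the console hit
except UnicodeEncodeError as e:
    print('reproduced:', e.reason)  # reproduced: character maps to <undefined>

print(console_safe(c))  # \U0001d11e
```

The escaped form loses the glyph but keeps the code point readable, which is usually enough for debugging output.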
Yes, this is very annoying, especially in Win 7.
The above is in 3.3, in which '\U0001d11e' is actually translated to a
length-1 string. In 3.2 and earlier narrow builds (as on Windows), that
literal is translated to a length-2 string: a surrogate pair (two code
units in the BMP).
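The surrogate pair a narrow build stores can be computed by hand with the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. `to_surrogate_pair` is a hypothetical helper name; the cross-check uses only the built-in `utf-16-be` and `utf-16-le` codecs, so this runs on a modern (3.3+) Python, where the string itself has length 1.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('not a supplementary code point: %#x' % cp)
    v = cp - 0x10000             # 20-bit offset
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low

hi, lo = to_surrogate_pair(0x1D11E)
print(hex(hi), hex(lo))  # 0xd834 0xdd1e

# Cross-check against the codec: these are the two units a narrow build stored.
assert '\U0001d11e'.encode('utf-16-be') == bytes([hi >> 8, hi & 0xFF, lo >> 8, lo & 0xFF])

# A narrow build's len() counted UTF-16 code units, i.e. 2 here.
print(len('\U0001d11e'.encode('utf-16-le')) // 2)  # 2
```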
On printing, the pair of surrogates got translated to a square box, the
glyph used for all characters the font has no glyph for. When cut
and pasted, it shows in this mail composer as a weird music sign with
peculiar behavior.
Below: 3 hyphens, 3 spaces, the pasted character, 3 spaces, 3 hyphens;
the character may disappear.
--- 𝄞 ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on
printing.
--
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list