On 1/7/2013 8:12 AM, Terry Reedy wrote:
On 1/7/2013 7:57 AM, Franck Ditter wrote:

>>> print('\U0001d11e')
Traceback (most recent call last):
   File "<pyshell#1>", line 1, in <module>
     print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e' in position 0: Non-BMP character not supported in Tk

The message comes from printing to a Tk text widget (the IDLE shell),
not from creating the 1-char string. c = '\U0001d11e' works fine. When
you have problems with creating and printing Unicode, *separate*
creating from printing to see where the problem is. (I do not know if
the brand-new Tcl/Tk 8.6 is any better.)
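A minimal sketch of the "separate creating from printing" advice, assuming Python 3.3+: inspect the string with ASCII-only output first, and only then worry about the output device. The helper names describe and bmp_only are hypothetical, not part of IDLE or tkinter; bmp_only simply swaps out characters above U+FFFF, which the UCS-2 Tk in the traceback above rejects.

```python
import unicodedata


def describe(s):
    """Inspect a string without printing the string itself, so any
    failure here can only come from creation, not from the output device."""
    return [(hex(ord(ch)), unicodedata.name(ch, '<unnamed>')) for ch in s]


def bmp_only(s, replacement='\ufffd'):
    """Hypothetical helper: replace characters above U+FFFF (non-BMP),
    which a UCS-2 Tk text widget refuses, before handing text to it."""
    return ''.join(ch if ord(ch) <= 0xFFFF else replacement for ch in s)


c = '\U0001d11e'                         # creating the 1-char string works fine
print(describe(c))                       # [('0x1d11e', 'MUSICAL SYMBOL G CLEF')]
print(ascii(bmp_only('G clef: ' + c)))   # 'G clef: \ufffd'
```

If describe() succeeds but printing to the shell fails, the problem is the display, not the string.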

The Windows console also chokes, but with a different message.

 >>> c='\U0001d11e'
 >>> print(c)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e' in position 0: character maps to <undefined>
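A sketch of one way to keep a cp437 console from choking, assuming Python 3: encode with errors='backslashreplace' so an unencodable character comes out as an escape instead of raising. safe_print is a hypothetical helper, not a standard function; it falls back to ASCII when the stream reports no encoding.

```python
import sys


def safe_print(text, stream=None):
    """Hypothetical helper: print text, escaping characters the stream's
    encoding cannot represent instead of raising UnicodeEncodeError."""
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    # Round-trip through the target encoding; 'backslashreplace' turns
    # unencodable characters into \U0001d11e-style escapes.
    safe = text.encode(encoding, errors='backslashreplace').decode(encoding)
    print(safe, file=stream)


c = '\U0001d11e'
try:
    c.encode('cp437')                    # what the console codec attempts
except UnicodeEncodeError as exc:
    print('cp437 cannot encode it:', exc.reason)

escaped = c.encode('cp437', errors='backslashreplace')
print(escaped)                           # b'\\U0001d11e'
safe_print('G clef: ' + c)
```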

Yes, this is very annoying, especially in Win 7.

The above is in 3.3, in which '\U0001d11e' is actually translated to a length-1 string. In 3.2 and earlier, on narrow builds (as on Windows), that literal is translated to a length-2 string: a surrogate pair (two code units in the BMP). On printing, the surrogate pair was displayed as the square box used for any character for which the font has no glyph. 𝄞 When cut and pasted, it shows in this mail composer as a weird music sign with peculiar behavior.
Below: 3 hyphens, 3 spaces, the pasted character, 3 spaces, 3 hyphens (though it may disappear).
---   𝄞   ---
So 3.3 is the first version on Windows to raise UnicodeEncodeError when printing this character.
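To make the narrow-build behavior concrete, here is a sketch of how a code point above U+FFFF splits into a UTF-16 surrogate pair. surrogate_pair is a hypothetical helper; the last lines cross-check it against Python's own utf-16-be codec and show the length-1 string that 3.3 (PEP 393) produces.

```python
def surrogate_pair(cp):
    """Hypothetical helper: split a non-BMP code point into the UTF-16
    high/low surrogates that a 3.2- narrow build stores as two items."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('not a supplementary-plane code point')
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1D11E)
print(hex(high), hex(low))                      # 0xd834 0xdd1e
print('\U0001d11e'.encode('utf-16-be').hex())   # d834dd1e
print(len('\U0001d11e'))                        # 1 on 3.3+; 2 on 3.2- narrow builds
```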

--
Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list
