Terry J. Reedy <tjre...@udel.edu> added the comment:

Printing the unquoted escape representation rather than a replacement char is a 
bit strange and not what I would expect from the Python docs.  I could see 
calling it a bug.  In any case, on Windows, it is the Python REPL that raises, 
but only for sys.stdout.

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

whereas in IDLE the surrogate is displayed as a box with diagonal lines ([X] 
compressed into one char) in both cases.  When copied and pasted into Firefox, 
the pasted surrogate shows as a square box containing mini D 8 0 0 chars.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�

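The stdout/stderr difference in the console REPL is consistent with the two 
streams using different error handlers: sys.stderr normally uses 
'backslashreplace', while an interactive sys.stdout normally uses 'strict' 
(this can vary with PYTHONIOENCODING and the locale).  A minimal sketch that 
reproduces both outcomes without a terminal; write_with_handler is a 
hypothetical helper, not anything in Python or IDLE:

```python
import io

def write_with_handler(text, errors):
    """Print text to an in-memory UTF-8 stream using the given error handler.

    Returns what reached the stream, or a short description of the
    UnicodeEncodeError if the handler refused the text.
    """
    raw = io.BytesIO()
    # newline="\n" keeps the output identical on every platform.
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return "UnicodeEncodeError: " + exc.reason
    return raw.getvalue().decode("utf-8")

# Like sys.stderr: the lone surrogate is written as an escape sequence.
stderr_like = write_with_handler('\ud800', 'backslashreplace')
# Like an interactive sys.stdout: encoding the lone surrogate fails.
stdout_like = write_with_handler('\ud800', 'strict')
```

Here stderr_like is the six characters \ud800 plus a newline, and stdout_like 
reports "surrogates not allowed", matching the traceback above.
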
I consider it the proper thing for tk to put the undisplayable codepoint, 
rather than a replacement character, into the editor buffer (however tcl 
encodes it), so that IDLE can retrieve it without loss of information.  IDLE 
can then potentially identify the character to the user.
===

An oddity though.  With

>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')

the box is an empty square, not crossed.  But when I copy-paste 'a�b' into the 
font sample (Serhiy, making this editable was a great idea), it is crossed for 
every font I tried, even for Courier, which is what is being used in text t.

----------
stage:  -> needs patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue22742>
_______________________________________