[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-31 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

Closing again. IDLE now works fine in both subprocess and in-process mode. 

Further support for non-BMP characters can continue after a codec for that is 
implemented (#14304).

For now I'd like to close this as «good enough for now».
At least IDLE doesn't crash on printing anything.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-25 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 89878808f4ce by Andrew Svetlov in branch 'default':
Issue #14200: now displayhook for IDLE works in non-subprocess mode as well as 
subprocess.
http://hg.python.org/cpython/rev/89878808f4ce

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-25 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

After experimenting with non-BMP characters I found out:
- non-BMP symbols are processed differently by different Tk text widgets 
(Entry, Text etc.). For example, Entry can display non-BMP characters with 
spaces after the glyph, while Text reduces the symbol to BMP. Editing is also 
weird.
- it looks like the Tk event loop passes non-BMP input directly to tkinter as is.

Obviously Tk does not support non-BMP chars by spec, yet it does not strictly 
reject them either. The details are implementation-specific and depend not only 
on the Tcl/Tk version but also on the concrete widget class. 

After that, my position is: 
- implement a utf8-bmp codec.
- the first implementation of utf8-bmp can be done in pure Python using the 
utf-8 codec plus checks. This way is simple enough, although it may degrade 
performance. That doesn't matter if the codec is used only for converting 
relatively short strings in Tk widgets.
- use it in the _tkinter AsObj/FromObj functions with 'replace' mode.
- my approach is a bit incompatible in the dark corner of non-BMP chars (which 
are not supported, but are silently passed to the low-level platform API with 
weird conversions on the way). I don't think this is a problem at all. 
- with a utf-8-bmp codec IDLE can still use 'strict' mode in its .write 
function (I mean `print` and displayhook) to keep the current behavior, or use 
escaping for displayhook and 'replace' for regular `print`. In the 
implementation of #14326 we can also use an explicitly specified encoding for 
`print`.

I experimented on an Ubuntu box, but I'm pretty sure the same result can be 
reproduced on OS X and Windows as well. We also need Tk to behave the same 
across platforms, so replacing non-BMP characters is not perfect, but it is a 
good solution until Tcl/Tk processes non-BMP natively.
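A minimal sketch of the pure-Python utf8-bmp codec proposed above, assuming it is built on the stdlib utf-8 codec plus a per-character check. The codec name "utf-8-bmp" and the helper functions are hypothetical (no such codec exists in the stdlib). Non-BMP characters are routed through codecs.lookup_error(), so 'strict', 'replace' and 'backslashreplace' behave as they do for built-in codecs.

```python
import codecs

CODEC_NAME = "utf-8-bmp"  # hypothetical name taken from the proposal, not a stdlib codec


def _encode(text, errors="strict"):
    """Encode text as UTF-8, passing every non-BMP character to the error handler."""
    handler = codecs.lookup_error(errors)
    pieces = []
    pos = 0
    while pos < len(text):
        if ord(text[pos]) <= 0xFFFF:
            pieces.append(text[pos])
            pos += 1
            continue
        exc = UnicodeEncodeError(CODEC_NAME, text, pos, pos + 1,
                                 "non-BMP character not supported in Tk")
        # 'strict' raises exc; the other built-in handlers return
        # (replacement_text, position_to_resume_at).
        replacement, pos = handler(exc)
        pieces.append(replacement)
    return "".join(pieces).encode("utf-8"), len(text)


def _decode(data, errors="strict"):
    # Decoding is plain UTF-8 in this sketch; only encoding enforces the BMP limit.
    return codecs.utf_8_decode(data, errors, True)


def _search(name):
    # Depending on the Python version the name may arrive with hyphens or underscores.
    if name.replace("-", "_") == "utf_8_bmp":
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

# Usage: the built-in 'replace' handler substitutes '?' when encoding.
print("abc\U00010330".encode("utf-8-bmp", "replace"))
```

With 'replace' the result is b'abc?'; a U+FFFD substitution would need a custom error handler registered with codecs.register_error().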

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> On Windows Vista, I do see that print() behaves differently than
> evaluating the expression. An exception is raised for:
>    print('\N{GOTHIC LETTER AHSA}')

As is for most other characters not supported in your OEM code
page, e.g. (likely) '\N{GREEK SMALL LETTER ALPHA}'

> On Linux, I see the character print as ? in xterm and as a '?' when
> evaluated. In gnome-terminal (Ubuntu Mono font) it prints as a box
> containing the code point in hex. No exception is raised.

That's because your terminal output encoding is UTF-8. If you change
your locale to C, or any other locale that doesn't cover full Unicode
(e.g. de_DE.ISO-8859-1, if supported on your Linux installation),
you get the same behavior on Linux as you do on Windows.

> Given that Windows and Linux (Ubuntu) behave differently

That's not a given, see above.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

I stand corrected. Thank you for the information.

The behavior of the console depends on its locale. IDLE has no facility for 
changing the locale of the PyShell window. Should this option be included 
somewhere?

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

I think that doesn't make sense.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

The Tkinter Text widget is the output for the IDLE shell and it has the 
limitation imposed by Tcl/Tk of not handling non-BMP unicode characters. 

Is the following reasonable: The IDLE shell console has a locale of non-BMP 
utf8?

If so, would it be reasonable to add a menu item to switch locales for the 
shell? This amounts to adding some extra code to OutputWindow's write() to 
raise encoding errors if the string contains unsupported characters, and 
possibly replacing characters to work around Tcl/Tk's non-BMP limitation.
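Roger's proposal can be sketched as a write() that emulates a terminal with a chosen encoding. ShellOutput and its attributes are hypothetical (this is not IDLE's OutputWindow API), and a plain list stands in for the Text widget. With encoding='utf-8', non-BMP text still gets through, which is exactly the gap a BMP-only UTF-8 codec would close.

```python
class ShellOutput:
    """Hypothetical stand-in for the shell's output window with a selectable encoding."""

    def __init__(self, encoding="utf-8", errors="strict"):
        self.encoding = encoding
        self.errors = errors
        self.chunks = []  # plays the role of the Tk Text widget

    def write(self, s):
        # Round-trip through the chosen encoding: with errors='strict' this raises
        # UnicodeEncodeError exactly where a terminal using that encoding would.
        shown = s.encode(self.encoding, self.errors).decode(self.encoding)
        self.chunks.append(shown)
        return len(s)


# Usage: an ASCII "locale" with replacement shows 'a?' for 'a' + GREEK SMALL LETTER ALPHA.
out = ShellOutput("ascii", "replace")
out.write("a\u03b1")
print(out.chunks)
```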

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> The behavior of the console depends on its locale. IDLE has no
> facility for changing the locale of the PyShell window. Should this
> option be included somewhere?

It may be remotely desirable to be able to set the terminal encoding
in IDLE for debugging purposes. But it's unrelated to the issue at
hand.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> Is the following reasonable: The IDLE shell console has a locale of
> non-BMP utf8?

[BMP utf8]
That's indeed the approach that Andrew and I were discussing. 
Unfortunately, there is no codec for it yet. We were discussing
adding a utf8bmp encoding to Python. This is a medium-sized
project, though (and again out of scope for this issue).

> If so, would it be reasonable to add a menu item to switch locales
> for the shell? This amounts to adding some extra code to
> OutputWindow's write() to raise encoding errors if the string
> contains unsupported characters, and possibly replacing characters to
> work around Tcl/Tk's non-BMP limitation.

Please open a separate issue for this.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-15 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Martin, you are right. I created a separate issue #14326.

Let me know what I can do to help.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset c06b94c5c609 by Andrew Svetlov in branch 'default':
Issue #14200: Idle shell crash on printing non-BMP unicode character.
http://hg.python.org/cpython/rev/c06b94c5c609

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

The patch escapes every non-ASCII char, while it would be better to escape only 
non-BMP ones.

That will be done after #14304.

--
resolution:  -> fixed


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
status: open -> closed


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Andrew, please reopen this issue. Your committed patch does not work if IDLE is 
not using the subprocess.

>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> got_ahsa
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    got_ahsa
  File "idlelib/PyShell.py", line 1255, in write
    return self.shell.write(s, self.tags)
  File "idlelib/PyShell.py", line 1233, in write
    'Non-BMP character not supported in Tk')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk

However, it does work when IDLE uses a subprocess.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Attached is a patch that undoes Andrew's and fixes the issue in a simple manner. 
The tcl_unicode_range.patch from Issue12342 has already been applied, so 
catching ValueError within IDLE is all that is now needed.

--
Added file: http://bugs.python.org/file24848/issue14200.patch


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Attached is a better implementation of the patch. The Percolator which 
ultimately handles writing to the Text widget should intercept the ValueError 
due to non-BMP characters. The issue14200_rev1.patch fixes this issue and 
Issue13153.

--
status: closed -> open
Added file: http://bugs.python.org/file24849/issue14200_rev1.patch


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

Roger, you are missing the difference between calling print() and evaluating an 
expression in Python interactive mode.
While the latter should be unicode-escaped, the former should raise an error: we 
need to behave the same way as an interactive session in the console does.

As for the rest, I like your simplification. And IDLE definitely should work in 
both subprocess and embedded modes; thank you for that point.

I'll make the final (I hope) patch a bit later.

--
assignee:  -> asvetlov
resolution: fixed -> 


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Andrew, I do admit that I have a lot to learn about Unicode support in Python, 
for instance with its error-handling and its corner cases.

On Windows Vista, I do see that print() behaves differently than evaluating the 
expression. An exception is raised for:
   print('\N{GOTHIC LETTER AHSA}')

On Linux, I see the character print as ? in xterm and as a '?' when evaluated. 
In gnome-terminal (Ubuntu Mono font) it prints as a box containing the code 
point in hex. No exception is raised.

I do see your point. The patch I provided always substitutes the unsupported 
character with its full expansion. Returning to a point earlier raised by 
Martin, using REPLACEMENT CHARACTER instead would be better. It would make the 
behavior of IDLE more consistent with xterm and gnome-terminal, although it 
would cause IDLE to hide errors if the program ran from a Windows console 
instead of IDLE. 

Given that Windows and Linux (Ubuntu) behave differently, I'd rather let IDLE 
mimic the behavior of a Linux console than a Windows console.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-14 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

I consulted with Martin at the PyCon sprint and he suggested a solution which I'm 
following: split `print` and the REPL (read-eval-print loop).

Output passed to the print() function is encoded with sys.stdout.encoding.

UTF encodings were invented to support any character.
Linux is usually set up to use the utf-8 encoding by default (see the LANG 
environment variable). There are no encoding issues with that.

The xterm you use (a rather old terminal) cannot print non-BMP characters and 
replaces them with question marks.
The modern gnome-terminal prints those symbols very well.

Let's return to non-UTF terminal encodings.
If a character cannot be encoded, Python raises UnicodeEncodeError.
Here is an example:

andrew@tiktaalik ~/p/cpython bash -c LANG=C; ./python
Python 3.3.0a1+ (qbase qtip tip tk:c3ce8a8e6c9c+, Mar 14 2012, 15:54:55) 
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U00010340'
'\U00010340'
>>> print('\U00010340')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U00010340' in position 0: ordinal not in range(128)
>>> 

As you can see, I have switched LANG to the C locale (i.e. ASCII).

The evaluated expression is printed with unicode escaping, but the `print` call 
raises an error.
This happens because Python's REPL calls sys.displayhook.
See http://docs.python.org/dev/library/sys.html#sys.displayhook for 
details. 
That code escapes unicode if the terminal doesn't support it.

The same for Windows, OS X and any other platform.
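The print/displayhook split described above can be sketched in Python. display_value() is a hypothetical helper modeled on the pure-Python equivalent of the default hook given in the sys.displayhook documentation; it takes the target stream as a parameter and omits the bookkeeping of builtins._. An ASCII-encoded TextIOWrapper stands in for a terminal running under LANG=C.

```python
import io


def display_value(value, stream):
    """Echo an evaluated value roughly the way the default REPL display hook does."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # The stream's encoding cannot represent the text: fall back to escapes.
        data = text.encode(stream.encoding, "backslashreplace")
        if hasattr(stream, "buffer"):
            stream.flush()  # keep ordering with any text already buffered
            stream.buffer.write(data)
        else:
            stream.write(data.decode(stream.encoding, "strict"))
    stream.write("\n")


# Demo: evaluation is escaped, whereas print() to the same stream would raise.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
display_value("\U00010340", ascii_out)
ascii_out.flush()
print(raw.getvalue())
```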

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-13 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
nosy: +asvetlov


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-11 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Attached is a patch to have the rpc marshal exceptions. When used with Martin's 
patch, IDLE returns 

>>> '\U00010330'
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '\U00010330'
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl


Martin: I disagree with the approach of raising a UnicodeEncodeError if IDLE 
can't render the output of a user's program, especially when the program would 
otherwise run without error if ran from outside of IDLE.

Would replacing these characters with ? and documenting this limitation in 
IDLE's docs be an acceptable solution?

--
Added file: http://bugs.python.org/file24788/rpc_marshal_exception.patch


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-11 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

I made a mistake in msg155410. The results in the message are WITHOUT 
unicodeerror.diff applied. When it is applied, the IDLE shell gives:


>>> '\U00010330'
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    '\U00010330'
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
Traceback (most recent call last):
** IDLE Internal Exception: 
  File "idlelib/run.py", line 98, in main
    ret = method(*args, **kwargs)
  File "idlelib/run.py", line 305, in runcode
    print_exception()
  File "idlelib/run.py", line 168, in print_exception
    print(line, end='', file=efile)
  File "idlelib/rpc.py", line 599, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "idlelib/rpc.py", line 214, in remotecall
    return self.asyncreturn(seq)
  File "idlelib/rpc.py", line 245, in asyncreturn
    return self.decoderesponse(response)
  File "idlelib/rpc.py", line 265, in decoderesponse
    raise what
ValueError: max() arg is an empty sequence

I will need to rework the rpc_marshal_exception patch.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-11 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> Martin: I disagree with the approach of raising a UnicodeEncodeError
> if IDLE can't render the output of a user's program, especially when
> the program would otherwise run without error if ran from outside of
> IDLE.

This is really an independent issue, and I'd appreciate if people would
treat it as such. *This* issue is about IDLE crashing, not about how
Tkinter deals with non-BMP characters.

So if the RPC exception marshalling works, and can resolve this issue,
I'll be ready to commit this and close this issue. Opening another issue
dealing with the more general Tk problem would be fine with me.

I don't *quite* understand what you are proposing. If it is that
Tkinter always replaces non-BMP characters in string objects with
question marks, then I'm opposed. Tkinter can't know whether the
replacement is an acceptable loss or not; errors should never pass
silently.

If you are suggesting that IDLE's write function should write
a question mark instead of raising an exception: perhaps, but
a) I'd rather use REPLACEMENT CHARACTER instead of QUESTION MARK
b) I'd really try to find out first whether Tcl unknowingly
supports UTF-16, at least for rendering.
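A minimal sketch of the display-side substitution under discussion, replacing each character above U+FFFF with U+FFFD REPLACEMENT CHARACTER before it would reach Tk. replace_non_bmp() is a hypothetical helper; passing replacement="?" gives Roger's question-mark variant. It only illustrates the mechanism, not where in IDLE or Tkinter such a substitution should live.

```python
import re

# Any code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def replace_non_bmp(s, replacement="\ufffd"):
    """Replace every non-BMP character in s with `replacement`."""
    return _NON_BMP.sub(replacement, s)


# Usage: GOTHIC LETTER AHSA becomes U+FFFD, BMP characters are untouched.
print(ascii(replace_non_bmp("a\U00010330b\u00e9")))
```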

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-11 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Having had some time to work on it, the bug is in the unicodeerror.diff patch. 
If the string is empty then max(s) will raise a ValueError. This is easy to 
trigger by generating an exception at the python prompt, like 1/0. 
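The empty-string pitfall, and one way to guard against it, can be sketched as follows. first_non_bmp() is a hypothetical helper, not code from unicodeerror.diff; it short-circuits on an empty string before calling max().

```python
def first_non_bmp(s):
    """Return the index of the first character above U+FFFF in s, or -1 if none."""
    # Calling max() on an empty string raises
    # "ValueError: max() arg is an empty sequence", so test for emptiness first.
    if not s or max(s) <= "\uffff":
        return -1
    return next(i for i, ch in enumerate(s) if ch > "\uffff")


# Usage: an empty write is now harmless.
print(first_non_bmp(""), first_non_bmp("a\U00010330b"))
```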

Attached is a revised version of Martin's patch.

--
Added file: http://bugs.python.org/file24790/unicodeerror_rev1.diff


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-11 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Martin, I got your message after I submitted the last one.

This issue does involve IDLE crashing, but it's not crashing due to non-BMP 
characters. That is a side-effect of a bigger issue with pythonw.exe. See 
Issue13582 for more information.

IDLE's shell output has a gross deficiency due to Tkinter's inability to handle 
Unicode properly. Why penalize a program for running in IDLE just because IDLE 
can't write something to the text widget? This is precisely what your approach 
is doing - making IDLE an even more restricted environment than it needs to be.

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-06 Thread Vlastimil Brom

Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Sorry for mixing the different problems, these were somehow things I noticed 
at once in the new python version, but I should have noticed the different 
domains myself.
I still might not understand the term "crash" properly. I just meant to 
distinguish between a single appropriate exception on an invalid operation, 
after which the app stays alive and works on the next valid input (as is the 
case when calling through python.exe), and, on the other hand, the immediate 
termination on encountering the invalid input, which happens with pythonw.exe.

Now I see that with pythonw a tk app terminates on the first exception (in 
general) in py 3.3 and also 3.2 (as opposed to py 2.7, where it just swallows 
the exception and stays alive, as one would probably expect).

Should this be reported in a separate issue, or is this what remains relevant 
in *this* report? (Sorry for the confusion.)

vbr

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-06 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

That pythonw suddenly closes is a separate issue: if pythonw attempts to write 
to stderr, it crashes. To get your example to run in pythonw.exe,
try

pythonw.exe Lib\idlelib\idle.py 2> out.txt

I think the behavior of pythonw terminating when it can't write to stderr is 
actually correct: an exception is raised on attempting to write to stderr, 
which then cannot be printed (because there is no stderr).

So the real fault here is the traceback that python.exe reports.

To fix this, I think rpc.py should learn to marshal exceptions back to the 
subprocess. Then the initial sys.stdout.write should raise a UnicodeError 
(which it currently doesn't, either). This would get into the displayhook, 
which would then use sys_displayhook_unencodable to backslash-escape the 
unsupported character.

I'll attach a patch that at least makes the exception UnicodeEncodeError.
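A toy sketch of the exception marshalling suggested above for rpc.py: the side that executes the call catches the exception and ships it back, and the caller re-raises it, so a UnicodeEncodeError from write() surfaces at the call site. pickle stands in for the socket transport and assumes the exception is picklable; AsciiStdout, handle_call, decode_response and the "OK"/"EXCEPTION" tags are hypothetical, not idlelib.rpc's actual message format.

```python
import pickle


class AsciiStdout:
    """Hypothetical remote object whose write() fails like an ASCII-only terminal."""

    def write(self, s):
        s.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
        return len(s)


def handle_call(obj, method, args):
    """Executing side: run the call and pickle either the result or the exception."""
    try:
        message = ("OK", getattr(obj, method)(*args))
    except Exception as exc:
        message = ("EXCEPTION", exc)
    return pickle.dumps(message)  # stands in for sending over the socket


def decode_response(payload):
    """Calling side: return the value, or re-raise the marshalled exception here."""
    how, what = pickle.loads(payload)
    if how == "EXCEPTION":
        raise what
    return what


# Usage: a successful call returns normally.
print(decode_response(handle_call(AsciiStdout(), "write", ("abc",))))
```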

--
keywords: +patch
Added file: http://bugs.python.org/file24748/unicodeerror.diff


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Vlastimil Brom

New submission from Vlastimil Brom vlastimil.b...@gmail.com:

Hi,
while testing python 3.3a1 a bit, especially the new string handling of non-BMP 
characters, I noticed a problem in Idle in this regard:

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32 ... 
[using win XPp SP3 Czech]

>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> len(got_ahsa)
1
>>> got_ahsa.encode("unicode-escape")
b'\\U00010330'
>>> got_ahsa

[crash - idle shell window closes immediately without any visible error message 
or traceback]


I realised later that tkinter probably won't be able to print wide-unicode 
characters anyway (according to 
http://bugs.python.org/issue12342 ), but Idle should probably just print the 
exception introduced there, e.g.
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl

Regards
vbr

--
components: IDLE, Tkinter, Unicode
messages: 154944
nosy: ezio.melotti, vbr
priority: normal
severity: normal
status: open
title: Idle shell crash on printing non-BMP unicode character
versions: Python 3.3


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +haypo, loewis, ned.deily, terry.reedy
type:  -> crash


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
nosy: +serwy


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Hi Vlastimil,

Can you repeat your test case while running IDLE from the command prompt and 
report the error you see?

python -m idlelib.idle

IDLE closes suddenly on Windows because IDLE uses pythonw.exe which has no 
stdout or stderr. When Tkinter encounters an error and tries to write to 
stderr, an error is raised in the Tkinter eventloop and the eventloop 
terminates.
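Under pythonw.exe sys.stderr is None, so there is nowhere for such a traceback to go. A minimal sketch of doing from inside the program what a `2> out.txt` redirection does from the command line, assuming a log file path of your choosing; ensure_stderr() is a hypothetical helper, and it would need to run before the Tk event loop starts.

```python
import sys


def ensure_stderr(log_path):
    """Return a usable sys.stderr, opening log_path if the process has none."""
    if sys.stderr is None:
        # pythonw.exe starts without a console, so sys.stderr is None and
        # tracebacks written to it have nowhere to go.
        sys.stderr = open(log_path, "a", encoding="utf-8",
                          errors="backslashreplace")
    return sys.stderr
```

In a normal console session sys.stderr already exists and the function simply returns it unchanged.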

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Vlastimil Brom

Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Hi,
thanks for the pointer, after invoking idle using python.exe, I don't see the 
crash mentioned in the report:

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> len(got_ahsa)
1
>>> got_ahsa.encode("unicode-escape")
b'\\U00010330'
>>> got_ahsa

>>> print(got_ahsa)

>>> 


I just get an empty line as the answer, but no crash.

The console indeed contains the traceback with the error I expected

   vbr



Microsoft Windows XP [Verze 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Python33>python.exe -m idlelib.idle
*** Internal Error: rpc.py:SocketIO.localcall()

 Object: stdout
 Method: <bound method PseudoFile.write of <idlelib.PyShell.PseudoFile object at 0x01CDDB50>>
 Args: ('\U00010330',)

Traceback (most recent call last):
  File "C:\Python33\lib\idlelib\rpc.py", line 188, in localcall
    ret = method(*args, **kwargs)
  File "C:\Python33\lib\idlelib\PyShell.py", line 1244, in write
    self.shell.write(s, self.tags)
  File "C:\Python33\lib\idlelib\PyShell.py", line 1226, in write
    OutputWindow.write(self, s, tags, iomark)
  File "C:\Python33\lib\idlelib\OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "C:\Python33\lib\idlelib\Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
    self.delegate.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\PyShell.py", line 322, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
    self.delegate.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl

--


[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

On 3.2.2, Win7, the length is 2 and printing in Idle prints a square, as it 
usually does for chars it cannot print. I presume Tk recognizes surrogate 
pairs. Printing to the screen should not raise an exception, so the square 
would be better. Even better would be to do what the 3.2 and 3.3 Command Prompt 
Interpreters do, which is to print an evaluable representation:

>>> c
'\U00010330'

I assume that this string is produced by python.exe rather than Windows. If so, 
neither of the two pythonw processes is currently doing the same conversion. My 
understanding is that the user pythonw process uses idlelib.rpc.RPCproxy 
objects to ship i/o calls to the idle pythonw process.

I presume we could find the idle process window .write methods and change lines 
like
    self.text.insert(mark, s, tags)
to
    try:
        self.text.insert(mark, s, tags)
    except SomeTkError:
        self.text.insert(mark, expand(s), tags)
But it seems to me that the expansion should really be done in C in _tkinter, 
where the internal .kind attribute of strings is available. 
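The fallback idea above can be made concrete with a stand-in widget. expand(), FakeText and safe_insert() are hypothetical names; FakeText raises ValueError for code points above U+FFFF, mirroring the ValueError _tkinter reports in the traceback earlier in this thread, and SomeTkError is taken to be that ValueError. Unlike ascii(), which escapes every non-ASCII character, expand() touches only non-BMP characters.

```python
def expand(s):
    """Replace each non-BMP character with an evaluable \\U escape (8 hex digits)."""
    return "".join(ch if ord(ch) <= 0xFFFF else "\\U%08x" % ord(ch) for ch in s)


class FakeText:
    """Stand-in for a Tk Text widget that rejects code points above U+FFFF."""

    def __init__(self):
        self.contents = ""

    def insert(self, mark, chars, tags=None):
        for ch in chars:
            if ord(ch) > 0xFFFF:
                raise ValueError("character U+%x is above the range "
                                 "(U+0000-U+FFFF) allowed by Tcl" % ord(ch))
        self.contents += chars  # mark and tags are ignored in this stand-in


def safe_insert(text, mark, s, tags=None):
    """Try the normal insert; on Tcl's ValueError, insert the escaped form instead."""
    try:
        text.insert(mark, s, tags)
    except ValueError:
        text.insert(mark, expand(s), tags)


# Usage: the widget ends up holding the evaluable escape instead of crashing.
widget = FakeText()
safe_insert(widget, "end", "x\U00010330")
print(widget.contents)
```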

---
There is also an input crash. On 3.2, I tried to cut the square char and paste 
it into ord('') (in both the shell and an edit window) to see what Unicode char 
it is, and IDLE fades away as you describe. That puzzles me, as I am normally 
able to paste BMP chars into IDLE without problem. In any case, I presume the 
problem is not IDLE-specific and would best be handled in _tkinter. Or does the 
crash happen in Windows or Tcl/Tk code before _tkinter ever sees the input?

When I paste the same into the 3.2 or 3.3 interpreter, it is converted to ASCII 
'?'. I presume this is done by the Windows Command Prompt before anything is 
sent to Python.

--




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Vlastimil Brom

Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I'd like to add some further observations on this issue;
it seems that the crash is indeed not specific to IDLE.
In a sample tkinter app where I just display, e.g., chr(66352) in an Entry 
widget, I get the same immediate crash with pythonw.exe, and the previously 
mentioned proper ValueError without a crash with python.exe.

I also tried to explicitly display surrogate pairs, which were used 
automatically up to Python 3.2; these can be used in tkinter in 3.3, but there 
are limitations and discrepancies:

 
>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> def wide_char_to_surrog_pair(char):
    code_point = ord(char)
    if code_point <= 0xFFFF:
        return char
    else:
        high_surr = (code_point - 0x10000) // 0x400 + 0xD800
        low_surr = (code_point - 0x10000) % 0x400 + 0xDC00
        return chr(high_surr)+chr(low_surr)

>>> ahsa_surrog = wide_char_to_surrog_pair(got_ahsa)
>>> print(ahsa_surrog)
̰
>>> repr(ahsa_surrog)
'_ud800\x00udf30'
>>> ahsa_surrog
'Pud800 udf30'

[the space in the middle of the last item might be \x00, as it terminates the 
clipboard content; the rest was copied separately]
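For reference, here is a sketch of the surrogate-pair arithmetic used in `wide_char_to_surrog_pair`. It adds a hypothetical inverse, `from_surrogate_pairs`, that recombines pairs into single code points. This is the standard UTF-16 scheme written out by hand for illustration. `>> 10` and `& 0x3FF` are the same as `// 0x400` and `% 0x400` for the non-negative offset.

```python
def to_surrogate_pair(char: str) -> str:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    cp = ord(char)
    if cp <= 0xFFFF:
        return char                      # BMP characters need no pair
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return chr(high) + chr(low)


def from_surrogate_pairs(s: str) -> str:
    """Recombine every high/low surrogate pair in s into one code point."""
    out = []
    i = 0
    while i < len(s):
        hi = ord(s[i])
        lo = ord(s[i + 1]) if i + 1 < len(s) else 0
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            out.append(chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)))
            i += 2
        else:
            out.append(s[i])             # anything else is copied as-is
            i += 1
    return "".join(out)


pair = to_surrogate_pair("\N{GOTHIC LETTER AHSA}")
print(ascii(pair))                                              # '\ud800\udf30'
print(from_surrogate_pairs(pair) == "\N{GOTHIC LETTER AHSA}")   # True
```

On Python 3.4 and later, the result can also be cross-checked against the UTF-16 codec, since `surrogatepass` works with the `utf-16*` codecs there: `to_surrogate_pair(c).encode("utf-16-le", "surrogatepass") == c.encode("utf-16-le")`.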

The printed square corresponds to the given character and can be used in 
other programs etc. Whereas in Python 3.2 the same value was used for repr() and 
for direct display of the string in the interpreter, there are three different 
formats in Python 3.3.

I also noticed that a surrogate pair is no longer supported as input for 
unicodedata.name(...):
 
>>> import unicodedata
>>> unicodedata.name(ahsa_surrog)
Traceback (most recent call last):
  File "<pyshell#60>", line 1, in <module>
    unicodedata.name(ahsa_surrog)
TypeError: need a single Unicode character as parameter

(in 3.2 and probably others it returns the expected 'GOTHIC LETTER AHSA')

For my part, I think keeping somewhat liberal (but still unambiguous) input 
possibilities for unicodedata wouldn't hurt. Also, if tkinter is not going to 
support wide Unicode natively any time soon, converting output to surrogate 
pairs, which other programs also understand, seems the most usable option in 
this regard.
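Here is a sketch of the more liberal lookup suggested above. `liberal_name` is a hypothetical wrapper, not part of `unicodedata`. It combines exactly one UTF-16 surrogate pair into a single code point before calling `unicodedata.name`. Every other input is passed through unchanged, so other multi-character strings are still rejected.

```python
import unicodedata


def liberal_name(s: str) -> str:
    """Hypothetical wrapper: like unicodedata.name, but also accepts one
    UTF-16 surrogate pair and combines it into a single code point first."""
    if (
        len(s) == 2
        and 0xD800 <= ord(s[0]) <= 0xDBFF   # high (lead) surrogate
        and 0xDC00 <= ord(s[1]) <= 0xDFFF   # low (trail) surrogate
    ):
        s = chr(0x10000 + ((ord(s[0]) - 0xD800) << 10) + (ord(s[1]) - 0xDC00))
    # Still raises TypeError for any other multi-character input.
    return unicodedata.name(s)


print(liberal_name("\ud800\udf30"))  # GOTHIC LETTER AHSA
```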

Hopefully this is relevant to the original issue;
I am not sure whether some parts would be better posted as separate 
issues, or whether this is the planned and expected behaviour anyway.

regards,
   vbr

--




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Vlastimil: you are mixing issues. Some of your observations are actually 
correct behaviour; please don't clutter the report with that, but report each 
separate behavior in a separate report. In Python 3.3, surrogate pairs do *not* 
substitute for the actual character, since the internal representation is 
not UTF-16 anymore.
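The point about 3.3 can be checked directly on Python 3.3 or later, where PEP 393 string storage applies. The character and its surrogate pair are different strings of different lengths. Only the former counts as a single character for `unicodedata.name`. The variable names below are illustrative.

```python
import unicodedata

ahsa = "\N{GOTHIC LETTER AHSA}"   # a single code point, U+10330
pair = "\ud800\udf30"             # two lone surrogate code points

print(len(ahsa), len(pair))       # 1 2
print(ahsa == pair)               # False: the pair is a different string
print(unicodedata.name(ahsa))     # GOTHIC LETTER AHSA

try:
    unicodedata.name(pair)        # two characters, so this is rejected
except TypeError as exc:
    print("TypeError:", exc)      # exact message varies by Python version
```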

Also, when you run a Tkinter app in IDLE: while you do get proper traceback 
output, your conclusion that python.exe does not crash is incorrect; it 
crashes in the very same way that IDLE crashes. The only difference is that, 
when run inside IDLE, it is a subprocess that crashes (i.e. terminates with 
traceback output), not IDLE itself.

--




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
resolution:  -> fixed
status: open -> closed




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Oops, wrong issue.

--
resolution: fixed -> 
status: closed -> open




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
Removed message: http://bugs.python.org/msg155005




[issue14200] Idle shell crash on printing non-BMP unicode character

2012-03-05 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
Removed message: http://bugs.python.org/msg155006
