New submission from Nadeem Vawda <nadeem.va...@gmail.com>:

I've recently come across a strange failure in the tests for the input()
built-in function:

    $ ./python -E -m test -v test_readline test_builtin

    [... snip ...]

    ======================================================================
    FAIL: test_input_tty_non_ascii (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1079, in test_input_tty_non_ascii
        self.check_input_tty("prompté", b"quux\xe9", "utf-8")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +


    ======================================================================
    FAIL: test_input_tty_non_ascii_unicode_errors (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1083, in test_input_tty_non_ascii_unicode_errors
        self.check_input_tty("prompté", b"quux\xe9", "ascii")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +

The failure only manifests itself if the readline module is loaded before
test_builtin runs (hence the presence of test_readline above). It will
not occur if regrtest is run with either of the -j or -W flags (which is
why it hasn't been seen on the buildbots).

The problem seems to be that readline assumes that its input should use
the locale encoding, and silently strips out any undecodable bytes. This
breaks the tests mentioned above, since they set up sys.stdin to use the
surrogateescape error handler, expecting undecodable bytes to be escaped
rather than discarded.
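
To make the difference concrete, here is a minimal sketch in pure Python
of the two outcomes seen in the assertion above. It does not call
readline. The "ignore" error handler is only an analogy for the stripping
observed when readline is loaded, not a claim about how readline does it
internally.

```python
# The same bytes that the failing tests feed to input().
raw = b"quux\xe9"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, so the byte is preserved. This is what the tests expect.
escaped_utf8 = raw.decode("utf-8", "surrogateescape")
escaped_ascii = raw.decode("ascii", "surrogateescape")

# "ignore" drops the undecodable byte entirely. Used here only as an
# analogy for the result the tests get when readline is loaded.
dropped = raw.decode("utf-8", "ignore")

print(ascii(escaped_utf8))   # expected by the tests
print(ascii(dropped))        # observed with readline loaded
```

Both the "utf-8" and the "ascii" variants of the test expect the same
escaped result, since 0xe9 on its own is undecodable in either encoding.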

This problem doesn't crop up if readline is *not* loaded, because in that
case PyOS_Readline() falls back to a stdio-based implementation
(PyOS_StdioReadline()) that preserves invalid characters, allowing them
to be handled properly by sys.stdin's encoding and error handler.
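
As a rough illustration of why preserving the raw bytes matters, the
sketch below uses an io.TextIOWrapper over an in-memory buffer as a
stand-in for sys.stdin. The names fake_stdin_bytes and stdin_like are
made up for this example, and this is not the code path that input()
actually takes at the C level.

```python
import io

# Stand-in for the raw bytes arriving on stdin (hypothetical name).
fake_stdin_bytes = io.BytesIO(b"quux\xe9\n")

# Stand-in for sys.stdin as configured by the tests: a text layer with
# the surrogateescape error handler (hypothetical name).
stdin_like = io.TextIOWrapper(
    fake_stdin_bytes, encoding="utf-8", errors="surrogateescape"
)

# Because the undecodable byte reaches the text layer intact, it is
# escaped rather than lost.
line = stdin_like.readline().rstrip("\n")

# The escaping is reversible: encoding with the same handler restores
# the original bytes.
round_tripped = line.encode("utf-8", "surrogateescape")

print(ascii(line))
print(round_tripped)
```

If the byte is stripped before this layer sees it, no error handler on
sys.stdin can recover it.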

I have been able to fix the test failures with the attached patch, which
stops readline from eating invalid characters, making it consistent with
the stdio-based fallback. Can someone with more knowledge of readline
and/or locale issues advise whether the change is a good idea?

----------
components: Extension Modules
files: rl-locale.diff
keywords: patch
messages: 152080
nosy: nadeem.vawda
priority: normal
severity: normal
stage: patch review
status: open
title: readline-related test_builtin failure
type: behavior
versions: Python 3.2, Python 3.3
Added file: http://bugs.python.org/file24337/rl-locale.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13886>
_______________________________________