[issue19100] Use backslashreplace in pprint

2017-12-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Try with LANG=en_US.

And even UTF-8 can fail.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19100] Use backslashreplace in pprint

2017-12-18 Thread STINNER Victor

STINNER Victor added the comment:

$ LANG= ./python -c "import pprint; pprint.pprint('\u20ac')"

Thanks to PEP 538 and PEP 540, this command now works as expected in Python 
3.7:

vstinner@apu$ LANG= python3.7 -c "import pprint; pprint.pprint('\u20ac')"
'€'

Do we still need pprint_unencodable_2.patch workaround?
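
One way to check this independently of the locale is to force an ASCII stdout in a child interpreter. The sketch below assumes the PYTHONIOENCODING environment variable is honoured by the interpreter under test; the helper name run_with_stdout_encoding is made up for illustration and is not part of any patch in this issue.

```python
import os
import subprocess
import sys

# The same one-liner used throughout this issue; the raw string keeps the
# backslash escape intact so the child interpreter is the one that parses it.
CODE = r"import pprint; pprint.pprint('\u20ac')"


def run_with_stdout_encoding(io_encoding):
    """Run CODE in a child interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CODE],
        env=env,
        capture_output=True,
    )


# Strict ASCII stdout: pprint is expected to hit UnicodeEncodeError.
strict = run_with_stdout_encoding("ascii")
# ASCII stdout with a permissive error handler: the character is escaped.
lenient = run_with_stdout_encoding("ascii:backslashreplace")

print("strict exit status:", strict.returncode)
print("lenient stdout:", lenient.stdout)
```

If the strict run still fails, the locale coercion only hides the problem for the default configuration.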

--
nosy: +vstinner




[issue19100] Use backslashreplace in pprint

2015-03-31 Thread Martin Panter

Martin Panter added the comment:

Walter: the first line encoding with textio.errors is meant to handle the case 
where the output stream already has its own permissive error handler set. But 
anyway I was just trying to point out that it might be better to do the 
backslash escaping at the text level, and write the escaped text string to the 
original stream.

Serhiy: thanks for pointing out IDLE’s stdout. It seems the encoding can be set 
to, say, ASCII by the locale, yet it still accepts non-ASCII text. But I guess 
that’s a separate issue.

I haven’t tested the patch, but reading it, I think there may be a couple 
of problems:

* Newline handling will be wrong e.g. on Windows, where CRLF would be expected. 
I am not aware of a proper way to determine the newline translation mode of a 
text stream in arbitrary cases.
* The order of text written directly to stdout and via pprint would get messed 
up, because pprint would bypass the buffering in the original text stream.
* For encodings that store state, such as “utf-8-sig”, I think you may see an 
extra signature output, due to creating a new TextIOWrapper. With encoders 
whose state depends on the actual text, like the "hz" codec, multiplexing ASCII 
and GB2312 could be a more serious problem.
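
The stateful-encoder concern can be illustrated with the incremental encoder alone. This is only a sketch of the encoder behaviour, not of the patch: it deliberately avoids TextIOWrapper, since whether a second wrapper repeats the signature may depend on the underlying stream.

```python
import codecs

# Factory for incremental "utf-8-sig" encoders; each instance keeps its own
# "have I written the signature yet?" state.
make_encoder = codecs.getincrementalencoder("utf-8-sig")

first = make_encoder()
out = first.encode("spam")    # first call on this encoder: signature + data
out += first.encode("eggs")   # same encoder: data only

# A brand-new encoder (as a freshly created wrapper would have) starts over
# and emits the signature again in the middle of the output.
second = make_encoder()
out += second.encode("ham")

print(out)
print("signatures written:", out.count(codecs.BOM_UTF8))
```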

Issue 15216 is slightly related, and has a patch apparently allowing the 
encoding and error handler to be changed on a text stream. But I guess it is no 
good here because you need backwards compatibility with other non-TextIOWrapper 
streams.

--




[issue19100] Use backslashreplace in pprint

2015-03-31 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> What is the reasoning behind the DecodeWriter case, where the original stream 
> has an interesting encoding, but “buffer” is None? Are there any real-world 
> cases like that?

sys.stdout and sys.stderr in IDLE.

--




[issue19100] Use backslashreplace in pprint

2015-03-31 Thread Walter Dörwald

Walter Dörwald added the comment:

The linked code at https://github.com/vadmium/python-iview/commit/68b0559 seems 
strange to me:

    try:
        text.encode(encoding, textio.errors or "strict")
    except UnicodeEncodeError:
        text = text.encode(encoding, errors).decode(encoding)
    return text

is the same as:

return text.encode(encoding, errors).decode(encoding)

because when there are no unencodable characters in text, the error handler 
will never be invoked.
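
A quick way to check this claim is to run both forms side by side. This is a sketch that assumes the pre-check uses "strict" (the textio.errors part is dropped here) and that "backslashreplace" is the permissive handler; both function names are made up for the comparison.

```python
def escape_with_precheck(text, encoding, errors="backslashreplace"):
    """Two-step form: try a strict encode first, escape only on failure."""
    try:
        text.encode(encoding, "strict")
    except UnicodeEncodeError:
        text = text.encode(encoding, errors).decode(encoding)
    return text


def escape_direct(text, encoding, errors="backslashreplace"):
    """One-step form: the error handler only runs for unencodable characters."""
    return text.encode(encoding, errors).decode(encoding)


samples = ["plain ascii", "price: \u20ac5", "na\u00efve"]
for encoding in ("ascii", "latin-1"):
    for sample in samples:
        # Both forms agree whether or not the text is fully encodable.
        assert escape_with_precheck(sample, encoding) == escape_direct(sample, encoding)

print(escape_direct("price: \u20ac5", "ascii"))
```

The two forms only differ when the pre-check uses an error handler other than "strict", which is the case the textio.errors lookup is there for.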

--




[issue19100] Use backslashreplace in pprint

2015-03-30 Thread Martin Panter

Martin Panter added the comment:

I agree with Serhiy that using a permissive error handler with pprint() is 
appropriate.

What is the reasoning behind the DecodeWriter case, where the original stream 
has an interesting encoding, but “buffer” is None? Are there any real-world 
cases like that? Your mock test case sets encoding="latin1" with no buffer, but 
that class will also write non-latin1 strings, so there is no problem.

Also I wonder if flushing the stream once or twice for each pprint() call is a 
wise move.

Another way to tackle this might be a function that translates the non-Latin-1 
or whatever characters, allowing the original write() or whatever method to 
still be used. Here is a Python 2 and 3 compatible attempt: 
. 
Python 3 only version: 
. This function is 
originally used for printing descriptive comments to stdout (alongside other 
text where the “strict” error handler is appropriate). But I think it could be 
generally usable for pprint(), sys.displayhook(), etc as well.

--
nosy: +vadmium




[issue19100] Use backslashreplace in pprint

2013-12-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The purpose of pprint.pprint() is to produce human-readable output. In this 
case some output is better than nothing. It isn't designed to be parseable by 
other programs, because sometimes it is even less accurate than the result of 
repr() (pprint() truncates long reprs and loses information for dict 
subclasses). Also, the result of pprint() can change from version to version 
(e.g. issue17150). The main source of non-ASCII characters is string reprs and 
for them the backslashreplace error handler doesn't lose information. And 
pprint.pprint() is mainly used for screen output too.
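
The claim about string reprs can be checked with a round trip. A minimal sketch, assuming an ASCII target encoding and using ast.literal_eval only as a convenient way to parse the escaped repr back:

```python
import ast

original = "caf\u00e9 \u20ac"

# Escape the repr the way a backslashreplace stream would.
escaped = repr(original).encode("ascii", "backslashreplace").decode("ascii")

# The escaped repr is still a valid string literal for the same value.
round_tripped = ast.literal_eval(escaped)

# For contrast, the "replace" handler substitutes "?" and the value is lost.
lossy = repr(original).encode("ascii", "replace").decode("ascii")

print(escaped)
print(lossy)
```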

--




[issue19100] Use backslashreplace in pprint

2013-12-14 Thread Walter Dörwald

Walter Dörwald added the comment:

sys.displayhook doesn't fail, because it uses the backslashreplace error 
handler, and for sys.displayhook that's OK, because it's only used for screen 
output and there some output is better than no output. However print and 
pprint.pprint might be used for output that is consumed by other programs (via 
pipes etc.) and IMHO in this case "Errors should never pass silently."

--




[issue19100] Use backslashreplace in pprint

2013-12-11 Thread Fred L. Drake, Jr.

Changes by Fred L. Drake, Jr.:


--
assignee:  -> fdrake




[issue19100] Use backslashreplace in pprint

2013-12-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

pprint is not print.

>>> print('\u20ac')
€
>>> import pprint; pprint.pprint('\u20ac')
'€'

Default sys.displayhook doesn't fail on unencodable output.

$ LANG=C ./python
Python 3.4.0b1 (default:e961a166dc70+, Dec 11 2013, 13:57:17) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u20ac'
'\u20ac'
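
For reference, the fallback that makes this work can be sketched as follows. It is modelled on the pseudo-code given in the sys.displayhook documentation, not copied from the interpreter; the function name show and the explicit flush before touching the buffer are choices made for this sketch.

```python
import io


def show(value, stream):
    """Write repr(value) to stream, escaping what the stream cannot encode."""
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        data = text.encode(stream.encoding, "backslashreplace")
        if hasattr(stream, "buffer"):
            # Flush pending text first so the output order is preserved.
            stream.flush()
            stream.buffer.write(data)
        else:
            # No binary layer: write the escaped text back as str.
            stream.write(data.decode(stream.encoding, "strict"))
    stream.write("\n")


# Simulate a strict ASCII stdout, as with LANG=C.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")
show("\u20ac", ascii_out)
ascii_out.flush()
print(raw.getvalue())
```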

--




[issue19100] Use backslashreplace in pprint

2013-12-11 Thread Walter Dörwald

Walter Dörwald added the comment:

This is not the fault of pprint. IMHO it doesn't make sense to fix anything 
here, at least not for pprint specifically. print() has the same "problem":

   $ LANG= ./python -c "print('\u20ac')"
   Traceback (most recent call last):
     File "<string>", line 1, in <module>
   UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0: ordinal not in range(128)

--
nosy: +doerwalter




[issue19100] Use backslashreplace in pprint

2013-12-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

In the new patch, wrapping of the stream is moved to the PrettyPrinter constructor.
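
The general idea of wrapping the stream once in the constructor can be sketched as below. This is not the attached patch, whose contents are not shown in this thread: EscapingWriter and EscapingPrettyPrinter are hypothetical names, and the escaping is done at the text level before handing the string to the original stream.

```python
import io
import pprint
import sys


class EscapingWriter:
    """Hypothetical helper: escape unencodable characters, then delegate."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        encoding = getattr(self._stream, "encoding", None)
        if encoding:
            # Round-trip through the stream's encoding so that only
            # unencodable characters are turned into backslash escapes.
            text = text.encode(encoding, "backslashreplace").decode(encoding)
        return self._stream.write(text)


class EscapingPrettyPrinter(pprint.PrettyPrinter):
    """PrettyPrinter that wraps its output stream once, at construction."""

    def __init__(self, stream=None, **kwargs):
        if stream is None:
            stream = sys.stdout
        super().__init__(stream=EscapingWriter(stream), **kwargs)


# Simulate a strict ASCII stdout.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
EscapingPrettyPrinter(stream=ascii_out).pprint(["\u20ac", 1])
ascii_out.flush()
print(raw.getvalue())
```

Because the wrapper only calls write() on the original stream, that stream's own buffering and newline handling stay in effect.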

--
Added file: http://bugs.python.org/file33084/pprint_unencodable_2.patch




[issue19100] Use backslashreplace in pprint

2013-12-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Any review?

--




[issue19100] Use backslashreplace in pprint

2013-09-27 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

Currently pprint.pprint() fails on unencodable characters.

$ LANG=en_US.utf8 ./python -c "import pprint; pprint.pprint('\u20ac')"
'€'
$ LANG= ./python -c "import pprint; pprint.pprint('\u20ac')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/pprint.py", line 56, in pprint
    printer.pprint(object)
  File "/home/serhiy/py/cpython/Lib/pprint.py", line 137, in pprint
    self._format(object, self._stream, 0, 0, {}, 0)
  File "/home/serhiy/py/cpython/Lib/pprint.py", line 274, in _format
    write(rep)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 1: ordinal not in range(128)

This is a regression from Python 2, in which repr() always returns an ASCII string.

$ LANG= python2.7 -c "import pprint; pprint.pprint(u'\u20ac')"
u'\u20ac'

Perhaps pprint() should use the backslashreplace error handler (as 
sys.displayhook() does). With the proposed patch:

$ LANG= ./python -c "import pprint; pprint.pprint('\u20ac')"
'\u20ac'

--
components: Library (Lib), Unicode
files: pprint_unencodable.patch
keywords: patch
messages: 198465
nosy: ezio.melotti, fdrake, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Use backslashreplace in pprint
type: behavior
versions: Python 3.3, Python 3.4
Added file: http://bugs.python.org/file31881/pprint_unencodable.patch
