[issue18713] Clearly document the use of PYTHONIOENCODING to set surrogateescape

2017-03-24 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
resolution:  -> duplicate
status: open -> pending
superseder:  -> The documentation for the print function should explain/point 
to how to control the sys.stdout encoding

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18713] Clearly document the use of PYTHONIOENCODING to set surrogateescape

2015-05-05 Thread Nikolaus Rath

Nikolaus Rath added the comment:

The first thing that would come to my mind when reading Nick's proposed 
document (without first reading this bug report) is "So why the heck is this 
not the default?"

It would probably save a lot of people a lot of anger if there was also a brief 
explanation addressing this obvious first response :-).

--
nosy: +nikratio

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18713
___



[issue18713] Clearly document the use of PYTHONIOENCODING to set surrogateescape

2013-08-23 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
dependencies: +Empty PYTHONIOENCODING is not the same as nonexistent




[issue18713] Clearly document the use of PYTHONIOENCODING to set surrogateescape

2013-08-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

With the new subject, this issue looks like a duplicate of (or closely related 
to) issue12832.

--




[issue18713] Clearly document the use of PYTHONIOENCODING to set surrogateescape

2013-08-22 Thread Nick Coghlan

Nick Coghlan added the comment:

Note: I created issue 18814 to cover some additional tools for working with 
surrogate escaped strings.

For this issue, we currently have http://docs.python.org/3/howto/unicode.html, 
which aims to be a more comprehensive guide to understanding Unicode issues.

I'm thinking we may want a "Debugging Unicode Errors" document, which defers to 
the existing howto guide for those that really want to understand Unicode, and 
instead focuses on quick fixes for resolving various problems that may present 
themselves.

Application developers will likely want to read the longer guide, while the 
debugging document would be aimed at getting script writers past their 
immediate hurdle, without necessarily gaining a full understanding of Unicode.

The aim would be for this page to become the top hit for "python surrogates not 
allowed", rather than the current top hit, which is a rejected bug report about 
it (http://bugs.python.org/issue13717).

For example:


What is the meaning of "UnicodeEncodeError: surrogates not allowed"?


Operating system metadata on POSIX based systems like Linux and Mac OS X may 
include improperly encoded text values. To cope with this, Python uses the 
surrogateescape error handler to store those arbitrary bytes inside a Unicode 
object. When converted back to bytes using the same encoding and error handler, 
the original byte sequence is reproduced exactly. This allows operations like 
opening a file based on a directory listing to work correctly, even when the 
metadata is not properly encoded according to the system settings.
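
The round trip described above can be sketched roughly as follows. This is 
only an illustration: the byte string and the variable names are invented for 
the example, and UTF-8 is assumed as the system encoding.

```python
# A file name as raw bytes that is not valid UTF-8
# (the byte 0xff can never appear in well-formed UTF-8).
raw_name = b"report-\xff.txt"

# Decoding with surrogateescape stores the undecodable byte
# inside the str as a lone surrogate code point (U+DCFF here).
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same codec and the same error handler
# reproduces the original byte sequence exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

print(ascii(text_name))
print(round_tripped == raw_name)
```

``os.fsdecode()`` and ``os.fsencode()`` apply the same idea using the 
filesystem encoding and its configured error handler.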

The "surrogates not allowed" error appears when a string from one of these 
operating system interfaces contains an embedded arbitrary byte sequence, but 
an attempt is made to encode it using the default strict error handler rather 
than the surrogateescape handler. This commonly occurs when printing 
improperly encoded operating system data to the console, or writing it to a 
file, database or other serialised interface.
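
A minimal way to reproduce the error, assuming a string that was decoded with 
surrogateescape (the sample bytes are made up for illustration):

```python
# Simulate a string returned by an operating system interface
# whose underlying bytes were not valid UTF-8.
text_name = b"report-\xff.txt".decode("utf-8", errors="surrogateescape")

try:
    # No errors= argument, so the default "strict" handler is used.
    text_name.encode("utf-8")
except UnicodeEncodeError as exc:
    error_reason = exc.reason
    print(f"UnicodeEncodeError: {exc}")
```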

The ``PYTHONIOENCODING`` environment variable can be used to ensure operating 
system metadata can always be read via sys.stdin and written via sys.stdout. 
The following command will display the encoding Python will use by default to 
interact with the operating system::

    $ python3 -c "import sys; print(sys.getfilesystemencoding())"
    utf-8

This can then be used to specify an appropriate setting for 
``PYTHONIOENCODING``:: 


    $ export PYTHONIOENCODING=utf-8:surrogateescape
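
One way to check that the setting has taken effect is to ask a child 
interpreter what it ended up with. The sketch below assumes Python 3.7 or 
later (for ``capture_output``); the variable names are illustrative only.

```python
import os
import subprocess
import sys

# Copy the current environment and add the setting for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")

# The child reports the encoding and error handler of its own sys.stdout.
child_code = "import sys; print(sys.stdout.encoding, sys.stdout.errors)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

reported = result.stdout.split()
print(reported)
```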

For other interfaces, there is no such general solution. If allowing the 
invalid byte sequence to propagate further is acceptable, then enabling the 
surrogateescape handler may be appropriate. Alternatively, it may be better to 
track these corrupted strings back to their point of origin, and either fix the 
underlying metadata, or else filter them out early on.
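
Both options might look something like the following sketch. The helper 
``has_escaped_bytes`` is a hypothetical name introduced here, not an existing 
API, and an in-memory buffer stands in for a real file.

```python
import io


def has_escaped_bytes(text):
    """Return True if text holds lone surrogates produced by surrogateescape."""
    # surrogateescape maps bytes 0x80-0xFF to code points U+DC80-U+DCFF.
    return any("\udc80" <= ch <= "\udcff" for ch in text)


names = [b"ok.txt", b"bad-\xff.txt"]
decoded = [n.decode("utf-8", errors="surrogateescape") for n in names]

# Option 1: filter the affected strings out early on.
clean = [name for name in decoded if not has_escaped_bytes(name)]

# Option 2: let the original bytes propagate by enabling
# surrogateescape on the output stream.
buffer = io.BytesIO()
writer = io.TextIOWrapper(
    buffer, encoding="utf-8", errors="surrogateescape", newline="\n"
)
writer.write("\n".join(decoded))
writer.flush()

print(clean)
print(buffer.getvalue())
```

Which option is appropriate depends on whether the consumer of the output can 
tolerate byte sequences that are not valid in the declared encoding.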


If issue 18814 is implemented, then it could point to those tools. Similarly, 
issue 15216 could be referenced if that is implemented.

--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
title: Enable surrogateescape on stdin and stdout when appropriate -> Clearly 
document the use of PYTHONIOENCODING to set surrogateescape
