[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-12-18 Thread STINNER Victor

STINNER Victor  added the comment:

Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted 
and implemented in Python 3.7!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread STINNER Victor

STINNER Victor added the comment:

> But maybe I'm just missing something.

This issue fixed exactly one use case: "List a directory into stdout" (similar 
to the UNIX "ls" or Windows "dir" commands):
https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout

Your use case is more "Display Unicode characters into stdout":
https://www.python.org/dev/peps/pep-0540/#display-unicode-characters-into-stdout

This use case is not supported by the issue. It should be fixed by PEP 538 or 
PEP 540.

Please join the happy discussion on the python-ideas mailing list to discuss 
how to "force UTF-8": this issue is closed, you shouldn't add new comments 
(other people will not see your comments).

--




[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread Sworddragon

Sworddragon added the comment:

The point is that this ticket claims to use the surrogateescape error handler 
for sys.stdout and sys.stdin in the C locale. I have never used 
surrogateescape explicitly before, so I have no experience with it, and the 
documentation mentions raising an exception only for the strict error 
handler. I don't see anything that would make me think that surrogateescape 
would raise an exception here too. But maybe I'm just missing something.

--




[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread STINNER Victor

STINNER Victor added the comment:

"I thought with the surrogateescape error handler now being used for sys.stdout 
this would not throw an exception but I'm getting this: (...)"

Please see the two recently proposed PEPs, Nick's PEP 538 and my PEP 540: both 
propose (different) solutions to your issue, especially for the POSIX 
locale (aka "C" locale).

--




[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread Sworddragon

Sworddragon added the comment:

Bug #28180 prompted me to take a look at the "encoding" issue that this ticket 
and the earlier ones have tried to solve, more or less. Being a bit unsure 
what the root cause and intention behind all this was, I have now actually 
tested this ticket. Here is some example code (executed with Python 3.5.3 RC1 
with LANG set to C):

import sys
sys.stdout.write('ä')


I thought with the surrogateescape error handler now being used for sys.stdout 
this would not throw an exception but I'm getting this:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 0: 
ordinal not in range(128)
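
A minimal sketch of why this write still fails, assuming the stream is configured as ASCII with the surrogateescape handler: the handler only turns lone surrogates U+DC80..U+DCFF (the code points produced when undecodable bytes are read) back into bytes, so an ordinary character such as 'ä' (U+00E4) is still rejected. The helper name try_encode is made up for this illustration.

```python
# surrogateescape maps lone surrogates U+DC80..U+DCFF back to bytes 0x80..0xFF.
# Any other non-ASCII character is still an encoding error.

def try_encode(text, encoding="ascii", errors="surrogateescape"):
    """Return the encoded bytes, or the exception name if encoding fails."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# An undecodable byte read from the OS becomes a lone surrogate.
escaped = b"\xe4".decode("ascii", "surrogateescape")

print(ascii(escaped))        # the escaped form of byte 0xE4
print(try_encode(escaped))   # the surrogate round-trips to the original byte
print(try_encode("\xe4"))    # a genuine 'ä' cannot be encoded to ASCII
```

In other words, the handler helps with data that came from the OS as bytes, not with text that the program itself creates.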

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Antoine Pitrou

Antoine Pitrou added the comment:

>> We should not overcomplicate this. I suggest that we simply use utf-8 under 
>> the C locale.
> 
> Do you mean utf8/strict or utf8/surrogateescape?
> 
> utf8/strict doesn't work (os.listdir raises an unicode error) if your
> system is configured to use latin1 (ex: filenames are stored in this
> encoding), but unfortunately your program is running in an empty
> environment (so will use the POSIX locale).

The issue is about stdin and stdout, I'm not sure why os.listdir would
be affected.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19977
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Nick Coghlan

Nick Coghlan added the comment:

Victor was referring to code like print(os.listdir()). Those are the
motivating cases for ensuring round trips from system APIs to the standard
streams work correctly.

There's also the problem that sys.argv currently relies on the locale
encoding directly, because the filesystem encoding hasn't been worked out
at that point (see issue 8776). So this current change will also make
print(sys.argv) work more reliably in the POSIX locale.

The conclusion I have come to is that any further decoupling of Python 3
from the locale encoding will actually depend on getting the PEP 432
bootstrapping changes implemented, reviewed and the PEP approved, so we
have more interpreter infrastructure in place by the time the interpreter
starts trying to figure out all these boundary encoding issues.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The conclusion I have come to is that any further decoupling of Python 3
> from the locale encoding will actually depend on getting the PEP 432
> bootstrapping changes implemented, reviewed and the PEP approved, so we
> have more interpreter infrastructure in place by the time the interpreter
> starts trying to figure out all these boundary encoding issues.

Yeah. My proposal had more to do with the fact that we should some day
switch to utf-8 by default on all POSIX systems, regardless of what the
system advertises as best encoding.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Nick Coghlan

Nick Coghlan added the comment:

Antoine Pitrou added the comment:
> Yeah. My proposal had more to do with the fact that we should some day
> switch to utf-8 by default on all POSIX systems, regardless of what the
> system advertises as best encoding.

Yeah, that seems like a plausible future to me as well, and knowing it's a
step along that path actually gives me more motivation to get back to
working on the startup issues :)

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan

Nick Coghlan added the comment:

Additional environments where the system misreports the encoding to use 
(courtesy of Armin Ronacher & Graham Dumpleton on Twitter): upstart, Salt, 
mod_wsgi.

Note that for more complex applications (e.g. integrated web UIs, socket 
servers, sending email), round tripping to the standard streams won't be enough 
- what we really need is a better source of truth as to the real system 
encoding when POSIX compliant systems provide incorrect configuration data to 
the interpreter.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan

Nick Coghlan added the comment:

Issue 21368 now suggests looking for /etc/locale.conf before falling back to 
ASCII+surrogateescape.
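
As a rough illustration only (this is not the patch from issue 21368), reading a codeset out of locale.conf-style text could look like the sketch below. The function name, the choice of keys and the sample contents are all assumptions made for this example; the sketch parses a string rather than opening /etc/locale.conf.

```python
def encoding_from_locale_conf(text):
    """Return the codeset named by LC_CTYPE or LANG in KEY=value text, or None.

    Hypothetical helper: only handles simple KEY=value lines and '#' comments.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted, e.g. LANG="en_US.UTF-8".
        settings[key.strip()] = value.strip().strip("\"'")
    for key in ("LC_CTYPE", "LANG"):
        value = settings.get(key, "")
        if "." in value:
            codeset = value.split(".", 1)[1]
            return codeset.split("@", 1)[0]  # drop a modifier such as @euro
    return None

sample = '# hypothetical file contents\nLANG="en_US.UTF-8"\n'
print(encoding_from_locale_conf(sample))
print(encoding_from_locale_conf("LANG=C\n"))
```

A locale without an explicit codeset (such as plain C) yields None here, which is where a fallback like ASCII+surrogateescape would apply.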

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

We should not overcomplicate this. I suggest that we simply use utf-8 under the 
C locale.

--
versions: +Python 3.5 -Python 3.4




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan

Nick Coghlan added the comment:

If you can convince Stephen Turnbull that's a good idea, sure. It's
probably more likely to be the right thing than ASCII or ASCII +
surrogateescape, but in the absence of hard data, he's in a better
position than we are to judge the likely impact of that, at least in Japan.

I'm also going to hunt around on freedesktop.org to see if there's anything
more general there on the topic of encodings.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread STINNER Victor

STINNER Victor added the comment:

> We should not overcomplicate this. I suggest that we simply use utf-8 under 
> the C locale.

Do you mean utf8/strict or utf8/surrogateescape?

utf8/strict doesn't work (os.listdir raises an unicode error) if your
system is configured to use latin1 (ex: filenames are stored in this
encoding), but unfortunately your program is running in an empty
environment (so will use the POSIX locale).
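
A small sketch of the difference between the two options, using a made-up filename stored as Latin-1 bytes (the name raw_name and its value are illustrative, not taken from a real system):

```python
# Hypothetical filename as stored on disk by a Latin-1 configured system.
raw_name = b"caf\xe9.txt"

# utf8/strict: the byte 0xE9 is not valid UTF-8 here, so decoding fails.
try:
    raw_name.decode("utf-8", "strict")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# utf8/surrogateescape: the bad byte is smuggled through as a lone surrogate
# and can be turned back into the original byte later.
escaped = raw_name.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_result)
print(ascii(escaped))
print(restored == raw_name)
```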

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread STINNER Victor

STINNER Victor added the comment:

> We should not overcomplicate this. I suggest that we simply use utf-8 under 
> the C locale.

Please open a new issue if you would prefer UTF-8. You will have to solve 
different technical issues. I tried to list some of them in issues #19846 and 
#19847.

In short, you should always decode and encode OS data with the same encoding. 
Python file system encoding is the locale encoding because in some places, 
PyUnicode_DecodeLocale[AndSize]() is used (ex: to decode PYTHONWARNINGS 
environment variable). A common location is PyUnicode_DecodeFSDefaultAndSize() 
before the Python codec is loaded. See also _Py_wchar2char() and 
_Py_char2wchar() functions which use the locale encoding and are used in many 
places.
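
To illustrate the "same encoding in both directions" rule in pure Python (this does not touch the C functions named above, and the helper roundtrip is invented for the example): decoding and encoding with different codecs either changes the bytes or fails outright.

```python
def roundtrip(raw, decode_with, encode_with):
    """Decode raw bytes with one codec and re-encode with another.

    Returns the resulting bytes, or None if encoding fails.
    """
    text = raw.decode(decode_with, "surrogateescape")
    try:
        return text.encode(encode_with, "surrogateescape")
    except UnicodeEncodeError:
        return None

raw = b"\xe9"  # one byte of OS data

same = roundtrip(raw, "latin-1", "latin-1")   # same codec both ways
mixed = roundtrip(raw, "latin-1", "utf-8")    # bytes silently change
failed = roundtrip(raw, "latin-1", "ascii")   # 'é' is not a surrogate: error

print(same, mixed, failed)
```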

I'm now closing the issue because the initial point (use the surrogateescape 
error handler) is implemented in Python 3.5, and backporting such a major 
change to the Python 3.4 branch is risky right now.

--
resolution:  -> fixed
status: open -> closed




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-09 Thread Nick Coghlan

Nick Coghlan added the comment:

The default locale on Fedora is indeed UTF-8 these days - the problem is that 
*users* are used to being able to use LANG=C to force the POSIX locale 
(whether for testing purposes or other reasons), and that currently means 
system utilities written in Python may fail in such situations if used with 
UTF-8 data from the filesystem (or elsewhere). (I believe there may also be 
other cases where POSIX mandates the use of the C locale, but Toshio would be 
in a better position than I am to confirm whether or not that is actually the 
case).

So perhaps this is best left in a wait & see mode for now - as the Fedora 
migration to Python 3 progresses, if the folks working on that find specific 
utilities where the Python 3.4 standard stream handling in the C locale appears 
problematic, then Slavek & Toshio can bring them up here.

The counterargument is that if we're going to change it, 3.4.1 would be a 
better time frame than 3.4.2. In that case, the task of identifying specific 
Fedora utilities of concern still falls back on Toshio & Slavek, but it would 
be a matter of going hunting for them specifically *now*, rather than waiting 
until they come up over the course of the migration.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-09 Thread STINNER Victor

STINNER Victor added the comment:

> The default locale on Fedora is indeed UTF-8 these days - the problem is that 
> *users* are used to being able to use LANG=C to force the POSIX locale 
> (whether for testing purposes or other reasons), and that currently means 
> system utilities written in Python may fail in such situations if used with 
> UTF-8 data from the filesystem (or elsewhere). (I believe there may also be 
> other cases where POSIX mandates the use of the C locale, but Toshio would be 
> in a better position than I am to confirm whether or not that is actually the 
> case).

A common situation where you get a C locale is for programs started by
a crontab. If I remember correctly, these programs start with the C
locale, instead of the system (user?) locale.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-08 Thread STINNER Victor

STINNER Victor added the comment:

> However, I'd still like to discuss the idea of backporting this to 3.4.1.

The reason for making this change in Python 3.5 is that I have no idea of the 
risk of regression. Before backporting such a change to a minor version 
(3.4.1), I would feel more confident with user tests of Python 3.5 or of a 
patched Python 3.4.

> That has long made Toshio nervous about the migration of core services to 
> Python 3 (https://fedoraproject.org/wiki/Changes/Python_3_as_Default), and his 
> concerns make sense to me, as that migration covers little things like the 
> installer, package manager, post-image install initialisation, etc. 

Which programs in this test are or may be running with the POSIX locale?

Doesn't Fedora use the en_US.utf8 locale by default?

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-28 Thread Nick Coghlan

Nick Coghlan added the comment:

This seems to be working on the buildbots for 3.5 now (buildbot failures appear 
to be due to other issues).

However, I'd still like to discuss the idea of backporting this to 3.4.1.

From a Fedora point of view, it's still *very* easy to flip an environment 
into POSIX mode, so even if the system is appropriately configured to use 
UTF-8 everywhere, Python 3.4 may still blow up if a script or application ends 
up running under the POSIX locale.

That has long made Toshio nervous about the migration of core services to 
Python 3 (https://fedoraproject.org/wiki/Changes/Python_3_as_Default), and his 
concerns make sense to me, as that migration covers little things like the 
installer, package manager, post-image install initialisation, etc. I'm not 
sure the Fedora team can deliver on the "Users shouldn't notice any changes, 
except that packages in minimal buildroot and on LiveCD will be python3-, not 
python-." aspect of the change proposal without this behavioural tweak in the 
3.4 series as well.

Note that this *isn't* a blocker for the migration - if it was, it would be 
mentioned in the Fedora proposal. However, I think there's a risk to the Fedora 
user experience if the status quo remains in place for the life of Python 3.4, 
and I'd hate for the first encounter Fedora users have with Python 3 to be 
inexplicable tracebacks from components that have been migrated.

--
versions: +Python 3.4 -Python 3.5




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-19 Thread Atsuo Ishimoto

Changes by Atsuo Ishimoto ishim...@gembook.org:


--
nosy: +ishimoto




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset bc06f67234d0 by Victor Stinner in branch 'default':
Issue #19977: When the ``LC_TYPE`` locale is the POSIX locale (``C`` locale),
http://hg.python.org/cpython/rev/bc06f67234d0

--
nosy: +python-dev




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread STINNER Victor

STINNER Victor added the comment:

Test failing on x86 OpenIndiana 3.x buildbot:

http://buildbot.python.org/all/builders/x86%20OpenIndiana%203.x/builds/7939/steps/test/logs/stdio

==
FAIL: test_forced_io_encoding (test.test_capi.EmbeddingTests)
--
Traceback (most recent call last):
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/test_capi.py", line 352, in test_forced_io_encoding
    self.assertEqual(out.strip(), expected_output)
AssertionError: '--- [79 chars]646:surrogateescape\nstdout: 646:surrogateesca[576 chars]lace' != '--- [79 chars]646:strict\nstdout: 646:strict\nstderr: 646:ba[540 chars]lace'
  --- Use defaults ---
  Expected encoding: default
  Expected errors: default
- stdin: 646:surrogateescape
- stdout: 646:surrogateescape
+ stdin: 646:strict
+ stdout: 646:strict
  stderr: 646:backslashreplace
  --- Set errors only ---
  Expected encoding: default
  Expected errors: surrogateescape
  stdin: 646:surrogateescape
  stdout: 646:surrogateescape
  stderr: 646:backslashreplace
  --- Set encoding only ---
  Expected encoding: latin-1
  Expected errors: default
- stdin: latin-1:surrogateescape
- stdout: latin-1:surrogateescape
+ stdin: latin-1:strict
+ stdout: latin-1:strict
  stderr: latin-1:backslashreplace
  --- Set encoding and errors ---
  Expected encoding: latin-1
  Expected errors: surrogateescape
  stdin: latin-1:surrogateescape
  stdout: latin-1:surrogateescape
  stderr: latin-1:backslashreplace
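
For reference, the "encoding:errors" pairs in this output can be inspected from Python itself. This is a small sketch, not the code used by test_capi; the helper name describe is made up here, and what it prints depends on the platform, locale and Python version.

```python
import io
import sys

def describe(stream):
    """Format a text stream as 'encoding:errors', like the test output."""
    return "{}:{}".format(
        getattr(stream, "encoding", None), getattr(stream, "errors", None)
    )

# Report the configuration of the three standard streams.
for name in ("stdin", "stdout", "stderr"):
    stream = getattr(sys, name)
    if stream is not None:  # a stream can be missing in some embeddings
        print("{}: {}".format(name, describe(stream)))

# An explicitly configured in-memory stream, for comparison.
demo = io.TextIOWrapper(io.BytesIO(), encoding="latin-1", errors="surrogateescape")
print("demo: {}".format(describe(demo)))
```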

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread STINNER Victor

STINNER Victor added the comment:

New behaviour:

$ mkdir z
$ touch z/abcé
$ LC_CTYPE=C ./python -c 'import os; print(os.listdir("z")[0])'
abcé

Old behaviour, before the change (test with Python 3.3):

$ LC_CTYPE=C python3 -c 'import os; print(os.listdir("z")[0])'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: 
ordinal not in range(128)
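
The same difference can be reproduced without changing the locale by building an ASCII text stream by hand. This is only a sketch: it assumes that sys.stdout behaves like an io.TextIOWrapper with encoding "ascii", and the helper write_name is invented for the illustration.

```python
import io

def write_name(name, errors):
    """Write name to an in-memory ASCII text stream.

    Returns the bytes that reached the underlying buffer, or the
    exception name if the stream refused the text.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", errors=errors)
    try:
        stream.write(name)
        stream.flush()
    except UnicodeEncodeError:
        return "UnicodeEncodeError"
    return buffer.getvalue()

# What os.listdir() would return for the UTF-8 bytes of "abcé" when the
# filesystem encoding is ASCII with surrogateescape.
name = b"abc\xc3\xa9".decode("ascii", "surrogateescape")

old = write_name(name, "strict")            # behaviour before the change
new = write_name(name, "surrogateescape")   # behaviour after the change

print(old)
print(new)
```

With surrogateescape the original bytes reach the terminal unchanged, which is why a UTF-8 terminal then shows the name correctly.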

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 3589980c98de by Victor Stinner in branch 'default':
Issue #19977, #19036: Always include locale.h in pythonrun.c
http://hg.python.org/cpython/rev/3589980c98de

New changeset 94d5025c70a3 by Victor Stinner in branch 'default':
Issue #19977: Enable test_c_locale_surrogateescape() on Windows
http://hg.python.org/cpython/rev/94d5025c70a3

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c9905e802042 by Victor Stinner in branch 'default':
Issue #19977: Fix test_capi when LC_CTYPE locale is POSIX
http://hg.python.org/cpython/rev/c9905e802042

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-02-11 Thread STINNER Victor

STINNER Victor added the comment:

> Reintroducing moji-bake intentionally doesn't sound like a particularly good 
> idea, wasn't that what python3 was supposed to help prevent?

Sometimes practicality beats purity :-(

I tried to convince users that their computer was not well configured, they 
always replied that Python 3 fails where Perl, PHP, Python 2, C, etc. just 
work.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-05 Thread Bohuslav Slavek Kabrda

Bohuslav Slavek Kabrda added the comment:

Nick: Sure, once there is an upstream solution that people have agreed on, I'll 
look into backporting it, NP. Thanks for letting me know about this.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-04 Thread Larry Hastings

Larry Hastings added the comment:

Yeah, unless there was a *huge* amount of support for changing this, it's way 
too late for 3.4.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-02 Thread Nick Coghlan

Nick Coghlan added the comment:

Larry: I'm assuming it's way too late to make a change like this for the 3.4 
release?

Slavek: assuming this change is made for 3.5 upstream, we may want to look at 
backporting it as a 3.4 patch in Fedora (as part of the Python-3-by-default 
project). Otherwise it's very easy to provoke Python 3 into throwing Unicode 
errors when attempting to print data provided by the OS.

--
nosy: +bkabrda, larry
type:  -> behavior




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-21 Thread Jakub Wilk

Changes by Jakub Wilk jw...@jwilk.net:


--
nosy: +jwilk




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-19 Thread Martin Panter

Changes by Martin Panter vadmium...@gmail.com:


--
nosy: +vadmium




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor

STINNER Victor added the comment:

Oh, in fact, sys.stdin is also modified by the patch (as I expected).

--
title: Use surrogateescape error handler for sys.stdout on UNIX for the C 
locale -> Use surrogateescape error handler for sys.stdin and sys.stdout on 
UNIX for the C locale




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
nosy: +r.david.murray




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Sworddragon

Sworddragon added the comment:

What would happen if we ran this example script with LANG=C on a patched Python?

---
import os
for name in sorted(os.listdir('ä')):
    print(name)
---

Would it throw an exception on os.listdir('ä')?

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor

STINNER Victor added the comment:

test_ls.py: test script producing invalid filenames and then trying to display 
them into stdout.

Output with UTF-8 locale, UTF-8 terminal and Python 3.3 (or unpatched 3.4, it's 
the same):

ascii.txt
UnicodeError 'invalid_utf8:\udcff.txt'
UnicodeError 'latin1:\udce9.txt'
utf8:é€.txt

Output with C locale (ASCII), UTF-8 terminal and Python 3.3:

ascii.txt
UnicodeError 'invalid_utf8:\udcff.txt'
UnicodeError 'latin1:\udce9.txt'
UnicodeError 'utf8:\udcc3\udca9\udce2\udc82\udcac.txt'

Output with C locale (ASCII), UTF-8 terminal and patched Python 3.4:

ascii.txt
invalid_utf8:�.txt
latin1:�.txt
utf8:é€.txt

You get no Unicode error with LANG=C, but you get mojibake instead.
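
The three listings above can be approximated in memory. The sketch below is not the attached test_ls.py; it assumes that filenames are decoded with the filesystem encoding plus surrogateescape and then encoded with the stdout encoding and error handler, and the names NAMES and listing are made up for the illustration.

```python
# Raw filenames as bytes, matching the names shown in the listings above.
NAMES = [
    b"ascii.txt",
    b"invalid_utf8:\xff.txt",
    b"latin1:\xe9.txt",
    b"utf8:\xc3\xa9\xe2\x82\xac.txt",
]

def listing(fs_encoding, out_encoding, out_errors):
    """Return the bytes that would be written to stdout for each name."""
    lines = []
    for raw in NAMES:
        # OS data is decoded with surrogateescape.
        name = raw.decode(fs_encoding, "surrogateescape")
        try:
            lines.append(name.encode(out_encoding, out_errors))
        except UnicodeError:
            lines.append(("UnicodeError " + ascii(name)).encode("ascii"))
    return lines

utf8_locale = listing("utf-8", "utf-8", "strict")          # UTF-8 locale, 3.3
c_locale_old = listing("ascii", "ascii", "strict")         # C locale, 3.3
c_locale_new = listing("ascii", "ascii", "surrogateescape")  # C locale, patched

for label, lines in (
    ("UTF-8 locale", utf8_locale),
    ("C locale, strict", c_locale_old),
    ("C locale, surrogateescape", c_locale_new),
):
    print(label, lines)
```

In the last configuration every name comes out byte-for-byte as stored, so whether it is readable depends only on the terminal's own encoding.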

--
Added file: http://bugs.python.org/file33124/test_ls.py




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
versions: +Python 3.5 -Python 3.4




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor

STINNER Victor added the comment:

os.fsencode(text) always fails if text cannot be encoded to 
sys.getfilesystemencoding(). surrogateescape doesn't help here.

Your example is artificial, you should not get 'ä'. All OS data is decoded 
from the filesystem encoding using the surrogateescape error handler (except on 
Windows, where strict is used, but it's a different story, Python uses Unicode 
functions when available so don't worry). So all these data can always be 
encoded back to bytes using os.fsencode().

More generally, os.fsencode(os.fsdecode(read_data)) == read_data is always true 
on Unix, with any filesystem (locale) encoding.
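
A quick way to check this property, sketched with explicit codecs so that it does not depend on the current locale (the helper roundtrips and the sample byte strings are assumptions of this example; the os.fsencode()/os.fsdecode() check only runs on POSIX systems):

```python
import os

def roundtrips(raw, encoding):
    """True if decode + encode with surrogateescape returns the same bytes."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape") == raw

# Arbitrary byte strings, including ones invalid in ASCII and UTF-8.
samples = [b"plain", b"caf\xe9", b"\xff\xfe", b"\xc3\xa9\xe2\x82\xac"]

results = {
    encoding: all(roundtrips(raw, encoding) for raw in samples)
    for encoding in ("ascii", "utf-8", "latin-1")
}
print(results)

if os.name == "posix":
    # The same check using the real filesystem encoding of this process.
    print(all(os.fsencode(os.fsdecode(raw)) == raw for raw in samples))
```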

You may get Unicode data from other sources like files or a GUI, but I don't 
see what can be done here.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> When LANG=C is used to get the english language (which is a mistake,
> LC_CTYPE=C should be used instead)

I think you mean LC_MESSAGES=C here.
(but it's not only about the English language; it's also about other locale 
parameters such as number formatting)

I think we should start thinking about making utf-8 the default filesystem 
encoding in 3.5 (under Unix).

--
nosy: +pitrou




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread R. David Murray

R. David Murray added the comment:

Reintroducing moji-bake intentionally doesn't sound like a particularly good 
idea, wasn't that what python3 was supposed to help prevent?

It does seem like a utf-8 default is the Way of the Future.  Or even the 
present, most places.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

My impression was that python3 was supposed to help get rid of UnicodeError 
tracebacks, not mojibake.  If mojibake was the problem then we should never 
have gone down the surrogateescape path for input.

--




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Mojibake in input can cause a decoding error in another application which 
consumes the output of a Python script. In some cases this can be even worse 
than a UnicodeError in the producer.

But for C locale this makes sense. I think we should try this experiment in 
3.5. There will be much time for testing before 3.5 beta 1.

--
nosy: +serhiy.storchaka




[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Nick Coghlan

Nick Coghlan added the comment:

Getting rid of mojibake was the goal; surrogateescape was about dealing with 
cases where the "avoid mojibake" checks were spuriously breaking round-tripping 
between OS APIs due to other configuration errors (with LANG=C being set, or 
LANG not being set at all, being the main problem). Other high mojibake risk 
power tools (like changing the encoding of an already open stream) are likely 
to return in the future, since there *are* cases where they're the right answer 
(e.g. you can't write an iconv equivalent in Python 3 at the moment, we need 
issue 15216 implemented before that will be possible).

+1 for this solution - see issue 19846 for the long discussion which got us to 
this point (there are a few unrelated tangents, a couple of them my fault, but 
this is definitely an improvement over the status quo).

--
