[issue22555] Tracking issue for adjustments to binary/text boundary handling

2018-06-09 Thread STINNER Victor


STINNER Victor  added the comment:

> https://vstinner.github.io/python30-listdir-undecodable-filenames.html

Oh, thanks for mentioning my series of articles.

It's also nice to see that we are now able to close this 4-year-old issue!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22555] Tracking issue for adjustments to binary/text boundary handling

2018-06-09 Thread Nick Coghlan


Nick Coghlan  added the comment:

Adding a link to the first post in a series of articles from Victor Stinner 
regarding the evolution over time of the text encoding assumptions in Python 
3's operating system interfaces:

https://vstinner.github.io/python30-listdir-undecodable-filenames.html

That way if anyone does stumble across this meta-issue, they'll have an easier 
time discovering that more readable version of the history involved :)

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2018-06-09 Thread Nick Coghlan


Nick Coghlan  added the comment:

Correction: I just rejected my proposed wsgiref utilities in issue 22264 as 
failing to make a sufficient case for their practical utility, so that one is 
closed as well :)

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2018-06-09 Thread Nick Coghlan


Nick Coghlan  added the comment:

With PEPs 538 and 540 merged for Python 3.7 (so we'll almost always use UTF-8 
instead of ASCII when the platform nominates the C or POSIX locale as the 
currently active one), and Windows previously switching to assuming UTF-8 
instead of mbcs for binary interfaces in Python 3.6, I think this tracking 
issue has served its purpose.
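
As a rough illustration of what that change means in practice, the sketch below launches a child interpreter under the C locale and reports the encodings it settles on. It assumes Python 3.7 or later; the helper name `report_encodings` is made up for this example, and the values reported will vary by platform and configuration.

```python
import json
import os
import subprocess
import sys


def report_encodings(lang="C"):
    """Run a child interpreter with LANG/LC_ALL set and return its encodings.

    Hypothetical helper for illustration only; not part of any stdlib API.
    """
    env = dict(os.environ, LANG=lang, LC_ALL=lang)
    # Drop overrides that would mask the locale-driven behaviour.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    code = (
        "import json, sys; "
        "print(json.dumps({'fs': sys.getfilesystemencoding(), "
        "'stdout': sys.stdout.encoding, "
        "'utf8_mode': getattr(sys.flags, 'utf8_mode', None)}))"
    )
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return json.loads(out.decode("ascii"))


if __name__ == "__main__":
    print(report_encodings())
```

Comparing the output for `report_encodings("C")` across Python versions is one way to see the effect of the two PEPs on a given system.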

Of the issues previously mentioned here, the following are still open:

* Improved Unicode handling in the Windows console: issue 17620
* Utilities for clearing out surrogates from strings: issue 18814
* Treating "wsgistr" as a serialisation format: issue 22264
* Defining a formatting mini-language for hex output: issue 22385

I don't think any of those share enough characteristics to be worth continuing 
to track as a group, so I'm closing this meta-issue as out of date :)

--
dependencies:  -Add utilities to "clean" surrogate code points from strings, 
Add wsgiref.util.dump_wsgistr & load_wsgistr, Define a binary output formatting 
mini-language for *.hex(), Python interactive console doesn't use sys.stdin for 
input
resolution:  -> out of date
stage:  -> resolved
status: open -> closed




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2016-09-10 Thread Nick Coghlan

Nick Coghlan added the comment:

Added another issue to the tracking list: 

* Automatically decode binary data in json.loads: issue #17909
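
A minimal sketch of the behaviour that issue is about, assuming Python 3.6 or later, where `json.loads` accepts `bytes` and detects UTF-8, UTF-16 or UTF-32 input. The payload is made up for the example.

```python
import json

payload = {"name": "caf\u00e9"}

# Serialise to text, then encode with two different UTF encodings.
utf8_bytes = json.dumps(payload, ensure_ascii=False).encode("utf-8")
utf16_bytes = json.dumps(payload, ensure_ascii=False).encode("utf-16")

# json.loads accepts the binary data directly and works out the encoding.
from_utf8 = json.loads(utf8_bytes)
from_utf16 = json.loads(utf16_bytes)
```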

--
dependencies: +Autodetecting JSON encoding




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2016-09-07 Thread Nick Coghlan

Nick Coghlan added the comment:

Likely to be resolved, or at least significantly updated, for 3.6 due to PEP 
528 and PEP 529:


* Using sys.stdin consistently at the default interactive prompt: issue 1602
* Improved Unicode handling in the Windows console: issue 17620
* Allowing text encoding and error handling to be specified in subprocess 
module APIs: issue 6135

New change landing in 3.6:

* Changing the Windows default encoding to UTF-8 to better match bytes handling 
conventions on *nix systems: issue 27781


Likely deferred to 3.7:

* providing a way to change the encoding of an existing stream: issue 15216
* utilities for clearing out surrogates from strings: issue 18814
* treating "wsgistr" as a serialisation format: issue 22264
* defining a formatting mini-language for hex output: issue 22385
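
To make the subprocess item concrete, here is a small sketch that assumes the `encoding` and `errors` parameters as they exist in the subprocess module from Python 3.6 onwards. The child command is invented for the example and writes raw UTF-8 bytes so that its own locale does not matter.

```python
import subprocess
import sys

# Child writes raw UTF-8 bytes to stdout, regardless of its own locale.
child_code = "import sys; sys.stdout.buffer.write('caf\\u00e9'.encode('utf-8'))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    encoding="utf-8",   # decode the pipe as UTF-8 instead of the locale default
    errors="strict",    # fail loudly on undecodable output
    check=True,
)
text = result.stdout
```

Without the `encoding` argument the caller would receive bytes, or text decoded with the locale's preferred encoding when text mode is requested.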

--
dependencies: +Change sys.getfilesystemencoding() on Windows to UTF-8, 
subprocess seems to use local encoding and give no choice




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-11-17 Thread Steve Dower

Steve Dower added the comment:

The thing about bogus assumptions is that Python should paper over those 
anyway. I can guarantee there's production code out there with the same 
assumptions.

How do we make this work? No idea in the context of the bytes/str filename 
convention differences.

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-11-16 Thread Nick Coghlan

Nick Coghlan added the comment:

Thanks. I suspect some of the Windows problems are indeed due to bogus 
assumptions in my draft tests, but at the same time, folks should be able to 
invoke subprocesses with Unicode values without needing extensive knowledge of 
platform specific Unicode handling arcana (whether that's *nix or Windows).

I've added Victor to the nosy list as well, since he'd previously expressed 
interest in implementing a cross-platform "force UTF-8" mode for 3.6 (akin to 
the default behaviour on Mac OS X), and I suspect these proposed test cases 
will be relevant to such a capability.

--
nosy: +haypo




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-11-16 Thread Steve Dower

Steve Dower added the comment:

Right now all of the tests fail on Windows by default (cp437 for me).

If I change the default IO encoding to utf-8 (hacked into pylifecycle.c, since 
PYTHONIOENCODING is ignored by subprocesses using -E), the four "Misconfigured" 
tests crash at the os.fsencode() call (as "mbcs:strict" cannot encode the 
characters - this may be a real issue, haven't dug into it yet).

Adding more hacks to get past this point brings me back into the ASCII encoding 
performed by the test, and I'm not sure whether that's just an incorrect 
assumption for Windows or not.


Separate issue: if I run "chcp 437" before the tests, the output is garbage. If 
I run "chcp 65001" then it shows the characters in the font correctly. The std 
streams encoding is taken from this value, but it doesn't map back to UTF-8, 
which is probably another issue. If I add a separate check in fileutils.c at 
_Py_device_encoding then I get UTF-8 enabled streams when the console is set 
for cp65001.

However, there are still a number of places that use GetACP() to determine the 
locale and encoding to use, which is incorrect for Unicode-aware programs. In 
particular, this should not happen:

>>> f=open('test.txt', 'w')
>>> f.encoding
'cp1252'

There's no good reason for the default encoding to not be UTF-8 these days, but 
this is a much bigger change. It's probably worth doing for 3.6, but may need 
more discussion...
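
For readers who want to avoid depending on that default, a minimal sketch follows. It only shows the general workaround of passing `encoding` explicitly to `open()`; the file name and sample text are arbitrary, and the value of `default_encoding` depends on the platform and locale.

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

text = "caf\u00e9 \u2603"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Passing encoding explicitly makes the result independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
    used_encoding = f.encoding

# Read the file back as bytes to confirm what was actually written.
with open(path, "rb") as f:
    raw = f.read()
```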

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-11-16 Thread Nick Coghlan

Nick Coghlan added the comment:

In discussing the Windows aspects of the bytes/text boundary handling issues 
with Brett & Steve recently, I realised I hadn't clearly defined what "fixed" 
looked like from my perspective.

The attached test case is an initial attempt at that. It currently fails on a 
UTF-8 Linux system, with the "test_dash_c_unicode" case failing when the 
interpreter is misconfigured with "LANG=C" - the problem there is that when we 
encode from the -c command line argument back to bytes, we don't pass 
"surrogateescape".
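
The role of "surrogateescape" here can be shown with a short sketch. This only illustrates the error handler itself, not the interpreter's actual handling of the -c argument, and the sample bytes are made up.

```python
raw = b"caf\xc3\xa9"  # UTF-8 bytes, as they might arrive on the command line

# Under an ASCII locale, non-ASCII bytes are smuggled through as lone surrogates.
as_text = raw.decode("ascii", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("ascii", "surrogateescape")

# Strict encoding rejects the lone surrogates instead.
try:
    as_text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```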

I'd be interested in knowing how much of this already passes on a Windows 
system.

There's also a currently missing test case, which is to pass the info to the 
subprocess via stdin - "assert_python_ok()" doesn't currently support that, so 
implementing it will either require a new flag, or direct invocation of 
spawn_python().
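
One possible shape for such a stdin-based check is sketched below. It uses plain `subprocess.run` as a stand-in rather than the `test.support` helpers mentioned above, and the child code and sample text are assumptions made for the example.

```python
import subprocess
import sys

expected = "caf\u00e9"

# The child reads raw bytes from stdin and echoes them back unchanged,
# so the result does not depend on the child's locale settings.
child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"

proc = subprocess.run(
    [sys.executable, "-c", child_code],
    input=expected.encode("utf-8"),
    stdout=subprocess.PIPE,
    check=True,
)
echoed = proc.stdout.decode("utf-8")
```

A real test would more likely read `sys.stdin` in text mode in the child, since that is where the locale-dependent decoding happens.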

--
nosy: +brett.cannon, steve.dower
Added file: http://bugs.python.org/file41054/test_cmd_line_unicode.py




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-09-22 Thread Nick Coghlan

Nick Coghlan added the comment:

The Fedora RFE at https://bugzilla.redhat.com/show_bug.cgi?id=902094 to provide 
a C.UTF-8 locale by default has been addressed for Fedora 24 (the current 
Fedora Rawhide).

This means the "LANG=C.UTF-8 python3" replacement for the ASCII-centric "LANG=C 
python3" will become more widely available over the course of 2016.

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-08-31 Thread Nick Coghlan

Nick Coghlan added the comment:

For historical purposes, also linking the change in issue #19977 to enable 
surrogateescape by default on stdin and stdout when the OS claims the locale 
encoding is ASCII.
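
The effect of that change can be approximated with in-memory streams. The sketch below wraps `io.BytesIO` objects instead of the real standard streams, so it only models the behaviour; the sample data is invented.

```python
import io

raw_in = io.BytesIO(b"caf\xc3\xa9\n")
raw_out = io.BytesIO()

# ASCII streams with surrogateescape pass undecodable bytes through intact.
# newline="" disables newline translation so the bytes compare exactly.
reader = io.TextIOWrapper(raw_in, encoding="ascii",
                          errors="surrogateescape", newline="")
writer = io.TextIOWrapper(raw_out, encoding="ascii",
                          errors="surrogateescape", newline="")

line = reader.readline()
writer.write(line)
writer.flush()
copied = raw_out.getvalue()
```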

--
dependencies: +Use "surrogateescape" error handler for sys.stdin and sys.stdout 
on UNIX for the C locale




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-07-21 Thread Ethan Furman

Changes by Ethan Furman :


--
nosy:  -ethan.furman




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-05-13 Thread Drekin

Changes by Drekin :


--
nosy: +Drekin




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-05-13 Thread Ethan Furman

Changes by Ethan Furman :


--
nosy: +ethan.furman




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-05-11 Thread Berker Peksag

Changes by Berker Peksag :


--
nosy: +berker.peksag




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-05-11 Thread Nick Coghlan

Nick Coghlan added the comment:

I just went through the still-open issues referenced from here, and recommended 
deferring further consideration of all of the remaining items to 3.6:

* utilities for clearing out surrogates from strings: issue 18814
* treating "wsgistr" as a serialisation format: issue 22264
* defining a formatting mini-language for hex output: issue 22385
* providing a way to change the encoding of an existing stream: issue 15216

I also added two new dependencies to this tracking issue:

* Improved Unicode handling in the Windows console: issue 17620
* Using sys.stdin consistently at the default interactive prompt: issue 1602

--
dependencies: +Python interactive console doesn't use sys.stdin for input, 
windows console doesn't print or input Unicode




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-03-02 Thread Nick Coghlan

Nick Coghlan added the comment:

PEP 461 landed, restoring binary interpolation support: 
https://hg.python.org/cpython/rev/8d802fb6ae32
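
A brief sketch of what binary interpolation looks like, assuming Python 3.5 or later. The header name and values are arbitrary examples.

```python
# %-interpolation on bytes, as restored by PEP 461.
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# Numeric formatting codes work on bytes as well.
hex_field = b"%04x" % 255
```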

There are also some relevant discussions around standardising the C.UTF-8 
locale currently available on some Linux systems:

Fedora RFE: https://bugzilla.redhat.com/show_bug.cgi?id=902094
glibc RFE: https://sourceware.org/bugzilla/show_bug.cgi?id=17318
glibc-alpha discussion: 
https://sourceware.org/ml/libc-alpha/2015-02/msg00247.html

--




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2015-02-16 Thread Nick Coghlan

Nick Coghlan added the comment:

Slavek et al - you folks may be interested in this one, as it tracks several 
issues that I consider relevant to the Python 2 -> 3 migration effort.

Redoing the list in a way that should render the strike-throughs for closed 
issues:

* Improved Windows console Unicode support (see
https://pypi.python.org/pypi/win_unicode_console for details)
* Changing the encoding and error handling of an existing stream
(issue 15216)
* Allowing "backslashreplace" to be used on input (issue 22286)
* Adding "codecs.convert_surrogates" (issue 18814)
* Adding "wsgiref.util.dump_wsgistr" and "wsgiref.util.load_wsgistr" (issue 
22264)
* Adding "bytes.hex", "bytearray.hex" and "memoryview.hex" (issue 9951)
* Adding a binary data formatting mini-language (depends on issue 9951, likely 
needs to be escalated to a full PEP for design discussion visibility) (issue 
22385)
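
Two of the items above are easy to show in a few lines. The sketch assumes Python 3.5 or later, where the `hex` methods exist and "backslashreplace" is accepted when decoding; the sample bytes are made up.

```python
data = b"\xde\xad\xbe\xef"

# The hex() method is available on all three binary types.
hex_from_bytes = data.hex()
hex_from_bytearray = bytearray(data).hex()
hex_from_view = memoryview(data).hex()

# "backslashreplace" used on input: undecodable bytes become \xNN escapes.
decoded = b"caf\xe9".decode("ascii", "backslashreplace")
```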

--
nosy: +bkabrda, encukou, rkuska




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2014-10-05 Thread Barry A. Warsaw

Changes by Barry A. Warsaw :


--
nosy: +barry




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2014-10-05 Thread Nick Coghlan

Nick Coghlan added the comment:

Assigning to myself, since there's nothing specifically to *do* for this bug, 
it's just to make it easier to track the status of the various other RFEs it 
depends on.

--
assignee:  -> ncoghlan
type:  -> enhancement




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2014-10-04 Thread Martin Panter

Changes by Martin Panter :


--
nosy: +vadmium




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2014-10-04 Thread Nick Coghlan

Nick Coghlan added the comment:

PEP 461 binary interpolation implementation issue: 
http://bugs.python.org/issue20284

--
dependencies: +Add codecs.convert_surrogateescape to "clean" surrogate escaped 
strings, Add wsgiref.util.dump_wsgistr & load_wsgistr, Allow backslashreplace 
error handler to be used on input, Define a binary output formatting 
mini-language for *.hex(), Support setting the encoding on a text stream after 
creation, introduce bytes.hex method (also for bytearray and memoryview), patch 
to implement PEP 461 (%-interpolation for bytes)




[issue22555] Tracking issue for adjustments to binary/text boundary handling

2014-10-04 Thread Nick Coghlan

New submission from Nick Coghlan:

See PEP 478 for the PEP level items targeting 3.5: 
http://www.python.org/dev/peps/pep-0478/

This is a tracking issue to help me keep track of some lower level items that 
didn't make the release PEP:

* Improved Windows console Unicode support (see
https://pypi.python.org/pypi/win_unicode_console for details)
* Changing the encoding and error handling of an existing stream
(http://bugs.python.org/issue15216)
* Allowing "backslashreplace" to be used on input 
(http://bugs.python.org/issue22286)
* Adding "codecs.convert_surrogates" (http://bugs.python.org/issue18814)
* Adding "wsgiref.util.dump_wsgistr" and "wsgiref.util.load_wsgistr" 
(http://bugs.python.org/issue22264)
* Adding "bytes.hex", "bytearray.hex" and "memoryview.hex" 
(http://bugs.python.org/issue9951)
* Adding a binary data formatting mini-language (depends on 9951, likely needs 
to be escalated to a full PEP for design discussion visibility) 
(http://bugs.python.org/issue22385)
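
For the stream item, a sketch of what changing the encoding of an existing stream can look like. It assumes the `io.TextIOWrapper.reconfigure` method available in Python 3.7 and later, which postdates this message, and it uses an in-memory buffer rather than a real file or standard stream.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="")

stream.write("plain ascii, ")

# Switch the encoding of the already-open stream. Pending output is
# flushed with the old settings before the new ones take effect.
stream.reconfigure(encoding="utf-8")

stream.write("caf\u00e9")
stream.flush()
written = raw.getvalue()
```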

Going back and updating http://www.python.org/dev/peps/pep-0467/ based on the 
last round of feedback is also on my personal todo list for 3.5.

--
messages: 228536
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Tracking issue for adjustments to binary/text boundary handling
