[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-03-27 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +757

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-03-27 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests:  -15




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-14 Thread Eryk Sun

Eryk Sun added the comment:

> it should be replaced with sys.getfilesystemencodeerrors() 
> to support UTF-8 Strict mode.

I did that in the patch for issue 28188. The focus of the patch is to add bytes 
support on Windows for os.putenv and os.environb, but I also tried to maximize 
consistency (at least parallel structure) between the POSIX and Windows 
implementations.

--
nosy: +eryksun




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-13 Thread STINNER Victor

STINNER Victor added the comment:

Oh, I just noticed that os.environ uses the hardcoded error handler 
"surrogateescape": it should be replaced with sys.getfilesystemencodeerrors() 
to support UTF-8 Strict mode.
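
As a rough illustration of the suggestion above (a sketch, not the code of os.py or of any patch attached to this issue), the helpers below encode and decode environment values with the error handler reported by sys.getfilesystemencodeerrors() instead of a hardcoded "surrogateescape". The names encode_env_value and decode_env_value are hypothetical.

```python
import sys


def encode_env_value(value: str) -> bytes:
    """Encode an environment string with the interpreter's filesystem settings."""
    return value.encode(sys.getfilesystemencoding(),
                        sys.getfilesystemencodeerrors())


def decode_env_value(raw: bytes) -> str:
    """Decode environment bytes with the same encoding and error handler."""
    return raw.decode(sys.getfilesystemencoding(),
                      sys.getfilesystemencodeerrors())


if __name__ == "__main__":
    # Show which encoding and error handler the interpreter picked.
    print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
    print(decode_env_value(b"/usr/local/bin"))
```

Because the error handler is looked up at call time, a mode that changes it would presumably be picked up without touching the helpers.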

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread INADA Naoki

INADA Naoki added the comment:

How about having locale.getpreferredencoding() return 'utf-8' in UTF-8 mode?
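
To see what a given interpreter reports, a small check like the following can help. It is only a sketch: it assumes a released Python 3.7 or later, where the flag is exposed as sys.flags.utf8_mode (the draft patch in this issue calls it sys.flags.utf8mode), and it falls back to 0 when the attribute is missing. The function name describe_preferred_encoding is hypothetical.

```python
import locale
import sys


def describe_preferred_encoding():
    """Return (utf8_mode, preferred_encoding) for the running interpreter."""
    # getattr() keeps the sketch usable on interpreters without the flag.
    utf8_mode = getattr(sys.flags, "utf8_mode", 0)
    # Passing False avoids calling setlocale() as a side effect.
    return utf8_mode, locale.getpreferredencoding(False)


if __name__ == "__main__":
    mode, encoding = describe_preferred_encoding()
    print("UTF-8 mode:", mode)
    print("preferred encoding:", encoding)
```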

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread STINNER Victor

STINNER Victor added the comment:

encodings.py: an enhanced version of pep540_cli.py that also reports the locale 
and filesystem encodings. It is a script to test the implementation of PEP 540 
(and PEP 538).

--
Added file: http://bugs.python.org/file46274/encodings.py




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread STINNER Victor

STINNER Victor added the comment:

Patch version 4:

* Handle PYTHONLEGACYWINDOWSFSENCODING: this env var now disables the UTF-8 
mode and takes priority over -X utf8 and PYTHONUTF8
* Add a unit test on the PYTHONUTF8 env var and the -E cmdline option
* Add a unit test on the POSIX locale
* Fix initstdio() to correctly handle an empty PYTHONIOENCODING: this bug affects 
Python 3.6 as well and is not directly related to PEP 540
* Correctly handle PYTHONUTF8 set to an empty string (ignore it)
* Skip a unit test in test_utf8mode which failed with the POSIX locale

Note: This patch still has the sys.argv encoding bug with locale encodings 
other than ASCII and UTF-8.

--
Added file: http://bugs.python.org/file46270/pep540-4.patch




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread STINNER Victor

STINNER Victor added the comment:

Hum, test_utf8mode lacks a unit test on the -E command line option:
PYTHONUTF8 should be ignored if -E is used.
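
One possible shape for such a check, written against a released Python 3.7+ rather than the patch itself, is sketched below. The helper name probe_utf8_mode is hypothetical. The sketch uses PYTHONUTF8=0 because the mode may already be enabled for other reasons (for example the POSIX locale), so comparing two -E runs is more robust than expecting a fixed value.

```python
import os
import subprocess
import sys


def probe_utf8_mode(*interpreter_args, **env_overrides):
    """Start a child interpreter and return its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    env.update(env_overrides)
    cmd = [sys.executable, *interpreter_args,
           "-c", "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return int(result.stdout.decode("ascii").strip())


if __name__ == "__main__":
    # With -E, setting the variable must not change the outcome.
    print(probe_utf8_mode("-E", PYTHONUTF8="0") == probe_utf8_mode("-E"))
    # Without -E, the variable is honoured.
    print(probe_utf8_mode(PYTHONUTF8="0"))
```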

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Chi Hsuan Yen

Changes by Chi Hsuan Yen :


--
nosy: +Chi Hsuan Yen




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-12 Thread STINNER Victor

STINNER Victor added the comment:

> Can -X utf8 option be processed before Py_Main()?

I'm trying to implement that, but it's hard to factor out the shared code. I will 
probably have to duplicate the code handling -E, -X utf8, PYTHONMALLOC and 
PYTHONUTF8 for wchar_t* (UCS4 or UTF-16) and char* (bytes).

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki

INADA Naoki added the comment:

> Hum, pep540-3.patch doesn't work if the locale encoding is different than 
> ASCII and UTF-8. argv must be reencoded:

I want to skip reencoding.
In UTF-8 mode, arbitrary bytes on the command line (e.g. a broken filename 
passed by xargs) should be able to round-trip through UTF-8/surrogateescape.

I don't trust wcstombs/mbstowcs. They may not guarantee round-tripping of 
arbitrary bytes.
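
The round-trip property of UTF-8 with surrogateescape can be checked in pure Python. This is a minimal sketch of the codec behaviour only, not of how the interpreter decodes argv; the function name roundtrip is hypothetical.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 with surrogateescape, then encode them back."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    # Every byte value is covered, including bytes that are invalid UTF-8.
    all_bytes = bytes(range(256))
    print(roundtrip(all_bytes) == all_bytes)
```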

Can -X utf8 option be processed before Py_Main()?

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

I only tested the PEP 540 implementation on Linux.

The PEP and its implementation should be adjusted for Windows, especially for 
Windows-only env vars like PYTHONLEGACYWINDOWSFSENCODING.

Changes may also be needed for Mac OS X and Android, which always use UTF-8. 
Currently, the locale encoding is still used on these platforms (e.g. by 
open()). Is it possible to have a locale encoding other than UTF-8 on Android, 
for example?

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

Hum, pep540-3.patch doesn't work if the locale encoding is different than ASCII 
and UTF-8. argv must be reencoded:

$ LC_ALL=fr_FR ./python -X utf8 -c 'import sys; print(ascii(sys.argv))' $(echo -ne "\xff")
['-c', '\xff']

The result should not depend on the locale; it should be the same as:

$ LC_ALL=fr_FR.utf8 ./python -X utf8 -c 'import sys; print(ascii(sys.argv))' $(echo -ne "\xff")
['-c', '\udcff']

$ LC_ALL=C ./python -X utf8 -c 'import sys; print(ascii(sys.argv))' $(echo -ne "\xff")
['-c', '\udcff']
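
The two results can be reproduced in pure Python. The sketch assumes that the fr_FR locale uses ISO-8859-1, which is common on Linux but is an assumption here, and the variable names are illustrative only.

```python
RAW_ARG = b"\xff"

# Decoding with a Latin-1 locale encoding maps the byte to U+00FF.
via_locale = RAW_ARG.decode("iso-8859-1")

# Decoding as UTF-8 with surrogateescape yields an escaped lone surrogate.
via_utf8_mode = RAW_ARG.decode("utf-8", "surrogateescape")

print(ascii(via_locale))
print(ascii(via_utf8_mode))
```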

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

Oops, I introduced an obvious bug in my latest refactoring. It's now fixed in 
the patch version 3: pep540-3.patch.

--
Added file: http://bugs.python.org/file46263/pep540-3.patch




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

pep540-2.patch: Patch version 2, updated to the latest version of PEP 540. 
It has no more FIXME/TODO and has more unit tests. The main change is that the 
Strict mode no longer uses the strict error handler for OS data, but keeps 
surrogateescape. See the PEP for the rationale (especially the "Use the strict 
error handler for operating system data" alternative).

--
Added file: http://bugs.python.org/file46262/pep540-2.patch




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki

Changes by INADA Naoki :


--
nosy: +inada.naoki




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

Examples with pep540_cli.py.

Python 3.5:

$ python3 pep540_cli.py 
sys.argv: ['pep540_cli.py']
stdin: UTF-8/strict
stdout: UTF-8/strict
stderr: UTF-8/backslashreplace
open(): UTF-8/strict

$ LC_ALL=C python3 pep540_cli.py 
sys.argv: ['pep540_cli.py']
stdin: ANSI_X3.4-1968/surrogateescape
stdout: ANSI_X3.4-1968/surrogateescape
stderr: ANSI_X3.4-1968/backslashreplace
open(): ANSI_X3.4-1968/strict


Patched Python 3.7:


$ ./python pep540_cli.py 
UTF-8 mode: 0
sys.argv: ['pep540_cli.py']
stdin: UTF-8/strict
stdout: UTF-8/strict
stderr: UTF-8/backslashreplace
open(): UTF-8/strict

$ LC_ALL=C ./python pep540_cli.py 
UTF-8 mode: 1
sys.argv: ['pep540_cli.py']
stdin: utf-8/surrogateescape
stdout: utf-8/surrogateescape
stderr: utf-8/backslashreplace
open(): utf-8/surrogateescape

$ ./python -X utf8 pep540_cli.py 
UTF-8 mode: 1
sys.argv: ['pep540_cli.py']
stdin: utf-8/surrogateescape
stdout: utf-8/surrogateescape
stderr: utf-8/backslashreplace
open(): utf-8/surrogateescape

$ ./python -X utf8=strict pep540_cli.py 
UTF-8 mode: 2
sys.argv: ['pep540_cli.py']
stdin: utf-8/strict
stdout: utf-8/strict
stderr: utf-8/backslashreplace
open(): utf-8/strict
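
The attached pep540_cli.py is not reproduced in this thread. A script producing a report in the same layout might look roughly like the sketch below; it is a guess at the general shape, not the attached file. It assumes the released flag name sys.flags.utf8_mode, and the helper names describe_stream and collect_report are hypothetical.

```python
import os
import sys


def describe_stream(stream):
    """Format a text stream as 'encoding/errors'."""
    encoding = getattr(stream, "encoding", None)
    errors = getattr(stream, "errors", None)
    return "%s/%s" % (encoding, errors)


def collect_report():
    """Return the report lines, in the same layout as the output above."""
    lines = []
    # Fall back to "n/a" on interpreters that do not expose the flag.
    lines.append("UTF-8 mode: %s" % getattr(sys.flags, "utf8_mode", "n/a"))
    lines.append("sys.argv: %s" % ascii(sys.argv))
    for name in ("stdin", "stdout", "stderr"):
        lines.append("%s: %s" % (name, describe_stream(getattr(sys, name))))
    # Open a file in text mode without arguments to observe open()'s defaults.
    with open(os.devnull) as fp:
        lines.append("open(): %s" % describe_stream(fp))
    return lines


if __name__ == "__main__":
    print("\n".join(collect_report()))
```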

--




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

pep540.patch: first draft

Changes:

* Add sys.flags.utf8mode
* Add -X utf8 command line option
* Add PYTHONUTF8 environment variable
* sys.stdin, sys.stdout and sys.stderr encoding and errors are modified in 
UTF-8 mode
* open() default encoding and errors are modified in UTF-8 mode
* Add Lib/test/test_utf8mode.py
* Skip a few tests relying on the locale encoding if the UTF-8 mode is enabled
* Document changes

Allowed options:

* Disable UTF-8 mode: -X utf8=0 or PYTHONUTF8=0
* Enable UTF-8 mode: -X utf8=1 or PYTHONUTF8=1
* Enable UTF-8 Strict mode: -X utf8=strict or PYTHONUTF8=strict
* Other -X utf8 and PYTHONUTF8 values cause a fatal error
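
The allowed values listed above can be modelled in a few lines of Python. This sketch mirrors the draft described in this message, not necessarily any released CPython. The mode numbers 0, 1 and 2 are taken from the "UTF-8 mode:" lines in the pep540_cli.py output in this thread, plain "-X utf8" is treated as mode 1 as in that output, and ValueError stands in for the fatal error. The names parse_utf8_option and the UTF8_MODE_* constants are hypothetical.

```python
UTF8_MODE_DISABLED = 0
UTF8_MODE_ENABLED = 1
UTF8_MODE_STRICT = 2

_ALLOWED = {
    "0": UTF8_MODE_DISABLED,
    "1": UTF8_MODE_ENABLED,
    "strict": UTF8_MODE_STRICT,
}


def parse_utf8_option(value):
    """Map a -X utf8 / PYTHONUTF8 value to a mode number.

    None means the option was given without a value (plain "-X utf8").
    """
    if value is None:
        return UTF8_MODE_ENABLED
    try:
        return _ALLOWED[value]
    except KeyError:
        # The draft treats any other value as a fatal error.
        raise ValueError("invalid UTF-8 mode value: %r" % (value,)) from None


if __name__ == "__main__":
    for raw in (None, "0", "1", "strict"):
        print(repr(raw), "->", parse_utf8_option(raw))
```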

Priorities (highest to lowest):

* open() encoding and errors arguments
* PYTHONIOENCODING
* UTF-8 mode
* os.device_encoding()
* locale encoding
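
The priority order above amounts to "first source that is set wins". The sketch below flattens all sources into a single function purely for illustration; in the interpreter they are consulted in different places. The function pick_encoding and its parameters are hypothetical.

```python
def pick_encoding(explicit=None, pythonioencoding=None, utf8_mode=False,
                  device_encoding=None, locale_encoding="ascii"):
    """Return the first encoding that applies, from highest to lowest priority."""
    candidates = (
        explicit,                        # open(..., encoding=...)
        pythonioencoding,                # PYTHONIOENCODING
        "utf-8" if utf8_mode else None,  # UTF-8 mode
        device_encoding,                 # os.device_encoding(fd)
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # Nothing else applied: fall back to the locale encoding.
    return locale_encoding


if __name__ == "__main__":
    print(pick_encoding(utf8_mode=True, device_encoding="cp437"))
    print(pick_encoding(locale_encoding="iso-8859-1"))
```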

TODO:

* re-encode sys.argv from the locale encoding to UTF-8 in Py_Main() when the 
UTF-8 mode is enabled
* support strict mode in Py_DecodeLocale() and Py_EncodeLocale()

--
keywords: +patch
Added file: http://bugs.python.org/file46258/pep540.patch




[issue29240] Implementation of the PEP 540: Add a new UTF-8 mode

2017-01-11 Thread STINNER Victor

New submission from STINNER Victor:

This issue tracks the implementation of the PEP 540.

Attached pep540_cli.py script can be used to play with it.

--
components: Interpreter Core, Library (Lib), Unicode
files: pep540_cli.py
messages: 285214
nosy: ezio.melotti, haypo
priority: normal
pull_requests: 15
severity: normal
status: open
title: Implementation of the PEP 540: Add a new UTF-8 mode
type: enhancement
versions: Python 3.7
Added file: http://bugs.python.org/file46257/pep540_cli.py
