Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Guido van Rossum
Please no. Let's not add unrelated new functionality in with this
already large change with not entirely understood consequences.

On Thu, Sep 8, 2016 at 1:05 PM, Chris Barker  wrote:
> On Thu, Sep 8, 2016 at 10:35 AM, Random832  wrote:
>>
>>
>> It means that the so-called "bash" on windows 10 is actually a full
>> Ubuntu system (running on, AIUI, a simulation of Linux kernel system
>> calls), which will presumably also have its own python installation and
>> use a UTF-8 locale, rather than one that runs "natively" on win32.
>
>
> yes -- it looks like one could run a "linux" build of python under the whole
> subsystem, which would presumably "look" just like Linux to Python.
>
>
>>
>> If it's possible for a win32 version of python to call it as a
>> subprocess,
>
>
> But this is what I was referring to -- it may be way too early to know what
> the capabilities or implications are, but I'm hoping that "regular" windows
> programs can interact with the subsystem. So if we're making changes now, it
> would be nice to consider it if we can.
>
>>
>> Incidentally, according to
>>
>> https://github.com/Microsoft/BashOnWindows/issues/2, pipes didn't work
>> at all between WSL processes and Win32 processes until two weeks ago, so
>> it's clear that these features are still evolving.
>
>
> so it may indeed be way too early -- but if they DO work now -- pretty cool!
>
> Thanks,
>
>-CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Chris Barker
On Thu, Sep 8, 2016 at 1:14 PM, Guido van Rossum  wrote:

> Please no. Let's not add unrelated new functionality in with this
> already large change with not entirely understood consequences.
>

Fair enough -- this is clearly a really raw API so far.

-CHB





>
> On Thu, Sep 8, 2016 at 1:05 PM, Chris Barker wrote:
> > On Thu, Sep 8, 2016 at 10:35 AM, Random832 wrote:
> >>
> >>
> >> It means that the so-called "bash" on windows 10 is actually a full
> >> Ubuntu system (running on, AIUI, a simulation of Linux kernel system
> >> calls), which will presumably also have its own python installation and
> >> use a UTF-8 locale, rather than one that runs "natively" on win32.
> >
> >
> > yes -- it looks like one could run a "linux" build of python under the whole
> > subsystem, which would presumably "look" just like Linux to Python.
> >
> >
> >>
> >> If it's possible for a win32 version of python to call it as a
> >> subprocess,
> >
> >
> > But this is what I was referring to -- it may be way too early to know what
> > the capabilities or implications are, but I'm hoping that "regular" windows
> > programs can interact with the subsystem. So if we're making changes now, it
> > would be nice to consider it if we can.
> >
> >>
> >> Incidentally, according to
> >>
> >> https://github.com/Microsoft/BashOnWindows/issues/2, pipes didn't work
> >> at all between WSL processes and Win32 processes until two weeks ago, so
> >> it's clear that these features are still evolving.
> >
> >
> > so it may indeed be way too early -- but if they DO work now -- pretty cool!
> >
> > Thanks,
> >
> >-CHB
> >
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR(206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115   (206) 526-6317   main reception
> >
> > chris.bar...@noaa.gov
> >
> >
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Chris Barker
On Thu, Sep 8, 2016 at 10:35 AM, Random832  wrote:

>
> It means that the so-called "bash" on windows 10 is actually a full
> Ubuntu system (running on, AIUI, a simulation of Linux kernel system
> calls), which will presumably also have its own python installation and
> use a UTF-8 locale, rather than one that runs "natively" on win32.
>

yes -- it looks like one could run a "linux" build of python under the
whole subsystem, which would presumably "look" just like Linux to Python.



> If it's possible for a win32 version of python to call it as a
> subprocess,


But this is what I was referring to -- it may be way too early to know what
the capabilities or implications are, but I'm hoping that "regular" windows
programs can interact with the subsystem. So if we're making changes now,
it would be nice to consider it if we can.


> Incidentally, according to
> https://github.com/Microsoft/BashOnWindows/issues/2, pipes didn't work
> at all between WSL processes and Win32 processes until two weeks ago, so
> it's clear that these features are still evolving.


so it may indeed be way too early -- but if they DO work now -- pretty cool!

Thanks,

   -CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Random832
On Thu, Sep 8, 2016, at 13:10, Guido van Rossum wrote:
> On Thu, Sep 8, 2016 at 9:57 AM, Brett Cannon  wrote:
> > Bash on Windows is just Linux, so it isn't affected by any of this.
> 
> I don't know what that sentence means.

It means that the so-called "bash" on windows 10 is actually a full
Ubuntu system (running on, AIUI, a simulation of Linux kernel system
calls), which will presumably also have its own python installation and
use a UTF-8 locale, rather than one that runs "natively" on win32.

If it's possible for a win32 version of python to call it as a
subprocess, this may be an argument in favor of using UTF-8 - subject to
finding out whether WSL does use UTF-8, whether it supports non-ASCII
arguments from a Win32 CreateProcess at all, whether there's any way to
pass non-UTF-8 arguments to it, etc.

Incidentally, according to
https://github.com/Microsoft/BashOnWindows/issues/2, pipes didn't work
at all between WSL processes and Win32 processes until two weeks ago, so
it's clear that these features are still evolving.

> But anyways, if someone wants
> to try making subprocess work with bytes arguments on Windows,
> that's just a bugfix, and you're not constrained by how it works on
> previous Python versions (since it doesn't work there at all). It
> might be wise to choose an interpretation that's consistent with other
> uses of command line arguments by Python on Windows though (rather
> than choosing to favor making just bash work the same as it works on
> Linux).


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Guido van Rossum
On Thu, Sep 8, 2016 at 9:57 AM, Brett Cannon  wrote:
>
>
> On Thu, 8 Sep 2016 at 09:06 Chris Barker  wrote:
>>
>> On Wed, Sep 7, 2016 at 10:37 AM, Guido van Rossum 
>> wrote:
>>>
>>> And apart from Python, few shell commands that work on
>>> Unix make much sense on Windows,
>>
>>
>> Does the (optional) addition of bash to Windows 10 have any impact on
>> this?
>>
>> It'll be something that Windows developers can't count on their users
>> having for a good while, if ever, but if you can control the deployment
>> environment, then you might. And it would be VERY tempting for
>> "posix-focused" developers that want to run their code on Windows.
>>
>> So it would be nice if the "new" approach worked well with bash on
>> Windows.
>
>
> Bash on Windows is just Linux, so it isn't affected by any of this.

I don't know what that sentence means. But anyways, if someone wants
to try making subprocess work with bytes arguments on Windows,
that's just a bugfix, and you're not constrained by how it works on
previous Python versions (since it doesn't work there at all). It
might be wise to choose an interpretation that's consistent with other
uses of command line arguments by Python on Windows though (rather
than choosing to favor making just bash work the same as it works on
Linux).


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Brett Cannon
On Thu, 8 Sep 2016 at 09:06 Chris Barker  wrote:

> On Wed, Sep 7, 2016 at 10:37 AM, Guido van Rossum 
> wrote:
>
>> And apart from Python, few shell commands that work on
>> Unix make much sense on Windows,
>
>
> Does the (optional) addition of bash to Windows 10 have any impact on this?
>
> It'll be something that Windows developers can't count on their users
> having for a good while, if ever, but if you can control the deployment
> environment, then you might. And it would be VERY tempting for
> "posix-focused" developers that want to run their code on Windows.
>
> So it would be nice if the "new" approach worked well with bash on Windows.
>

Bash on Windows is just Linux, so it isn't affected by any of this.


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-08 Thread Chris Barker
On Wed, Sep 7, 2016 at 10:37 AM, Guido van Rossum  wrote:

> And apart from Python, few shell commands that work on
> Unix make much sense on Windows,


Does the (optional) addition of bash to Windows 10 have any impact on this?

It'll be something that Windows developers can't count on their users
having for a good while, if ever, but if you can control the deployment
environment, then you might. And it would be VERY tempting for
"posix-focused" developers that want to run their code on Windows.

So it would be nice if the "new" approach worked well with bash on Windows.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-07 Thread Nick Coghlan
On 8 September 2016 at 03:37, Guido van Rossum  wrote:
> On Sun, Sep 4, 2016 at 11:58 PM, Nick Coghlan  wrote:
>> While calling system native apps that way will still have many
>> portability challenges, there are also plenty of cases where folks use
>> sys.executable to launch new Python processes in a separate instance
>> of the currently running interpreter, and it would be good if these
>> changes brought cross-platform consistency to the handling of binary
>> arguments here as well.
>
> I checked with Steve and this is not supported anyway -- bytes
> arguments (regardless of the value of shell) fail early with a
> TypeError. That may be a bug but there's no backwards compatibility to
> preserve here. (And apart from Python, few shell commands that work on
> Unix make much sense on Windows, so I'm also not particularly worried
> about that particular example being non-portable -- it doesn't
> represent a realistic concern.)

Cool, I suspected "That already doesn't work, so you just have to use
strings for cross-platform compatibility in those cases" would be the
answer, and I think that's a sensible way to go. Even on *nix passing
bytes arguments to subprocess is unusual, since anyone with Python 2
based habits will omit the "b" prefix from literals, and anything
coming from the command line, environment, or other user input is
supplied as text by default.
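
A minimal sketch of the "just use strings" approach for the sys.executable
case, assuming a hypothetical helper named run_child (it is not part of
subprocess). Every argument is str, which subprocess accepts on both Windows
and *nix; the child reports what it received via ascii() so the output does
not depend on the console encoding:

```python
import subprocess
import sys


def run_child(arg):
    """Launch a new interpreter via sys.executable, passing only str arguments."""
    # The child echoes its first argument in an ASCII-safe form.
    code = "import sys; print(ascii(sys.argv[1]))"
    # No bytes anywhere in the argument list, so this is portable.
    out = subprocess.check_output([sys.executable, "-c", code, arg])
    return out.decode("ascii").strip()
```

Calling run_child("hello") should return the child's ascii() rendering of the
argument; how non-ASCII arguments travel still depends on the platform and
locale, which is the part this thread is about.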

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-07 Thread Steve Dower

On 07Sep2016 1037, Guido van Rossum wrote:

> I'm hijacking this thread to provisionally accept PEP 529. (I'll also
> do this for PEP 528, in its own thread.)
>
> I've talked things over with Steve and Victor and we're going to do an
> experiment (as now written up in the PEP:
> https://www.python.org/dev/peps/pep-0529/#beta-experiment) to tease
> out any issues with this change during the beta. If serious problems
> crop up we may have to roll back the changes and reject the PEP -- we
> won't get another chance at getting this right. (That would also mean
> that using the binary filesystem APIs will remain deprecated and will
> eventually be disallowed; as long as the PEP remains accepted they are
> undeprecated.)
>
> Congrats Steve! Thanks for the massive amount of work on the
> implementation and the thinking that went into the design. Thanks
> everyone else for their feedback.
>
> --Guido


Thanks! I've updated the status. Now the process of bartering for code 
reviews begins :)


Patches are at:
  PEP 528: http://bugs.python.org/issue1602
  PEP 529: http://bugs.python.org/issue27781

Cheers,
Steve


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-07 Thread Guido van Rossum
I'm hijacking this thread to provisionally accept PEP 529. (I'll also
do this for PEP 528, in its own thread.)

I've talked things over with Steve and Victor and we're going to do an
experiment (as now written up in the PEP:
https://www.python.org/dev/peps/pep-0529/#beta-experiment) to tease
out any issues with this change during the beta. If serious problems
crop up we may have to roll back the changes and reject the PEP -- we
won't get another chance at getting this right. (That would also mean
that using the binary filesystem APIs will remain deprecated and will
eventually be disallowed; as long as the PEP remains accepted they are
undeprecated.)

Congrats Steve! Thanks for the massive amount of work on the
implementation and the thinking that went into the design. Thanks
everyone else for their feedback.

--Guido

PS. I have one small inline response to Nick below.

On Sun, Sep 4, 2016 at 11:58 PM, Nick Coghlan  wrote:
> On 5 September 2016 at 15:59, Steve Dower  wrote:
>> +continue to default to ``locale.getpreferredencoding()`` (for text files) or
>> +plain bytes (for binary files). This only affects the encoding used when users
>> +pass a bytes object to Python where it is then passed to the operating system as
>> +a path name.
>
> For the three non-filesystem cases:
>
> I checked the situation for os.environb, and that's already
> unavailable on Windows (since os.supports_bytes_environ is False
> there), while sys.argv is apparently already handled correctly (i.e.
> always using the *W APIs).
>
> That means my only open question would be the handling of subprocess
> module calls (both with and without shell=True), since that currently
> works with binary arguments on *nix:
>
> >>> subprocess.call([b"python", b"-c", "print('ℙƴ☂ℌøἤ')".encode("utf-8")])
> ℙƴ☂ℌøἤ
> 0
> >>> subprocess.call(b"python -c '%s'" % 'print("ℙƴ☂ℌøἤ")'.encode("utf-8"),
> ...                 shell=True)
> ℙƴ☂ℌøἤ
> 0
>
> While calling system native apps that way will still have many
> portability challenges, there are also plenty of cases where folks use
> sys.executable to launch new Python processes in a separate instance
> of the currently running interpreter, and it would be good if these
> changes brought cross-platform consistency to the handling of binary
> arguments here as well.

I checked with Steve and this is not supported anyway -- bytes
arguments (regardless of the value of shell) fail early with a
TypeError. That may be a bug but there's no backwards compatibility to
preserve here. (And apart from Python, few shell commands that work on
Unix make much sense on Windows, so I'm also not particularly worried
about that particular example being non-portable -- it doesn't
represent a realistic concern.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-05 Thread Nick Coghlan
On 5 September 2016 at 15:59, Steve Dower  wrote:
> +continue to default to ``locale.getpreferredencoding()`` (for text files) or
> +plain bytes (for binary files). This only affects the encoding used when users
> +pass a bytes object to Python where it is then passed to the operating system as
> +a path name.

For the three non-filesystem cases:

I checked the situation for os.environb, and that's already
unavailable on Windows (since os.supports_bytes_environ is False
there), while sys.argv is apparently already handled correctly (i.e.
always using the *W APIs).
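
A minimal sketch of how calling code might cope with that difference, assuming
a hypothetical helper named getenv_bytes (it is not part of the os module). It
reads os.environb where os.supports_bytes_environ is true and otherwise falls
back to encoding the text value with os.fsencode():

```python
import os


def getenv_bytes(name, default=None):
    """Return an environment variable as bytes on any platform (illustrative)."""
    if os.supports_bytes_environ:
        # POSIX: the environment is natively bytes, so read it directly.
        return os.environb.get(os.fsencode(name), default)
    # Windows: os.environb does not exist, so encode the str value instead.
    value = os.environ.get(name)
    return default if value is None else os.fsencode(value)
```

The fallback branch uses whatever the filesystem encoding is, so its result on
Windows would change along with the encoding this PEP proposes.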

That means my only open question would be the handling of subprocess
module calls (both with and without shell=True), since that currently
works with binary arguments on *nix:

>>> subprocess.call([b"python", b"-c", "print('ℙƴ☂ℌøἤ')".encode("utf-8")])
ℙƴ☂ℌøἤ
0
>>> subprocess.call(b"python -c '%s'" % 'print("ℙƴ☂ℌøἤ")'.encode("utf-8"),
...                 shell=True)
ℙƴ☂ℌøἤ
0

While calling system native apps that way will still have many
portability challenges, there are also plenty of cases where folks use
sys.executable to launch new Python processes in a separate instance
of the currently running interpreter, and it would be good if these
changes brought cross-platform consistency to the handling of binary
arguments here as well.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-05 Thread Steve Dower
I posted an update to PEP 529 at 
https://github.com/python/peps/blob/master/pep-0529.txt and a diff below. The 
update includes more detail on the affected code within CPython - including a 
number of references to broken code that would be resolved with the change - 
and more details about the necessary changes.

As with PEP 528, I don't think it's possible to predict the impact better than 
I already have, and the beta period will be essential to determine whether this 
change is completely unworkable. I am fully prepared to back out the change if 
necessary prior to RC.

Cheers,
Steve

---


@@ -16,7 +16,8 @@
 operating system, often via C Runtime functions. However, these have been long
 discouraged in favor of the UTF-16 APIs. Within the operating system, all text
 is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
-the active code page.
+the active code page. See `Naming Files, Paths, and Namespaces`_ for
+more details.
 
 This PEP proposes changing the default filesystem encoding on Windows to utf-8,
 and changing all filesystem functions to use the Unicode APIs for filesystem
@@ -27,10 +28,10 @@
 characters outside of the user's active code page.
 
 Notably, this does not impact the encoding of the contents of files. These will
-continue to default to locale.getpreferredencoding (for text files) or plain
-bytes (for binary files). This only affects the encoding used when users pass a
-bytes object to Python where it is then passed to the operating system as a path
-name.
+continue to default to ``locale.getpreferredencoding()`` (for text files) or
+plain bytes (for binary files). This only affects the encoding used when users
+pass a bytes object to Python where it is then passed to the operating system as
+a path name.
 
 Background
 ==========
@@ -44,9 +45,10 @@
 
 When paths are passed between the filesystem and the application, they are
 either passed through as a bytes blob or converted to/from str using
-``os.fsencode()`` or ``sys.getfilesystemencoding()``. The result of encoding a
-string with ``sys.getfilesystemencoding()`` is a blob of bytes in the native
-format for the default file system.
+``os.fsencode()`` and ``os.fsdecode()`` or explicit encoding using
+``sys.getfilesystemencoding()``. The result of encoding a string with
+``sys.getfilesystemencoding()`` is a blob of bytes in the native format for the
+default file system.
 
 On Windows, the native format for the filesystem is utf-16-le. The recommended
 platform APIs for accessing the filesystem all accept and return text encoded in
@@ -83,11 +85,11 @@
 canonical representation. Even if the encoding is "incorrect" by some standard,
 the file system will still map the bytes back to the file. Making use of this
 avoids the cost of decoding and reencoding, such that (theoretically, and only
-on POSIX), code such as this may be faster because of the use of `b'.'` compared
-to using `'.'`::
+on POSIX), code such as this may be faster because of the use of ``b'.'``
+compared to using ``'.'``::
 
 >>> for f in os.listdir(b'.'):
-... os.stat(f)
+... os.stat(f)
 ...
 
 As a result, POSIX-focused library authors prefer to use bytes to represent
@@ -105,32 +107,31 @@
 Currently the default filesystem encoding is 'mbcs', which is a meta-encoder
 that uses the active code page. However, when bytes are passed to the filesystem
 they go through the \*A APIs and the operating system handles encoding. In this
-case, paths are always encoded using the equivalent of 'mbcs:replace' - we have
-no ability to change this (though there is a user/machine configuration option
-to change the encoding from CP_ACP to CP_OEM, so it won't necessarily always
-match mbcs...)
+case, paths are always encoded using the equivalent of 'mbcs:replace' with no
+opportunity for Python to override or change this.
 
 This proposal would remove all use of the \*A APIs and only ever call the \*W
-APIs. When Windows returns paths to Python as str, they will be decoded from
+APIs. When Windows returns paths to Python as ``str``, they will be decoded from
 utf-16-le and returned as text (in whatever the minimal representation is). When
-Windows returns paths to Python as bytes, they will be decoded from utf-16-le to
-utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is
-possible to have invalid surrogates in filenames). Equally, when paths are
-provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the
-\*W APIs.
+Python code requests paths as ``bytes``, the paths will be transcoded from
+utf-16-le into utf-8 using surrogatepass (Windows does not validate surrogate
+pairs, so it is possible to have invalid surrogates in filenames). Equally, when
+paths are provided as ``bytes``, they are transcoded from utf-8 into utf-16-le
+and passed to the \*W APIs.
 
-The use of utf-8 will not be configurable, with the possible exception of a
-"legacy mode" environment variable or X-flag.

Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-03 Thread Adam Bartoš
Nick Coghlan (ncoghlan at gmail.com) on Sat Sep 3 12:27:44 EDT 2016 wrote:

> After also reading the Windows console encoding PEP, I realised
> there's a couple of missing discussions here regarding the impacts on
> sys.argv, os.environ, and os.environb.
>
> The reason that's relevant is that "sys.getfilesystemencoding" is a
> bit of a misnomer, as it's also used to determine the assumed encoding
> of command line arguments and environment variables.
>
>
Regarding sys.argv, AFAIK Unicode arguments work well on Python 3. Even
non-BMP characters are transferred correctly.


Adam Bartoš


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-03 Thread Nick Coghlan
On 4 September 2016 at 00:49, Nick Coghlan  wrote:
> On 2 September 2016 at 08:31, Steve Dower  wrote:
>> This proposal would remove all use of the *A APIs and only ever call the *W
>> APIs. When Windows returns paths to Python as str, they will be decoded from
>> utf-16-le and returned as text (in whatever the minimal representation is). When
>> Windows returns paths to Python as bytes, they will be decoded from utf-16-le to
>> utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is
>> possible to have invalid surrogates in filenames). Equally, when paths are
>> provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the
>> *W APIs.
>
> The overall proposal looks good to me, there's just a terminology
> glitch here: utf-8 <-> utf-16-le should either be described as
> transcoding, or else as decoding and then re-encoding. As they're both
> text codecs, there's no "decoding" operation that switches between
> them.

After also reading the Windows console encoding PEP, I realised
there's a couple of missing discussions here regarding the impacts on
sys.argv, os.environ, and os.environb.

The reason that's relevant is that "sys.getfilesystemencoding" is a
bit of a misnomer, as it's also used to determine the assumed encoding
of command line arguments and environment variables.

With the PEP currently stating that all use of the "*A" Windows APIs
will be removed, I'm guessing these will just start working as
expected, but it should be covered explicitly.

In addition, if the subprocess module is going to be excluded from
these changes, that should be called out explicitly (Keeping in mind
that on *nix, the only subprocess pipe configurations that are
straightforward to set up in Python 3 are raw binary mode and
universal newlines mode, with the latter implicitly treating the pipes
as UTF-8 text)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-03 Thread Nick Coghlan
On 2 September 2016 at 08:31, Steve Dower  wrote:
> This proposal would remove all use of the *A APIs and only ever call the *W
> APIs. When Windows returns paths to Python as str, they will be decoded from
> utf-16-le and returned as text (in whatever the minimal representation is). When
> Windows returns paths to Python as bytes, they will be decoded from utf-16-le to
> utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is
> possible to have invalid surrogates in filenames). Equally, when paths are
> provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the
> *W APIs.

The overall proposal looks good to me, there's just a terminology
glitch here: utf-8 <-> utf-16-le should either be described as
transcoding, or else as decoding and then re-encoding. As they're both
text codecs, there's no "decoding" operation that switches between
them.
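
To make the "decoding and then re-encoding" reading concrete, here is a
minimal sketch in pure Python. The two function names are hypothetical and
this is not how the patch itself is written (that lives in C); it only
illustrates that the bytes-to-bytes step goes through str in the middle, with
surrogatepass on both sides so unpaired surrogates survive:

```python
def utf16le_to_utf8(raw):
    """Transcode a utf-16-le path (as Windows stores it) to utf-8 bytes."""
    text = raw.decode("utf-16-le", "surrogatepass")  # bytes -> str
    return text.encode("utf-8", "surrogatepass")     # str -> bytes


def utf8_to_utf16le(raw):
    """Transcode utf-8 path bytes back to utf-16-le for the *W APIs."""
    text = raw.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass")
```

Under these assumptions a name containing a lone surrogate, which Windows
permits, still round-trips through both functions unchanged.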

As far as the timing of this particular change goes, I think you make
a good case that all of the cases that will see a behaviour change
with this PEP have already been receiving deprecation warnings since
3.3, which would make it acceptable to change the default behaviour in
3.6.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-02 Thread Paul Moore
On 1 September 2016 at 23:31, Steve Dower  wrote:
[...]
> As a result, POSIX-focused library authors prefer to use bytes to represent
> paths.

A minor point, but in my experience, a lot of POSIX-focused authors
are happy to move to a better text/bytes separation, so I'd soften
this to "some POSIX-focused library authors...".

Other than that minor point, this looks great - +1 from me.
Paul


[Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

2016-09-01 Thread Steve Dower
I'm about to be offline for a few days, so I wanted to get my current 
draft PEPs out so people can read and review.


I don't believe there is a lot of change as a result of either PEP, but 
the impact of what change there is needs to be weighed against the benefits.


We've already had some thorough discussion on this one and failed to 
reach agreement on whether we can make this change in 3.6 or if it needs 
a deprecation cycle that is more visible than the one we started in 3.3. 
In the latter case, we need to determine how visible that should be 
(i.e. warnings visible by default, visible for non-Windows platforms, 
value-dependent warnings/errors, etc.). IMHO, the argument about having 
the change be on-by-default or off-by-default is irrelevant until we 
decide on the deprecation issue, at which point it is obvious what the 
default should be.


See https://bugs.python.org/issue27781 for the current proposed patch. I 
do need to update it in order to merge against default it seems (work 
for my upcoming flight).


Cheers,
Steve


---
https://github.com/python/peps/blob/master/pep-0529.txt
---

PEP: 529
Title: Change Windows filesystem encoding to UTF-8
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Aug-2016
Post-History: 01-Sep-2016

Abstract
========

Historically, Python uses the ANSI APIs for interacting with the Windows
operating system, often via C Runtime functions. However, these have been long
discouraged in favor of the UTF-16 APIs. Within the operating system, all text
is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
the active code page.

This PEP proposes changing the default filesystem encoding on Windows to utf-8,
and changing all filesystem functions to use the Unicode APIs for filesystem
paths. This will not affect code that uses strings to represent paths, however
those that use bytes for paths will now be able to correctly round-trip all
valid paths in Windows filesystems. Currently, the conversions between Unicode
(in the OS) and bytes (in Python) were lossy and would fail to round-trip
characters outside of the user's active code page.

Notably, this does not impact the encoding of the contents of files. These will
continue to default to locale.getpreferredencoding (for text files) or plain
bytes (for binary files). This only affects the encoding used when users pass a
bytes object to Python where it is then passed to the operating system as a path
name.

Background
==========

File system paths are almost universally represented as text with an encoding
determined by the file system. In Python, we expose these paths via a number of
interfaces, such as the ``os`` and ``io`` modules. Paths may be passed either
direction across these interfaces, that is, from the filesystem to the
application (for example, ``os.listdir()``), or from the application to the
filesystem (for example, ``os.unlink()``).

When paths are passed between the filesystem and the application, they are
either passed through as a bytes blob or converted to/from str using
``os.fsencode()`` or ``sys.getfilesystemencoding()``. The result of encoding a
string with ``sys.getfilesystemencoding()`` is a blob of bytes in the native
format for the default file system.
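
As a rough illustration of the two conversion routes just described, the
sketch below uses hypothetical helper names (``roundtrip`` and
``explicit_fsencode``). It assumes Python 3.6 or later, where
``sys.getfilesystemencodeerrors()`` is available:

```python
import os
import sys


def roundtrip(path):
    """Convert a str path to bytes and back using the os helpers."""
    return os.fsdecode(os.fsencode(path))


def explicit_fsencode(path):
    """Encode a str path by hand with the filesystem encoding and error handler."""
    return path.encode(sys.getfilesystemencoding(),
                       sys.getfilesystemencodeerrors())
```

For a str argument both routes should produce the same bytes; which bytes
those are depends on the platform's filesystem encoding.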

On Windows, the native format for the filesystem is utf-16-le. The recommended
platform APIs for accessing the filesystem all accept and return text encoded in
this format. However, prior to Windows NT (and possibly further back), the
native format was a configurable machine option and a separate set of APIs
existed to accept this format. The option (the "active code page") and these
APIs (the "*A functions") still exist in recent versions of Windows for
backwards compatibility, though new functionality often only has a utf-16-le API
(the "*W functions").

In Python, str is recommended because it can correctly round-trip all characters
used in paths (on POSIX with surrogateescape handling; on Windows because str
maps to the native representation). On Windows bytes cannot round-trip all
characters used in paths, as Python internally uses the *A functions and hence
the encoding is "whatever the active code page is". Since the active code page
cannot represent all Unicode characters, the conversion of a path into bytes can
lose information without warning or any available indication.

As a demonstration of this::

    >>> open('test\uAB00.txt', 'wb').close()
    >>> import glob
    >>> glob.glob('test*')
    ['test\uab00.txt']
    >>> glob.glob(b'test*')
    [b'test?.txt']

The Unicode character in the second call to glob has been replaced by a '?',
which means passing the path back into the filesystem will result in a
``FileNotFoundError``. The same results may be observed with ``os.listdir()`` or
any function that matches the return type to the parameter type.
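
The same loss can be imitated on any platform with the codecs alone. This is
only a sketch: ``via_code_page`` and ``via_utf8`` are hypothetical names, and
``cp1252`` is an assumed stand-in for whatever the active code page happens to
be on a given machine:

```python
def via_code_page(name, code_page="cp1252"):
    """Approximate the *A conversion: unrepresentable characters become '?'."""
    return name.encode(code_page, errors="replace")


def via_utf8(name):
    """The conversion proposed here: utf-8 with surrogatepass, which is lossless."""
    return name.encode("utf-8", errors="surrogatepass")


name = "test\uab00.txt"
print(via_code_page(name))   # b'test?.txt'
print(via_utf8(name).decode("utf-8", errors="surrogatepass") == name)   # True
```

The first result can no longer be mapped back to the original file name, while
the second decodes to exactly the string that was encoded.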

While one