Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan
On 23 May 2017 at 03:38, Steve Dower  wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.
>
> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

One of the key premises of PEP 517 is that there will be relatively
few front ends (pip, possibly easy_install, ???), but a relatively
large number of backends (one per build system - at least
distutils/setuptools, distutils2, flit, enscons, likely eventually
meson, waf, and yotta, and potentially even C/C++ build systems like
autotools, CMake, etc).

So it makes sense to put the implementation burden for important
aspects of the UX on the part that PyPA has the most influence over
(the front-end), rather than considering it reasonable for front-end
developers to point fingers and say "That UX failure in the tool we
provide isn't *our* fault, it's the fault of the build backend
developers for not complying with the interoperability specification
properly").

Once we make that core assumption about where the responsibility for
the end user experience resides, then the absolutely *minimum*
behavioural requirements that can be placed on build backends are:

- respect the locale encoding
- emit informational messages on stdout
- emit error messages on stderr

What we can then also do is to recommend that *front-ends* do the
following when invoking their build backend CLI shims:

1. Implement the C locale -> UTF-8 based locale coercion defined in
PEP 538 when launching the subprocess
2. Implement a similar coercion for Windows, where cp1252 being active
in the parent process prompts a call to "chcp 65001" inside the
subprocess before the build backend itself actually starts running

That leaves build backend authors with the freedom to assume that they
*don't* need to worry about stream encoding issues, since giving them
access to properly configured streams is the front end's
responsibility.
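
As a rough illustration of point 1 above, a front end could adjust the
environment it hands to the backend subprocess along these lines. This
is only a sketch under stated assumptions: coerce_c_locale and
run_backend_shim are hypothetical names, only the "C.UTF-8" target is
handled, and PEP 538 itself tries several target locales and checks
that they are actually available.

```python
import os
import subprocess


def coerce_c_locale(env):
    """Return a copy of env with LC_CTYPE coerced to C.UTF-8 when the
    effective locale is the default C/POSIX one (hypothetical helper)."""
    env = dict(env)
    if env.get("LC_ALL"):
        # LC_ALL overrides LC_CTYPE, so setting LC_CTYPE would have no effect.
        return env
    effective = env.get("LC_CTYPE") or env.get("LANG") or "C"
    if effective in ("C", "POSIX"):
        env["LC_CTYPE"] = "C.UTF-8"
    return env


def run_backend_shim(argv):
    """Launch the front end's own shim with the coerced environment."""
    return subprocess.run(argv, env=coerce_c_locale(os.environ))
```

Keeping the coercion in a pure function of the environment mapping
means it can be checked without actually changing the locale of the
front end process.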

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

In PEP 517's architecture, the front-end developers are also
responsible for the CLI that's running inside the backend subprocess.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver
On Mon, May 22, 2017, at 11:36 PM, Steve Dower wrote:
> IMHO, #2 is definitely the right way to go. Yes, the platform specific 
> code now has to worry about the encoding, but... the encoding is 
> platform specific? So... that seems exactly right? :) Maybe I'm still 
> missing something here, but I'm totally happy to leave it to Thomas to 
> decide (which I think he has, but I haven't gotten to looking at that PR 
> yet).

I think I broadly agree with this as well. My reservation is that the
build backend might be running a subprocess which produces output in an
*unknown* encoding, especially if it allows the package author or the
end user to configure a command to run. If it doesn't know the encoding,
I'd rather get the raw bytes from the subprocess in the log (e.g. dumped
to a file), rather than attempting to transcode them to UTF-8 - the
conversion risks losing information, and even if it doesn't, it makes it
harder to work out what was really meant.

I feel like we're spending a lot of energy on a point that's not really
central to the PEP, though. I think we've established that there's a
potential for bugs and mojibake whatever we put in the spec. So I'd like
to put something relatively simple and move on. I still stand by my PR,
which amounts to "backends try to make it UTF-8, frontends don't crash
if it isn't". I might be persuaded to add a recommendation that
frontends dump the bytes to a file if they're not UTF-8, so the user can
pull it apart if necessary.
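
For illustration, the "dump the bytes to a file" recommendation could
look roughly like this on the frontend side. It is a sketch only:
decode_or_dump and its dump_path argument are made-up names, and
nothing here is taken from the PR.

```python
def decode_or_dump(raw, dump_path):
    """Decode backend output as UTF-8 without ever raising.

    If raw is not valid UTF-8, the untouched bytes are written to
    dump_path so nothing is lost, and a lossy rendering is returned.
    Returns (text, dumped), where dumped says whether the file was written.
    """
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        with open(dump_path, "wb") as f:
            f.write(raw)
        return raw.decode("utf-8", errors="replace"), True
```

The frontend can show the lossy text straight away and point the user
at the dump file if they need the exact bytes.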

Thomas


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Steve Dower

On 22May2017 1253, Paul Moore wrote:
> It seems to me there are 2 schools of thought:
>
> 1. There are likely to be fewer front ends than back ends, and so the
> front end(s) (basically, pip) should deal with the problem. Also,
> backends are more likely to be written by developers who are looking
> at very specific scenarios, and asking them to handle all the
> complexities of robust multilingual coding is raising the bar on
> writing a backend too high.
>
> 2. The backend is where the problem lies, and so the backend should
> address the issue. Furthermore, a well-established principle in
> dealing with encodings is to convert to strings right at the boundary
> of the application, and in this case the backend is the only code that
> has access to that boundary.
>
> (I tend towards (2), but I honestly can't say to what extent that's
> because it makes it "someone else's problem" for me ;-))


I also tend towards 2, and I assume I am one of the more likely people 
to write the part that invokes Microsoft's cl.exe/link.exe :)


Is the front end going to be directly invoking those tools? I would 
assume not, otherwise it won't be cross platform.


Since the shim belongs to the front end, I've essentially been ignoring 
it. The shim can invoke another part of the build tool, but that is not 
going to be cl.exe/link.exe either.


At some point there will be a script that runs the tools directly. I 
have been referring to that as the backend, and it is the part that 
should handle capturing and transcoding the output. Everything from 
there can be utf8:replace to prevent crashing, but we can't say "the 
frontend can handle all encodings", and shouldn't say "the frontend will 
only use bad encodings".
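
A sketch of the kind of script described above, the part that runs the
tool directly and transcodes what it captures. The name
run_and_transcode is hypothetical, and the tool's encoding is passed in
as a parameter because working it out is exactly the platform specific
part.

```python
import subprocess


def run_and_transcode(cmd, tool_encoding):
    """Run cmd, capture its raw output, and return it re-encoded as UTF-8.

    tool_encoding is whatever the wrapped tool is known to emit (an
    assumption the caller has to supply). Undecodable bytes are replaced
    rather than raising, so this step never crashes.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    text = proc.stdout.decode(tool_encoding, errors="replace")
    return proc.returncode, text.encode("utf-8")
```

Everything downstream of this function only ever sees UTF-8, so it can
decode with errors="replace" purely as a safety net.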


IMHO, #2 is definitely the right way to go. Yes, the platform specific 
code now has to worry about the encoding, but... the encoding is 
platform specific? So... that seems exactly right? :) Maybe I'm still 
missing something here, but I'm totally happy to leave it to Thomas to 
decide (which I think he has, but I haven't gotten to looking at that PR 
yet).


Cheers,
Steve


[Distutils] setuptools 33.1.1 - 35 issue with source package install from pypi

2017-05-22 Thread Matt Joyce
https://stackoverflow.com/questions/44120045/something-is-breaking-my-package-deployment

has full info on what i am seeing, but it suffices to say:

I use setuptools in my setup.py
I iterate a requirements.txt file to satisfy dependencies.
I publish as sdist to pypi

my package used to work fine as it currently is on pypi.  however as of
late i've been seeing the python setup.py invocation ( from pip install
python-symphony ) fail with a couple different failure scenarios.

one is it attempting to install what looks like a binary wheel ( that
doesn't exist on pypi or anywhere ) that ends up being a bunch of blank
stubs.   the other is a situation where the package looks 100% correct in
site-packages, but none of the module components are registered when you
try to use the imported module.

pip install --isolated --no-cache-dir python-symphony seems to avoid the
issue.  but it also tends to do weird stuff like break bpython by messing
up the pygments install.

curious if anyone was familiar with some sort of bug that i am stumbling
into here.

-Matt


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore
On 22 May 2017 at 18:38, Steve Dower  wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.

s/expect/allow/

The paranoid in me suspects "expect" is also true, though :-)

> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

What you're referring to as the backend "tool" being a script, is what
the PEP refers to as a "shim" (as Nick pointed out to me) and is
considered part of the front end. The back end is a set of Python APIs
which are called by the front end (in any real life front end, via the
front end's shim script).

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

For front ends, "never crash" is essential. But "produce as readable
as possible data" is also a high priority. Consider for example a
Russian user with a series of directories named in Russian. If the
tools write an error using his local 8-bit encoding, and the front end
assumes UTF-8, then all of the high-bit characters in his directory
names would be replaced. Deciphering an error message like "File
???/?/?.c: unexpected EOF" is problematic... :-(
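
The effect is easy to reproduce. In this small sketch cp1251 stands in
for the user's "local 8-bit encoding" (that choice is an assumption,
as is the directory name):

```python
name = "каталог"                 # a directory name in Russian
raw = name.encode("cp1251")      # what a tool using the local encoding writes

# Front end wrongly assuming UTF-8: every letter is lost to U+FFFD.
garbled = raw.decode("utf-8", errors="replace")

# Front end using the encoding the tool actually used: nothing is lost.
recovered = raw.decode("cp1251")
```

Nothing crashes in either case, but only the second result is any use
to the person reading the error message.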

The model assumes that most front-ends would call the backend via a
subprocess "shim" that was maintained by the front end project. But
the expectation here seems to be that the backend is allowed to write
directly to the stdio streams of its process (or at least, to let the
tools it calls do so). So the shim *cannot* control the encoding of
the data received by the frontend, and so the encoding has to be
agreed between backend and frontend. The basic question is how the
responsibility for dealing with data in an uncertain encoding is
allocated.

It seems to me there are 2 schools of thought:

1. There are likely to be fewer front ends than back ends, and so the
front end(s) (basically, pip) should deal with the problem. Also,
backends are more likely to be written by developers who are looking
at very specific scenarios, and asking them to handle all the
complexities of robust multilingual coding is raising the bar on
writing a backend too high.

2. The backend is where the problem lies, and so the backend should
address the issue. Furthermore, a well-established principle in
dealing with encodings is to convert to strings right at the boundary
of the application, and in this case the backend is the only code that
has access to that boundary.

(I tend towards (2), but I honestly can't say to what extent that's
because it makes it "someone else's problem" for me ;-))

As you say, the middle ground here is that front ends must never
crash, and back ends should (but aren't required to) produce output in
a specified encoding (I still prefer the locale encoding as that has
the best chance of avoiding the unreadable-path issue above). That's more or less
what pip has to deal with now (and not that far off (1)), and my
current attempt to address that situation is at
https://github.com/pypa/pip/pull/4486 for what it's worth.

A couple of final thoughts. I would expect that testing the handling
of encodings is likely to be an important issue (at least, I expect
there'll be bugs, and adding tests to make sure they get properly
fixed will be important). Handling tool output encoding in the backend
is likely to involve relatively low level interface functions, where
the inputs and outputs can be relatively easily mocked. So I would
expect backend unit testing of encoding handling would be relatively
straightforward. Conversely, testing front end handling of encoding
issues is very tricky - it's necessary to set up system state to
persuade the build tools to produce the data you want to test against
(it feels like integration testing rather than unit testing). Also,
fixing encoding issues in the backend decouples the fix from pip's
release cycle, which is probably a good thing (unless the backend is
not well maintained, but that's an issue in itself).

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore
On 22 May 2017 at 15:23, Nick Coghlan  wrote:
> No, that's discussed here:
> https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals
>
> Even though PEP 517 defines a Python API for build backends to
> implement, it still expects installation tools to wrap a subprocess
> call around the backend invocation.

OK, but is it not acceptable for the child cmdline process (owned by
pip) to capture the backend implementation's stdout using reassignment
of sys.stdout? I assume, from your response, that it's *not*
acceptable to do that - but that needs to be documented somewhere.
Specifically, that the child cmdline is not allowed to do something
like:

    import io, sys

    out = io.StringIO()
    sys.stdout = out
    build_backend.hook()
    sys.stdout = sys.__stdout__
    sys.stdout.buffer.write(out.getvalue().encode("UTF-8"))

(Which would otherwise be a very simple way to get guaranteed UTF-8 as
the encoding across the process boundary - but it does so by imposing
basically the rules I stated on the backend).

> That said, the whole "The build backend still runs in a subprocess"
> aspect should probably be separated out into its own section
> "Isolating build backends from frontend process state", rather than
> solely being covered in the "Comparison to PEP 516?" section, as it's
> a key aspect of the design - we expect each installation tool to
> provide its own CLI shim for calling build backends, rather than
> requiring all installation tools to use the same one.

Strong +1. And that section needs to be very clear on issues like
this, covering what the shim is allowed to do. As the point of the
shim is to protect the backend from frontend state, I'm OK with the
general principle that the shim must do "as little as possible" before
calling the hook - but "reset sys.stdout to protect against encoding
errors" could easily be seen as within the realm of acceptable
behaviour (as it stops hooks writing arbitrary Unicode to a standard
output that the shim knows is limited).

I'm happy enough with the idea that pip won't do anything silly in its
CLI shim, but we don't want to get into the "implementation as the
standard" situation where a backend is allowed to do anything that
pip's shim can cope with...

Paul


[Distutils] Last call for PEP 516 champions

2017-05-22 Thread Nick Coghlan
Hi folks,

The restarted discussion on PEP 517 meant I realised that we hadn't
officially decided between its Python API based approach and PEP 516's
approach of using the backend CLI as the standardised interface (akin
to the current setup.py approach).

My current intention is to reject PEP 516's CLI standardisation
approach on the following grounds:

- PEP 517 makes a convincing case for the benefits of the Python API
based approach within the Python ecosystem
- the difficulties encountered in evolving the setup.py CLI over time
lend significant weight to the notion that a Python level API will be
easier to update without breaking backwards compatibility
- PEP 517 still advises front-ends to isolate back-end invocation
behind a subprocess boundary due to all of the other practical
benefits that brings, it just makes the specifics of that invocation
an implementation detail of the front-end tool
- third party tools that want an implementation language independent
CLI abstraction over the Python ecosystem (including builds) can just
use pip itself (or another standards-compliant frontend)

This isn't particularly urgent, but I also don't see a lot of reason
for an extended discussion, so if I don't hear a convincing
counterargument in the meantime, I'll mark the PEP as rejected at the
beginning of June :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan
On 22 May 2017 at 23:15, Paul Moore  wrote:
> On 22 May 2017 at 12:28, Thomas Kluyver  wrote:
>> What if it wants to send a character which can't be encoded in the
>> locale encoding? It's quite easy on Windows to end up with a character
>> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
>> 'replace'), then you've lost information even before it gets to the
>> install tool.
>>
>> It's 2017, I really don't want to go down the 'locale specified
>> encoding' route again. UTF-8 everywhere!
>
> Hang on. Can we take a step back here? I just re-read the PEP and
> remembered (!) that hooks are *in-process* Python entry points (I've
> been working with pip's current backend-as-subprocess model, and mixed
> up in my mind the original 2 proposals here). I think this encoding
> debate may be a red herring.

No, that's discussed here:
https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals

Even though PEP 517 defines a Python API for build backends to
implement, it still expects installation tools to wrap a subprocess
call around the backend invocation.

Frontends need to do that in order to protect *their own* process
state from bugs and design quirks in backend implementations:

- no monkeypatching of parent process modules
- no changes to the standard stream configuration
- no persistent locale changes
- no environment variable changes
- no manipulation of any other process global state
- calling sys.exit() won't cryptically crash the entire installation process
- memory leaks won't cryptically crash the entire installation process
- infinite loops won't *necessarily* crash the entire installation
process (if the build has a timeout on it)
- installation tools running with elevated privileges can readily run
the build process with reduced privileges
- installation tools can also readily run the build process in a
chroot or containerised environment

And in the context of this thread, it gives the frontend complete
control over the stream output from not only the backend itself, but
any child processes that it launches.
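
As a minimal sketch of that boundary (run_isolated is a hypothetical
helper, and the code string stands in for whatever shim a frontend
ships), the backend's exit code and both of its streams come back to
the frontend as plain data:

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a snippet of (hypothetical) shim code in a child interpreter.

    Returns (returncode, stdout_bytes, stderr_bytes). Raises
    subprocess.TimeoutExpired if the child runs longer than timeout seconds.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A sys.exit() in the child only shows up as a return code here, and
because grandchild processes inherit the same pipes by default, their
output arrives in the same byte strings.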

That said, the whole "The build backend still runs in a subprocess"
aspect should probably be separated out into its own section
"Isolating build backends from frontend process state", rather than
solely being covered in the "Comparison to PEP 516?" section, as it's
a key aspect of the design - we expect each installation tool to
provide its own CLI shim for calling build backends, rather than
requiring all installation tools to use the same one.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore
On 22 May 2017 at 12:28, Thomas Kluyver  wrote:
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.
>
> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

Hang on. Can we take a step back here? I just re-read the PEP and
remembered (!) that hooks are *in-process* Python entry points (I've
been working with pip's current backend-as-subprocess model, and mixed
up in my mind the original 2 proposals here). I think this encoding
debate may be a red herring.

If a hook is being called as a Python method call, then it can print
what it likes to stdout and stderr. And it's the backend's
responsibility to ensure that it never fails when printing - so the
*backend* has to deal with the fact that anything it wants to print
must be representable in sys.stdout.encoding, with the default (raise
an exception) error handling. Given this fact, and the fact that
sys.stdout and sys.stderr are *text* output streams, build frontends
like pip can reasonably just replace sys.std{out,err} (for example
with a StringIO object) to get hook output. There's no encoding issue
for frontends, they just capture the text sent to the stdio streams.
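
A sketch of that in-process capture, using contextlib rather than
assigning to sys.stdout by hand. call_hook_captured is a hypothetical
name, and hook stands for any backend entry point:

```python
import contextlib
import io


def call_hook_captured(hook, *args, **kwargs):
    """Call hook in-process; return (result, captured_stdout, captured_stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = hook(*args, **kwargs)
    return result, out.getvalue(), err.getvalue()
```

Note that this only sees text written through sys.stdout and
sys.stderr. Anything a child process writes directly to the underlying
file descriptors bypasses it, which is why the rules below ask
backends to capture subprocess output themselves.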

The rules needed for *backends* are then:

1. Backends MUST NOT write to raw IO channels, all output MUST go via
sys.stdout and sys.stderr. Build frontends MAY redirect these streams
to post-process them, but are not required to do so. As a consequence:

  1a. Backends MUST be prepared to deal with the possibility that
those IO streams have the limitations of the platform IO streams
(e.g., limited subset of Unicode allowed, fails with an exception when
invalid characters are written).

  1b. Backends MUST capture and manage the output from any
subprocesses they spawn (so that they can follow the other rules).

  1c. Backends cannot assume that they can write output that the user
will see - frontends may suppress or modify any output passed on
stdout. Conversely, backends should not bypass the ability of
frontends to capture stdout, as frontends are responsible for user
interaction.
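
For rule 1a, one way a backend might cope with a limited stream is to
escape whatever the stream cannot encode before writing. This is a
sketch, not a proposal for the spec: printable_on is a hypothetical
helper, and falling back to ASCII when the stream reports no encoding
is my own conservative assumption.

```python
def printable_on(stream, text):
    """Return text with anything stream cannot encode backslash-escaped."""
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)
```

A backend would then call sys.stdout.write(printable_on(sys.stdout,
message)), which cannot raise UnicodeEncodeError and still leaves the
escaped code points recoverable from the log.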

Some of those MUSTs could be replaced by SHOULD, if we want to allow
backends to write directly to the screen. But that is likely to
corrupt the UI of the frontend, so I'm inclined to say that we don't
allow that.

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan
On 22 May 2017 at 21:28, Thomas Kluyver  wrote:
> On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
>> The only reservation I have is that the choice of UTF-8 means that on
>> Windows, build backends pretty much have to explicitly manage tool
>> output (as they are pretty much certain *not* to output in UTF-8).
>> Build backend writers that aren't aware of this issue (most likely
>> because their main platform is not Windows) could very easily choose
>> to just pass through the raw bytes, and as a result *all* non-ASCII
>> output would be garbled on non-UTF-8 systems.
>>
>> Would locale.getpreferredencoding() not be a better choice here? I
>> know it has issues in some situations on Unix, but are they worse than
>> the issues UTF-8 would cause on Windows? After all it's the encoding
>> used by subprocess.Popen in "universal newlines" mode...
>
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.

The counterargument is that there's plenty of text that *can* be
correctly encoded in cp1252 (especially in Europe and LATAM) that will
be rendered incorrectly if the installation tool attempts to interpret
it as UTF-8. CPython itself will also display explicitly UTF-8 encoded
text incorrectly on a Windows console in versions prior to 3.6.

> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

"UTF-8 everywhere" is fine for network services that only need to talk
to other network services, command line applications, and web
browsers, but even in 2017 it's still a problematic assumption on
client devices running Windows or Linux.

Rather than the locale specified encoding being broken in general, the
key recurring problem we've found with it on *nix systems relates to
the fact that glibc still defaults to ASCII in the C locale - "assume
ASCII really means UTF-8" is enough to solve that problem *without*
breaking compatibility with cp1252 and non-UTF-8 universal encodings.

The other recurring problem is cp1252 itself on Windows, which suffers
from the fact that there isn't a nice environment variable based way
to change the active code page when invoking a subprocess, and also
that cp65001 (the UTF-8 code page) isn't really properly supported in
Python 2.7 (although you can inject a custom search function to alias
it to utf-8 [1]).
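
The general shape of such a search function is sketched below. This is
not copied from [1]; it is just the standard codecs.register technique,
with alias_cp65001 as a made-up name. On recent Python 3 versions the
alias is redundant, since "cp65001" already resolves there.

```python
import codecs


def alias_cp65001(name):
    """Codec search function: resolve 'cp65001' to the UTF-8 codec."""
    if name.lower() == "cp65001":
        return codecs.lookup("utf-8")
    return None  # let the other registered search functions handle it


codecs.register(alias_cp65001)
```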

Even in that case though, mandating "thou shalt treat the streams as
UTF-8" in the spec doesn't *solve* those problems - it just means
we're specifying a behaviour that we know will provide a poor
developer experience on Windows, rather than alerting tool developers
to the fact that this is something they're going to need to be aware
of.

Cheers,
Nick.

[1] http://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan
On 22 May 2017 at 21:02, Paul Moore  wrote:
> On 22 May 2017 at 11:22, Thomas Kluyver  wrote:
>> I have made a PR against the PEP with my best take on the encoding
>> situation:
>> https://github.com/python/peps/pull/264/files
>
> LGTM.
>
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
>
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

+1 from me for locale.getpreferredencoding() as the default - not only
is it a more suitable default on Windows, it's also the best way to do
the right thing in GB.18030 locales, and as far as I'm aware, handling
that correctly is still a requirement for selling commercial software
into China (that's why I chose it as the main non-UTF-8 example
encoding in PEP 538).

If Python tools want to specifically detect the use of 7-bit ASCII and
override *that* to be UTF-8, then the relevant snippet is:

    import codecs
    import locale

    def get_stream_encoding():
        nominal = locale.getpreferredencoding()
        if codecs.lookup(nominal).name == "ascii":
            return "utf-8"
        return nominal

That's effectively the same model that PEP 538 and 540 are proposing
be applied by default for the standard streams, so it would also
interoperate well with Python 3.7+.
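
A parameterised variant of the same check is easier to try out, since
it does not depend on the locale of the machine running it.
effective_stream_encoding is a hypothetical name. The point it shows
is that codecs.lookup() normalises the various spellings of ASCII,
including the "ANSI_X3.4-1968" name that glibc reports in the C locale.

```python
import codecs


def effective_stream_encoding(nominal):
    """Map any spelling of 7-bit ASCII to UTF-8; leave other encodings alone."""
    if codecs.lookup(nominal).name == "ascii":
        return "utf-8"
    return nominal
```

Encodings such as cp1252 and gb18030 pass through untouched, which is
the difference between this and "UTF-8 everywhere".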

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver
On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
> 
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

What if it wants to send a character which can't be encoded in the
locale encoding? It's quite easy on Windows to end up with a character
that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
'replace'), then you've lost information even before it gets to the
install tool.
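
A tiny illustration of that loss, with a made-up message containing
two characters that cp1252 cannot represent:

```python
loc_enc = "cp1252"
message = "built \u2713 for \u03a9"   # check mark and Greek capital omega

# What the build tool sends: unencodable characters become "?".
sent = message.encode(loc_enc, "replace")

# What the install tool can recover, even if it decodes correctly.
received = sent.decode(loc_enc)
```

Here sent is b"built ? for ?", so the install tool has no way of
knowing what the original characters were.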

It's 2017, I really don't want to go down the 'locale specified
encoding' route again. UTF-8 everywhere!

One affordance I'd consider is a recommendation to install tools that if
captured output is not valid UTF-8, they dump the raw bytes to a file so
that no information is lost. I'm not sure if that recommendation needs
to be in the spec itself, though.

Thomas


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore
On 22 May 2017 at 11:22, Thomas Kluyver  wrote:
> I have made a PR against the PEP with my best take on the encoding
> situation:
> https://github.com/python/peps/pull/264/files

LGTM.

The only reservation I have is that the choice of UTF-8 means that on
Windows, build backends pretty much have to explicitly manage tool
output (as they are pretty much certain *not* to output in UTF-8).
Build backend writers that aren't aware of this issue (most likely
because their main platform is not Windows) could very easily choose
to just pass through the raw bytes, and as a result *all* non-ASCII
output would be garbled on non-UTF-8 systems.

Would locale.getpreferredencoding() not be a better choice here? I
know it has issues in some situations on Unix, but are they worse than
the issues UTF-8 would cause on Windows? After all it's the encoding
used by subprocess.Popen in "universal newlines" mode...

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver
I have made a PR against the PEP with my best take on the encoding
situation:
https://github.com/python/peps/pull/264/files

On Mon, May 22, 2017, at 11:19 AM, Paul Moore wrote:
> On 22 May 2017 at 10:56, Thomas Kluyver  wrote:
> > On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> >> Require that build tools either send UTF-8 to the UI component, or write
> >> bytes to a file and call it a build output. I see no benefit in
> >> requiring both the build tool and the UI tool to guess what the text
> >> encoding is.
> >
> > I'm not proposing that the install tool should try to guess the
> > encoding, but I think a well written install tool shouldn't crash if the
> > build output doesn't match the encoding it expects. Even if the spec
> > says that the build output MUST be UTF-8 encoded, build tools can have
> > bugs, and you don't want the install to fail just because the log
> > isn't correctly encoded.
> >
> > Hence, I think a 'SHOULD' is appropriate for this part of the spec:
> >
> > - To install tool authors, it is clear that they can display the output
> > as UTF-8 so long as they don't crash if it's invalid.
> > - To build tool authors, it's clear that they can't pass the buck to
> > install tool authors if output gets jumbled because it's not UTF-8.
> 
> I'd say that it's not so much just "well written" install tools. I'd
> say that install tools MUST NOT crash if build tool output isn't in
> the expected encoding. On the other hand, the encoding agreement
> implies that if build tools *do* send data in the correct encoding
> then they are entitled to expect that it will be displayed accurately
> to the end user.
> 
> Output can be garbled in two ways:
> 
> 1. The build tool does not (or cannot) ensure that its output is in
> the standard-mandated encoding.
> 2. The install tool cannot display the full range of characters
> representable in the standard-mandated encoding.
> 
> Neither of these should cause a failure. Well written install tools
> should warn in the case of (1) - "I have been passed data that I don't
> understand, I'll do my best to display it but can't guarantee the
> output won't be garbled". In the case of (2), though, that's "as
> expected" - if your OS settings mean you can't display certain
> characters, you shouldn't be surprised if your install tool replaces
> them with a placeholder.
> 
> On an implementation note, this boils down to something like the
> following in the install tool:
> 
> # Step 1
> try:
>     data = decode build output using STD_ENCODING
> except UnicodeDecodeError:
>     warn "Data is not in expected encoding"
>     data = decode using STD_ENCODING with errors=<replacement>
>
> # Step 2
> data = data.encode(MY_OUTPUT_ENCODING, errors=<replacement>).decode(MY_OUTPUT_ENCODING)
> 
> # We now have subprocess output that's safe to display if requested.
> 
> As a side note, I find step 2 "sanitise my string to ensure it can be
> safely output" to be a pretty common operation - possibly because
> Python's standard IO streams raise exceptions on unicode errors - and
> I'm surprised there isn't a better way to spell it than the
> encode/decode pair above.
> 
> Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore
On 22 May 2017 at 10:56, Thomas Kluyver wrote:
> On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
>> Require that build tools either send UTF-8 to the UI component, or write
>> bytes to a file and call it a build output. I see no benefit in
>> requiring both the build tool and the UI tool to guess what the text
>> encoding is.
>
> I'm not proposing that the install tool should try to guess the
> encoding, but I think a well written install tool shouldn't crash if the
> build output doesn't match the encoding it expects. Even if the spec
> says that the build output MUST be UTF-8 encoded, build tools can have
> bugs, and you don't want the install to fail just because the log
> isn't correctly encoded.
>
> Hence, I think a 'SHOULD' is appropriate for this part of the spec:
>
> - To install tool authors, it is clear that they can display the output
> as UTF-8 so long as they don't crash if it's invalid.
> - To build tool authors, it's clear that they can't pass the buck to
> install tool authors if output gets jumbled because it's not UTF-8.

I'd say that it's not so much just "well written" install tools. I'd
say that install tools MUST NOT crash if build tool output isn't in
the expected encoding. On the other hand, the encoding agreement
implies that if build tools *do* send data in the correct encoding
then they are entitled to expect that it will be displayed accurately
to the end user.

Output can be garbled in two ways:

1. The build tool does not (or cannot) ensure that its output is in
the standard-mandated encoding.
2. The install tool cannot display the full range of characters
representable in the standard-mandated encoding.

Neither of these should cause a failure. Well written install tools
should warn in the case of (1) - "I have been passed data that I don't
understand, I'll do my best to display it but can't guarantee the
output won't be garbled". In the case of (2), though, that's "as
expected" - if your OS settings mean you can't display certain
characters, you shouldn't be surprised if your install tool replaces
them with a placeholder.

On an implementation note, this boils down to something like the
following in the install tool:

# Step 1
try:
    data = decode build output using STD_ENCODING
except UnicodeDecodeError:
    warn "Data is not in expected encoding"
    data = decode using STD_ENCODING with errors=<replacement>

# Step 2
data = data.encode(MY_OUTPUT_ENCODING, errors=<replacement>).decode(MY_OUTPUT_ENCODING)

# We now have subprocess output that's safe to display if requested.
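
In runnable form, the two steps above might look something like the
following. This is a sketch under assumptions rather than a reference
implementation: make_displayable is a made-up name, STD_ENCODING is taken
to be UTF-8, and errors="replace" is just one possible choice of
replacement handler.

```python
import warnings

STD_ENCODING = "utf-8"  # assumption: the encoding the spec would mandate


def make_displayable(raw, output_encoding):
    """Turn raw build output (bytes) into text that is safe to display."""
    # Step 1: decode strictly; on failure, warn and decode again with
    # replacement characters instead of crashing.
    try:
        text = raw.decode(STD_ENCODING)
    except UnicodeDecodeError:
        warnings.warn("Build output is not in the expected encoding; "
                      "it may be displayed garbled")
        text = raw.decode(STD_ENCODING, errors="replace")

    # Step 2: round-trip through the output encoding so that characters
    # the display cannot represent become placeholders.
    return text.encode(output_encoding, errors="replace").decode(output_encoding)


if __name__ == "__main__":
    print(make_displayable("caf\u00e9".encode("utf-8"), "ascii"))
```

Case (1) shows up as the warning plus U+FFFD in the decoded text; case (2)
shows up as "?" placeholders with no warning.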

As a side note, I find step 2 "sanitise my string to ensure it can be
safely output" to be a pretty common operation - possibly because
Python's standard IO streams raise exceptions on unicode errors - and
I'm surprised there isn't a better way to spell it than the
encode/decode pair above.

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver
On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> Require that build tools either send UTF-8 to the UI component, or write 
> bytes to a file and call it a build output. I see no benefit in 
> requiring both the build tool and the UI tool to guess what the text 
> encoding is.

I'm not proposing that the install tool should try to guess the
encoding, but I think a well written install tool shouldn't crash if the
build output doesn't match the encoding it expects. Even if the spec
says that the build output MUST be UTF-8 encoded, build tools can have
bugs, and you don't want the install to fail just because the log
isn't correctly encoded.

Hence, I think a 'SHOULD' is appropriate for this part of the spec:

- To install tool authors, it is clear that they can display the output
as UTF-8 so long as they don't crash if it's invalid.
- To build tool authors, it's clear that they can't pass the buck to
install tool authors if output gets jumbled because it's not UTF-8.

Thomas