Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-28 Thread Thomas Kluyver

On Thu, May 25, 2017, at 03:38 PM, Nick Coghlan wrote:
> Seeing it like this pushes me from "Eh, maybe?" to "No, definitely not"
> [on the log directory] :)

That's fine by me. It does feel like unwanted extra complexity for both
backends and frontends. And backends dealing with output in an unknown
encoding can still choose to write it to a file and log "full output in
/tmp/blah" if they want - they don't need a spec for that.

Thomas
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Steve Dower


On 25May2017 0756, Paul Moore wrote:
> On 25 May 2017 at 15:38, Nick Coghlan  wrote:
>> So I'm inclined to accept the encoding amendment, and then
>> provisionally accept the overall PEP pending implementation in pip.
>
> Me too. (Assuming I understand Steve's comments on backends, and he's
> comfortable with the idea that backends need to capture and manage
> MSVC output for presentation to the frontend).

Sounds like you understood my comments :) +1 overall (-0 on a formal way 
to pass logs via the disk)


As I mentioned at one point, there's a bug against the CPython test 
suite that the distutils tests show too much console output, which is 
because distutils currently just lets MSVC write directly to the 
console. To fix it, we need to capture the output and then conditionally 
display it, at which point transcoding from ANSI to UTF-8 with 'replace' 
is trivial, and saves the front end (in this case, the test suite) from 
having to guess. So it is something that the backend around MSVC needs 
to do regardless, and if the PEP says "send me UTF-8" then it's one less 
thing for the backend developer to guess.
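A minimal sketch of the capture-then-conditionally-display pattern described above. It assumes the external tool writes in the locale's preferred encoding (on Windows this is normally the ANSI code page); real MSVC output may need a different code page. `run_tool_captured` and `report` are hypothetical helper names, not part of distutils or any existing backend.

```python
import locale
import subprocess
import sys


def run_tool_captured(cmd, encoding=None):
    """Run an external tool and return (returncode, output as text).

    Assumes the tool writes in ``encoding``; by default, the locale's
    preferred encoding (normally the ANSI code page on Windows).
    Bytes that do not fit are replaced with U+FFFD instead of raising.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return proc.returncode, proc.stdout.decode(encoding, errors="replace")


def report(returncode, text, stream=None):
    """Show captured output as UTF-8, but only if the tool failed."""
    if returncode == 0:
        return
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8", errors="replace"))
    stream.flush()
```

Decoding once in the backend means the frontend only ever receives valid UTF-8, at the cost of losing any bytes that do not fit the assumed encoding.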


Cheers,
Steve

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Paul Moore

On 25 May 2017 at 15:38, Nick Coghlan  wrote:
> Seeing it like this pushes me from "Eh, maybe?" to "No, definitely not" :)

Agreed. Given that it's stated as optional for frontends to support
it, I'd be arguing against pip bothering (as it seems like too much
complexity) - so I'd rather leave it out until another frontend comes
along. If at that point there's a need, we can always revise the PEP.

> So I'm inclined to accept the encoding amendment, and then
> provisionally accept the overall PEP pending implementation in pip.

Me too. (Assuming I understand Steve's comments on backends, and he's
comfortable with the idea that backends need to capture and manage
MSVC output for presentation to the frontend).

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Nick Coghlan

On 26 May 2017 at 00:04, Thomas Kluyver  wrote:
> On Thu, May 25, 2017, at 02:27 PM, Paul Moore wrote:
>> I'd be concerned here that we risk making the frontend UI a lot more
>> complex for little actual benefit. I'd rather we stick with the
>> current model, where a backend just has some output to pass through to
>> the frontend. Let's get a solution that works for that before adding
>> extra complexity, or we'll never get the PEP signed off.
>
> I'm inclined to agree that we're overcomplicating things. But if we
> can't agree on which simple-but-imperfect option to take, maybe it's
> worth trying to work out something more complex.
>
> My proposed addition to the PEP so far says this:
>
> The build frontend may capture stdout and/or stderr from the backend. If
> the backend detects that an output stream is not a terminal/console
> (e.g. ``not sys.stdout.isatty()``), it SHOULD ensure that any output it
> writes to that stream is UTF-8 encoded. The build frontend MUST NOT fail
> if captured output is not valid UTF-8, but it MAY not preserve all the
> information in that case (e.g. it may decode using the *replace* error
> handler in Python). If the output stream is a terminal, the build
> backend is responsible for presenting its output accurately, as for any
> program running in a terminal.
>
> We could add a paragraph like this:
>
> The backend may do some operations, such as running subprocesses, which
> produce output in an unknown encoding. To handle such output, the build
> frontend MAY (?) create an empty directory, and set the environment
> variable PEP517_BUILD_LOGS to the path of this directory for the
> backend. If this environment variable is set, the backend MAY create any
> number of files inside this directory containing additional output. This
> is designed to allow the use of encoding detection tools on this output.
> If files are created in this directory, frontends SHOULD display its
> location in their output, and MAY display the contents of the files.

Seeing it like this pushes me from "Eh, maybe?" to "No, definitely not" :)

So that gets us to the point where we're agreeing that your suggested
addition to the PEP is basically right, with the only remaining
question being whether or not we're happy with the section that says
"it SHOULD ensure that any output it writes to that stream is UTF-8
encoded".

For a Python with locale coercion enabled, we're going to get that by
default, so such environments will comply without backend developers
doing anything in particular. Frontends may also decide to implement
their own PEP 538 style locale coercion for the backend build process
when they're running in a non-UTF-8 locale - specifying UTF-8 as a
SHOULD in the PEP gives them implied permission to do that.

So I don't think this is going to place any undue burden on backend
developers for *nix systems - frontends will probably want to
implement PEP 538 style locale coercion for LC_CTYPE to handle cases
where tools rely on the default stream encoding, but I think that's
fine.
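On *nix, a frontend could approximate locale coercion for the backend process by adjusting the environment it launches the backend with. This is a rough sketch, not the PEP 538 algorithm: PEP 538 itself only coerces the legacy C locale, whereas this follows the broader "non-UTF-8 locale" wording above. It assumes a `C.UTF-8` locale exists on the system and, like PEP 538, leaves things alone when `LC_ALL` is set. `backend_env` is a hypothetical name.

```python
import codecs
import locale
import os


def backend_env(base_env=None):
    """Build an environment for the backend with a UTF-8 LC_CTYPE (*nix only).

    If the frontend's own locale encoding is not UTF-8, set LC_CTYPE to
    C.UTF-8 (assumed to be available). As in PEP 538, do nothing when LC_ALL
    is set, because LC_ALL would override LC_CTYPE anyway.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = codecs.lookup(locale.getpreferredencoding(False)).name
    if current != "utf-8" and "LC_ALL" not in env:
        env["LC_CTYPE"] = "C.UTF-8"
    return env
```

The result would be passed as `env=backend_env()` to `subprocess.run()` when the frontend starts the backend.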

That leaves Windows, and there I'm prepared to defer to Steve Dower's
opinion that it's better to deal with the encoding challenges of
consuming the output from MSVC in the build backend, rather than
expecting the frontend to deal with it. We also have a precedent now
in pip's legacy subprocess handling for what doing that reliably looks
like, so it shouldn't be hard for backend implementors to re-use that
approach as needed.

So I'm inclined to accept the encoding amendment, and then
provisionally accept the overall PEP pending implementation in pip.

I'll give others a couple of days to comment further, but assuming
nothing else comes up, I'll go ahead and do that on the weekend :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Thomas Kluyver

On Thu, May 25, 2017, at 02:27 PM, Paul Moore wrote:
> I'd be concerned here that we risk making the frontend UI a lot more
> complex for little actual benefit. I'd rather we stick with the
> current model, where a backend just has some output to pass through to
> the frontend. Let's get a solution that works for that before adding
> extra complexity, or we'll never get the PEP signed off.

I'm inclined to agree that we're overcomplicating things. But if we
can't agree on which simple-but-imperfect option to take, maybe it's
worth trying to work out something more complex.

My proposed addition to the PEP so far says this:

The build frontend may capture stdout and/or stderr from the backend. If
the backend detects that an output stream is not a terminal/console
(e.g. ``not sys.stdout.isatty()``), it SHOULD ensure that any output it
writes to that stream is UTF-8 encoded. The build frontend MUST NOT fail
if captured output is not valid UTF-8, but it MAY not preserve all the
information in that case (e.g. it may decode using the *replace* error
handler in Python). If the output stream is a terminal, the build
backend is responsible for presenting its output accurately, as for any
program running in a terminal.
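The two halves of the proposed wording could look roughly like this. `use_utf8_if_captured` and `run_backend` are hypothetical helpers; the backend side relies on `TextIOWrapper.reconfigure()`, which needs Python 3.7 or later, and the frontend side uses the *replace* error handler mentioned in the text.

```python
import subprocess
import sys


def use_utf8_if_captured(stream=None):
    """Backend side: switch a captured (non-terminal) text stream to UTF-8.

    Terminals are left alone: there the backend must present output like any
    other terminal program. Needs TextIOWrapper.reconfigure() (Python 3.7+).
    """
    if stream is None:
        stream = sys.stdout
    if not stream.isatty() and hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8", errors="backslashreplace")


def run_backend(cmd):
    """Frontend side: capture output, never failing on invalid UTF-8."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # The *replace* handler may lose information, but it cannot raise.
    out = proc.stdout.decode("utf-8", errors="replace")
    err = proc.stderr.decode("utf-8", errors="replace")
    return proc.returncode, out, err
```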

We could add a paragraph like this:

The backend may do some operations, such as running subprocesses, which
produce output in an unknown encoding. To handle such output, the build
frontend MAY (?) create an empty directory, and set the environment
variable PEP517_BUILD_LOGS to the path of this directory for the
backend. If this environment variable is set, the backend MAY create any
number of files inside this directory containing additional output. This
is designed to allow the use of encoding detection tools on this output.
If files are created in this directory, frontends SHOULD display its
location in their output, and MAY display the contents of the files.
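For comparison, a sketch of the `PEP517_BUILD_LOGS` idea as drafted above (it did not make it into the final PEP 517). Both function names are hypothetical, and the temporary directory is deliberately left in place so the user can inspect it after the build.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def run_with_log_dir(cmd):
    """Frontend side: give the backend an empty log directory via the env."""
    log_dir = tempfile.mkdtemp(prefix="pep517-logs-")
    env = dict(os.environ, PEP517_BUILD_LOGS=log_dir)
    proc = subprocess.run(cmd, env=env)
    logs = sorted(Path(log_dir).iterdir())
    if logs:
        # The draft says frontends SHOULD show where the extra output is.
        print("Additional build output in", log_dir)
    return proc.returncode, logs


def backend_log_path(name):
    """Backend side: where to put extra output, or None if not offered."""
    log_dir = os.environ.get("PEP517_BUILD_LOGS")
    return Path(log_dir) / name if log_dir else None
```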

That's not a massive amount more complexity for the spec, but it does
add a moderate burden to frontend & backend implementations which want
to properly support it.

If you're being purist about it, displaying a path on a Unix based
system is producing output in an unknown encoding, since filenames in
Unix are bytes. I don't imagine many tools are going to go that far,
though.
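One way a tool might print a path without assuming it decodes cleanly: take the raw bytes the OS uses (`os.fsencode`) and decode them as UTF-8 with `backslashreplace`, so undecodable bytes appear as escapes rather than raising. This assumes the consumer expects UTF-8; `printable_path` is a hypothetical name.

```python
import os


def printable_path(path):
    """Render a str or bytes path as text that is always safe to write out.

    os.fsencode() gives the bytes the OS actually uses for the name; decoding
    them as UTF-8 with backslashreplace turns any undecodable byte into a
    visible backslash escape instead of raising.
    """
    return os.fsencode(path).decode("utf-8", errors="backslashreplace")
```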

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Paul Moore

On 25 May 2017 at 13:26, Nick Coghlan  wrote:
> On 24 May 2017 at 20:29, Thomas Kluyver  wrote:
>> Nick:
>>> That's actually pretty similar to the way tools like mock (the chroot
>>> based RPM builder) work. That way, build backends could choose
>>> between:
>>>
>>> - use pipes to stream output from the tools they call, deal with
>>> encoding issues themselves
>>> - redirect output to a suitable named file in the tool log directory
>>
>> Do you know if that system works well for mock? Shall I try to draft a
>> spec of something like this for PEP 517?
>
> I'm genuinely unsure. The main downside of the directory based
> approach is that it doesn't play well with CI systems in general -
> those are typically set up to capture the standard streams, and if you
> want to capture other artifacts, you either have to stream them
> anyway, or else you have to use a CI specific upload mechanism to keep
> them around.
>
> I guess what we could do is have a "debug log directory" as part of
> the defined interface between the frontends and the build backends,
> and then the exact UX of dealing with those build logs would then be
> something for frontends to define (e.g. offering an option to
> automatically stream the logs after a failed build, with appropriate
> headers and footers around each file)

To me, this feels like a lot of potentially unnecessary complexity. At
the moment pip's UI works around "run a build, get some output,
display the output if the situation warrants (i.e., there was an
error)". The only stumbling block is over transferring that output
from backend to frontend where we need to consider text/bytes issues.

We're now talking about potentially managing a directory containing
logs, do we need to persist log files, should we display the file
content or just the filename, etc.

I'd be concerned here that we risk making the frontend UI a lot more
complex for little actual benefit. I'd rather we stick with the
current model, where a backend just has some output to pass through to
the frontend. Let's get a solution that works for that before adding
extra complexity, or we'll never get the PEP signed off.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Wayne Werner

FWIW, I was just reading an article about writing libraries to just operate
on streams and totally ignore stdout/stdin/file io, and just leave the IO
to something else.

It may be a good idea to define the spec as purely operating on byte and
text streams, then leave where those streams go as an implementation
detail. That way CI systems could dump them to stdout/stderr and other
systems could do something different.
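A minimal sketch of that idea, in which a hypothetical backend entry point writes only to whatever text stream it is handed and falls back to `sys.stdout` only when nothing is passed:

```python
import sys
from typing import Optional, TextIO


def build(out: Optional[TextIO] = None) -> int:
    """Hypothetical backend entry point that only writes to the stream given.

    Where the text ends up (a terminal, a CI log, a file, memory) is left
    entirely to the caller.
    """
    if out is None:
        out = sys.stdout
    out.write("compiling extension...\n")
    out.write("build finished\n")
    return 0
```

A CI-oriented frontend could pass `sys.stdout`, while another frontend could pass an `io.StringIO`, or an `io.TextIOWrapper` around a file or pipe opened with whatever encoding it prefers.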

-W

On Thu, May 25, 2017, 7:27 AM Nick Coghlan  wrote:

> On 24 May 2017 at 20:29, Thomas Kluyver  wrote:
> > Nick:
> >> That's actually pretty similar to the way tools like mock (the chroot
> >> based RPM builder) work. That way, build backends could choose
> >> between:
> >>
> >> - use pipes to stream output from the tools they call, deal with
> >> encoding issues themselves
> >> - redirect output to a suitable named file in the tool log directory
> >
> > Do you know if that system works well for mock? Shall I try to draft a
> > spec of something like this for PEP 517?
>
> I'm genuinely unsure. The main downside of the directory based
> approach is that it doesn't play well with CI systems in general -
> those are typically set up to capture the standard streams, and if you
> want to capture other artifacts, you either have to stream them
> anyway, or else you have to use a CI specific upload mechanism to keep
> them around.
>
> I guess what we could do is have a "debug log directory" as part of
> the defined interface between the frontends and the build backends,
> and then the exact UX of dealing with those build logs would then be
> something for frontends to define (e.g. offering an option to
> automatically stream the logs after a failed build, with appropriate
> headers and footers around each file)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-25 Thread Nick Coghlan

On 24 May 2017 at 20:29, Thomas Kluyver  wrote:
> Nick:
>> That's actually pretty similar to the way tools like mock (the chroot
>> based RPM builder) work. That way, build backends could choose
>> between:
>>
>> - use pipes to stream output from the tools they call, deal with
>> encoding issues themselves
>> - redirect output to a suitable named file in the tool log directory
>
> Do you know if that system works well for mock? Shall I try to draft a
> spec of something like this for PEP 517?

I'm genuinely unsure. The main downside of the directory based
approach is that it doesn't play well with CI systems in general -
those are typically set up to capture the standard streams, and if you
want to capture other artifacts, you either have to stream them
anyway, or else you have to use a CI specific upload mechanism to keep
them around.

I guess what we could do is have a "debug log directory" as part of
the defined interface between the frontends and the build backends,
and then the exact UX of dealing with those build logs would then be
something for frontends to define (e.g. offering an option to
automatically stream the logs after a failed build, with appropriate
headers and footers around each file)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-24 Thread Thomas Kluyver

On Wed, May 24, 2017, at 01:22 AM, Chris Jerdonek wrote:
> 1) Would it make sense to provide a way for build tools to specify
> what encoding they use (e.g. if not using the default), instead of
> changing their encoding to conform to a standard? It seems like that
> could be easier, although I know this doesn't address problems like
> non-conforming tools.

Interesting idea, but I'm not convinced it actually makes anything
easier. You still have the same issues if the backend runs a subprocess
which doesn't produce output in the expected encoding. And there would
be some small amount of added complexity to communicate the encoding to
the frontend.

> 2) In terms of debugging, in cases where there are encoding-related
> errors, it would help if the overall system made it easy to pinpoint
> which parts of the system are at fault (using good error handling,
> diagnostic messages, etc).

Agreed.

Nick:
> That's actually pretty similar to the way tools like mock (the chroot
> based RPM builder) work. That way, build backends could choose
> between:
> 
> - use pipes to stream output from the tools they call, deal with
> encoding issues themselves
> - redirect output to a suitable named file in the tool log directory

Do you know if that system works well for mock? Shall I try to draft a
spec of something like this for PEP 517?

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-24 Thread Nick Coghlan

On 24 May 2017 at 03:04, Thomas Kluyver  wrote:
> I'll propose a variant of an idea I described already: the frontend
> could provide the backend with a fresh temp directory. If the backend
> needs to run other processes, it can redirect the output into a file in
> that temp directory. Then you have files with an unknown encoding, but
> each file will hopefully have one encoding, and you can use a tool like
> chardet to guess what it is.

That's actually pretty similar to the way tools like mock (the chroot
based RPM builder) work. That way, build backends could choose
between:

- use pipes to stream output from the tools they call, deal with
encoding issues themselves
- redirect output to a suitable named file in the tool log directory
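The second option could look roughly like this on the backend side: the tool's raw bytes go straight into a named file, so the backend never decodes them. The log directory and `run_to_log_file` are assumptions for illustration; the first option is ordinary pipe capture plus decoding in the backend.

```python
import subprocess
from pathlib import Path


def run_to_log_file(cmd, log_dir, name):
    """Run a tool with stdout and stderr redirected to log_dir/name.

    The bytes are written exactly as the tool produced them. Using one file
    per tool invocation means each file hopefully has a single encoding.
    """
    log_path = Path(log_dir) / name
    with open(log_path, "wb") as f:
        proc = subprocess.run(cmd, stdout=f, stderr=subprocess.STDOUT)
    return proc.returncode, log_path
```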

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Chris Jerdonek

A couple comments:

1) Would it make sense to provide a way for build tools to specify
what encoding they use (e.g. if not using the default), instead of
changing their encoding to conform to a standard? It seems like that
could be easier, although I know this doesn't address problems like
non-conforming tools.

2) In terms of debugging, in cases where there are encoding-related
errors, it would help if the overall system made it easy to pinpoint
which parts of the system are at fault (using good error handling,
diagnostic messages, etc).

--Chris

On Tue, May 23, 2017 at 10:04 AM, Thomas Kluyver  wrote:
> On Tue, May 23, 2017, at 04:20 PM, Nick Coghlan wrote:
>> Up to this point, I've been in favour of both 1b and 2b, since they're
>
> Noted.
>
>> However, I also realised that there's a potential third way to handle
>> this problem: design a Python level API that allows front ends to use
>> more structured data formats (e.g. JSON) for communication between the
>> frontend and their backend shim.
>>
>> In particular, I'm thinking we could move the current
>> "config_settings" dict onto something like a "build context" object
>> that, *even in Python 2*, offers a Unicode "outstream" and
>> "errstream", which the backend is then expected to use rather than
>> writing to sys.stdout/err directly. That context could also provide a
>> Python 3 style "run()" API for subprocess invocation that implemented
>> the preferred stream handling behaviour for subprocess invocation
>> (including applying the "backslashreplace" error handler regardless of
>> version)
>
> I'm not really compelled by this so far:
>

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Thomas Kluyver

On Tue, May 23, 2017, at 04:20 PM, Nick Coghlan wrote:
> Up to this point, I've been in favour of both 1b and 2b, since they're

Noted.

> However, I also realised that there's a potential third way to handle
> this problem: design a Python level API that allows front ends to use
> more structured data formats (e.g. JSON) for communication between the
> frontend and their backend shim.
> 
> In particular, I'm thinking we could move the current
> "config_settings" dict onto something like a "build context" object
> that, *even in Python 2*, offers a Unicode "outstream" and
> "errstream", which the backend is then expected to use rather than
> writing to sys.stdout/err directly. That context could also provide a
> Python 3 style "run()" API for subprocess invocation that implemented
> the preferred stream handling behaviour for subprocess invocation
> (including applying the "backslashreplace" error handler regardless of
> version)

I'm not really compelled by this so far:

- It's more complexity for build tools - instead of just producing
output as usual, now they have to pass around a context object and
direct output to it.
- What does the frontend do if there is output on stdout/stderr anyway?
Throw it away? Let it go straight to the terminal? Reprimand the backend
for not using the streams in the build context? Or try to include it as
part of build output anyway?
- I don't see how it solves the issue with subprocesses producing
unknown encodings. The output bytes still need to be interpreted
somehow.

I'll propose a variant of an idea I described already: the frontend
could provide the backend with a fresh temp directory. If the backend
needs to run other processes, it can redirect the output into a file in
that temp directory. Then you have files with an unknown encoding, but
each file will hopefully have one encoding, and you can use a tool like
chardet to guess what it is.
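Each such file could then be decoded on its own. This sketch tries strict UTF-8 first, then falls back to `chardet.detect()` if the third-party `chardet` package happens to be installed, and finally to UTF-8 with *replace*. The fallback order and the `read_log_guessing_encoding` name are assumptions.

```python
from pathlib import Path

try:
    import chardet  # optional third-party encoding detector
except ImportError:
    chardet = None


def read_log_guessing_encoding(path):
    """Return (text, encoding_used) for one log file of unknown encoding."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            try:
                return data.decode(guess, errors="replace"), guess
            except LookupError:
                pass  # chardet named a codec Python does not know
    return data.decode("utf-8", errors="replace"), "utf-8 (replace)"
```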

Thomas


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 17:16, Nick Coghlan  wrote:
> Yep, and that's also why I want to avoid trying to use it to improve
> the encoding handling situation - pip and other tools have to deal
> with the current mess regardless, and there's already likely to be
> some significant churn in this space as a result of the changes Victor
> and I have proposed for Python 3.7.

Encoding issues have been around in pip for many years, with little or
no progress. We might be getting a handle on things now (Thomas'
initial email in this thread was very timely - the fact that I was in
the middle of working on the encoding issue in pip was the only reason
I picked up on the need for clarity in the PEP) but I'd be very
cautious about saying we've got it solved until we have the latest
changes in a released version of pip and we get some feedback (or
silence, more likely) from international users.

One of the reasons I made the point about ease of testing earlier in
the thread is that we've found it's extremely difficult to pin down
the root of the reported problems in pip - the route that badly
encoded data takes from build tool to pip's output is pretty
convoluted. Anything that adds clear-cut boundaries at which we can
make guarantees about the integrity of the data will help a lot with
this in the future.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Nick Coghlan

On 24 May 2017 at 01:39, Paul Moore  wrote:
> On 23 May 2017 at 16:20, Nick Coghlan  wrote:
>> Taking that approach of just defining a helper API and expecting build
>> backends to either use it or emulate it gives us some quite attractive
>> properties:
>
> Making the output data part of a structured API (and by implication,
> saying that backends shouldn't be writing to stdout directly at all)
> would definitely improve the situation, IMO. Frankly, it seems likely
> that the only real way we're going to get backend developers to
> consider encodings is by having the "build output" as a string value
> passed back via the API, rather than implied in the fact that backends
> can write to stdout/err. It also squarely places the responsibility
> for dealing with the question of displaying full-range Unicode output
> to the user onto the frontend.
>
> However, it's a relatively big change to the PEP and there's a risk
> that by endlessly reaching for perfection, we miss the chance to get
> the PEP in at all (another lesson we should probably learn from PEP
> 426!)

Yep, and that's also why I want to avoid trying to use it to improve
the encoding handling situation - pip and other tools have to deal
with the current mess regardless, and there's already likely to be
some significant churn in this space as a result of the changes Victor
and I have proposed for Python 3.7.

As a result, I think adding in additional requirements here runs a
significant risk of requiring build backend developers to do
additional work to achieve nominal spec compliance without actually
simplifying anything in practice for frontend developers.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 16:20, Nick Coghlan  wrote:
> Taking that approach of just defining a helper API and expecting build
> backends to either use it or emulate it gives us some quite attractive
> properties:

Making the output data part of a structured API (and by implication,
saying that backends shouldn't be writing to stdout directly at all)
would definitely improve the situation, IMO. Frankly, it seems likely
that the only real way we're going to get backend developers to
consider encodings is by having the "build output" as a string value
passed back via the API, rather than implied in the fact that backends
can write to stdout/err. It also squarely places the responsibility
for dealing with the question of displaying full-range Unicode output
to the user onto the frontend.

However, it's a relatively big change to the PEP and there's a risk
that by endlessly reaching for perfection, we miss the chance to get
the PEP in at all (another lesson we should probably learn from PEP
426!)

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Nick Coghlan

On 23 May 2017 at 22:41, Thomas Kluyver  wrote:
> On Tue, May 23, 2017, at 12:56 PM, Paul Moore wrote:
> Can I take a quick poll of what people following this topic think?
>
> Q1: Default encoding for captured build stdout/stderr
> a. UTF-8 (consistent, can represent any character)
> b. Locale default (convenient if backend runs subprocesses which produce
> output in the locale encoding)
>
> Q2: Handling unknown encodings from subprocesses
> a. Backend should ensure all output is valid in the target encoding
> (Q1), though it may not be accurate.
> b. Unknown output may be passed on as bytes without transcoding, so the
> frontend can e.g. dump it to a file.

Up to this point, I've been in favour of both 1b and 2b, since they're
the main options that allow a build backend to get itself out of the
way entirely and let the front-end deal with the problem rather than
having to figure out encoding issues for themselves. pip already has
to deal with the "arbitrarily encoded data" problem for the current
setup.py invocation, and whatever solution is adopted there should
suffice for PEP 517 as well.

If PEP 426 taught me anything, it was that if you weren't planning to
write something yourself, and didn't have the budget to pay someone
else to write it for you, your best bet is to adhere as closely to the
status quo as you can while still incorporating the 100% essential
changes that you actually need. (A Zen of Python style aphorism for
that: "The right way and the easy way should be the same way")

To be honest, I still think that's likely to be the right way to go
for PEP 517, and will take some convincing that we're going to be able
to persuade future backend developers that personally couldn't care
less about encoding issues to adopt anything more complex.

However, I also realised that there's a potential third way to handle
this problem: design a Python level API that allows front ends to use
more structured data formats (e.g. JSON) for communication between the
frontend and their backend shim.

In particular, I'm thinking we could move the current
"config_settings" dict onto something like a "build context" object
that, *even in Python 2*, offers a Unicode "outstream" and
"errstream", which the backend is then expected to use rather than
writing to sys.stdout/err directly. That context could also provide a
Python 3 style "run()" API for subprocess invocation that implemented
the preferred stream handling behaviour for subprocess invocation
(including applying the "backslashreplace" error handler regardless of
version)
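A rough sketch of what such a build context might offer. `BuildContext`, its attributes, and the way a backend would receive it are all hypothetical; nothing like this is in PEP 517. Unlike the proposal, this sketch is Python 3 only, and the encoding used to decode subprocess output is assumed to be the frontend's choice.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, List, TextIO


@dataclass
class BuildContext:
    """Hypothetical frontend-supplied context object (not part of PEP 517)."""

    config_settings: Dict[str, str] = field(default_factory=dict)
    outstream: TextIO = field(default_factory=lambda: sys.stdout)
    errstream: TextIO = field(default_factory=lambda: sys.stderr)
    # Encoding the frontend assumes subprocesses use; its choice, not the spec's.
    subprocess_encoding: str = "utf-8"

    def run(self, cmd: List[str]) -> int:
        """Run a subprocess, forwarding its output as text to the streams.

        backslashreplace keeps undecodable bytes visible as escapes, so the
        backend never has to deal with bytes itself.
        """
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        enc = self.subprocess_encoding
        self.outstream.write(proc.stdout.decode(enc, errors="backslashreplace"))
        self.errstream.write(proc.stderr.decode(enc, errors="backslashreplace"))
        return proc.returncode
```

A backend hook would then write progress with `context.outstream.write(...)` and call `context.run([...])` instead of using `subprocess` directly, leaving all byte handling to the frontend's implementation of the context.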

That way, instead of trying to hit build backend developers with a
fairly flimsy stick ("Thou shalt comply with the specification or some
other open source developers may say mildly disapproving things about
you on the internet"), we'd instead be offering them the easy way out
of letting the front-end provided build context deal with all the
messy encoding issues.

Taking that approach of just defining a helper API and expecting build
backends to either use it or emulate it gives us some quite attractive
properties:

- backends themselves deal entirely in Unicode, not bytes
- frontends get full control of the communication format used between
the frontend and its backend shim - they're not restricted to plain
text
- the Python 2/3 differences can be handled in the frontend CLI shims,
rather than every backend needing to do it
- we don't need to enshrine any particular encoding handling behaviour
in the spec, we can let it be a quality of implementation issue for
the front-end tools
- platform specific tools can make platform specific choices
- tools can adapt to new platforms without requiring a specification update
- tools can update their default behaviour as other considerations
change (e.g. the possible introduction of locale coercion and
PYTHONUTF8 mode in 3.7)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 13:41, Thomas Kluyver  wrote:
> Can I take a quick poll of what people following this topic think?
>
> Q1: Default encoding for captured build stdout/stderr
> a. UTF-8 (consistent, can represent any character)
> b. Locale default (convenient if backend runs subprocesses which produce
> output in the locale encoding)
>
> Q2: Handling unknown encodings from subprocesses
> a. Backend should ensure all output is valid in the target encoding
> (Q1), though it may not be accurate.
> b. Unknown output may be passed on as bytes without transcoding, so the
> frontend can e.g. dump it to a file.
>
> I'm currently 1:a, 2:?a .

You probably know this, but I'm 1: b, 2: mild preference for a, but
not too bothered. If the answer to 1 is a, though, I strongly prefer
2: a.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Thomas Kluyver

On Tue, May 23, 2017, at 12:56 PM, Paul Moore wrote:
> So based on your proposal, won't you introduce similar bugs by using
> print() without sorting out encodings? Unless (see below) you assume
> that the frontend sorts it out for you.

If you strictly follow the locale encoding, you need to sort it out in
Python anyway, in case the stdout encoding has been overridden by
PYTHONIOENCODING, or PYTHONSTARTUP, or the infernal .pth files. I accept
that those are corner cases, though.

> Yes, subprocesses that produce a known encoding are trivial to deal
> with. But remembering that you *need* to deal with them less so. My
> concern here is the same one as you quote above - assuming that
> subprocess returns UTF-8 encoded bytes, because it does on Linux and
> Mac.

I agree, that is a concern.

> But if you genuinely don't know (or worse, know there is no consistent
> encoding) I'm not sure I see how passing unknown bytes onto the
> frontend, which by necessity has less context to guess what those
> bytes might mean, is the right answer. The frontend is better able to
> know what it wants to *do* with those bytes, but "convert them to text
> for the user to see" is the only real answer here IMO (sure, dumping
> the raw bytes to a file might be an option, but I imagine it'll be a
> relatively uncommon choice).

I was indeed thinking of dumping them to a file. It's not very user
friendly, but it means the information is there if you need it. I
suspect that regardless of the locale, technical information like code
and filesystem paths will often contain enough ASCII that a human can
interpret them even if non-ASCII characters are wrongly encoded. So I
hope that needing to reverse-engineer the encoding will be relatively
rare.

The appeal of this is that it follows "in the face of ambiguity, refuse
the temptation to guess". If the backend guesses the encoding
incorrectly, the frontend gets valid UTF-8, but is no better able to
display it meaningfully, and you then need to go through
decode-encode-decode to recover the original text, even if no data was
lost.

Another option: if the backend runs a subprocess with unknown output
encoding, it redirects that output to a temp file and prints the path in
its own output. Then there's a better chance that the unknown encoding
is at least consistent within the file, so tools can do encoding
detection on it.
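Thomas's temp-file idea can be sketched roughly as follows. This assumes a hypothetical `run_to_logfile` helper, and the example command is only a stand-in for a tool whose output encoding the backend doesn't know. The raw bytes are never decoded. The backend reports only the log path in its own (text) output.

```python
import subprocess
import sys
import tempfile


def run_to_logfile(cmd):
    """Run cmd, saving its raw output bytes to a temporary log file.

    Nothing is decoded, so no information is lost; the backend only
    reports where the log is.
    """
    with tempfile.NamedTemporaryFile(
        mode="wb", prefix="build-", suffix=".log", delete=False
    ) as log:
        # stderr is merged into stdout so both streams end up in one log.
        returncode = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    print("Tool output (encoding unknown) saved to", log.name)
    return returncode, log.name


if __name__ == "__main__":
    # Hypothetical tool that writes a byte the backend can't interpret.
    run_to_logfile([sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'caf\\xe9\\n')"])
```

Because the bytes stay together in one file, encoding-detection tools have a single, consistent sample to work on.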

> At the end of the day, there is no perfect answer here. Someone is
> going to have to make a judgement call, and as the PEP author, I guess
> that's you. So at this point I'll stop badgering you and leave it up
> to you to decide what the consensus is. Thanks for listening to my
> points, though.

I know what I think, but I don't feel like there's a consensus as yet.

Can I take a quick poll of what people following this topic think?

Q1: Default encoding for captured build stdout/stderr
a. UTF-8 (consistent, can represent any character)
b. Locale default (convenient if backend runs subprocesses which produce
output in the locale encoding)

Q2: Handling unknown encodings from subprocesses
a. Backend should ensure all output is valid in the target encoding
(Q1), though it may not be accurate.
b. Unknown output may be passed on as bytes without transcoding, so the
frontend can e.g. dump it to a file.

I'm currently 1:a, 2:?a .

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 12:36, Thomas Kluyver  wrote:
> As you described earlier, though, even using a locale dependent encoding
> doesn't really avoid this problem, because of tools using OEM vs ANSI
> codepages on Windows. And if PYTHONIOENCODING is set, Python processes
> will use that over the locale encoding. I think we're ultimately better
> off specifying a consistent encoding rather than trying to guess about
> it.

Agreed it doesn't avoid the problem. But it does minimise it. I don't
see any huge advantage in having a consistent encoding across
platforms though - having a consistent *rule*, yes, but "use the
locale encoding" is such a rule as well.

> I'm also thinking of all the bugs I've seen (and written) by assuming
> open() in text mode defaults to UTF-8 encoding - as it does on the Linux
> and Mac computers many open source developers use, but not on Windows,
> nor in all Linux configurations.

So based on your proposal, won't you introduce similar bugs by using
print() without sorting out encodings? Unless (see below) you assume
that the frontend sorts it out for you.

> So I'd recommend that backends running processes for which they know the
> encoding should transcode it to UTF-8. I expect we can make standard
> utility functions to wait for a subprocess to finish while reading,
> transcoding, and repeating its output.

Yes, subprocesses that produce a known encoding are trivial to deal
with. But remembering that you *need* to deal with them less so. My
concern here is the same one as you quote above - assuming that
subprocess returns UTF-8 encoded bytes, because it does on Linux and
Mac.

> I'm still not sure what the backend should do when it runs something for
> which it doesn't know the output encoding. The possibilities are either:
>
> - Take a best guess and transcode it to UTF-8, which may risk losing
> some information, but keeps the output as valid UTF-8
> - Pass through the raw bytes, ensuring that no information is lost, but
> leaving it up to the frontend/user to deal with that.

There's never a good answer here. The "correct" answer is to do
research and establish what encoding the tool uses, but that's often
stupidly difficult.

But if you genuinely don't know (or worse, know there is no consistent
encoding) I'm not sure I see how passing unknown bytes onto the
frontend, which by necessity has less context to guess what those
bytes might mean, is the right answer. The frontend is better able to
know what it wants to *do* with those bytes, but "convert them to text
for the user to see" is the only real answer here IMO (sure, dumping
the raw bytes to a file might be an option, but I imagine it'll be a
relatively uncommon choice).

At the end of the day, there is no perfect answer here. Someone is
going to have to make a judgement call, and as the PEP author, I guess
that's you. So at this point I'll stop badgering you and leave it up
to you to decide what the consensus is. Thanks for listening to my
points, though.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Thomas Kluyver

On Tue, May 23, 2017, at 11:04 AM, Paul Moore wrote:
> However, if we do this then we have a situation where existing build
> tools (compilers, etc) that we have to support still use platform
> dependent encodings. That's a reality that we can't wish away. And the
> majority of real-life issues reported on pip are with compilation
> errors. So do we require backends that run these tools to ensure that
> they transcode the output, or do we risk significant output
> corruption, because (essentially) every high-bit character in the
> compiler output will be replaced as it's invalid UTF-8?

As you described earlier, though, even using a locale dependent encoding
doesn't really avoid this problem, because of tools using OEM vs ANSI
codepages on Windows. And if PYTHONIOENCODING is set, Python processes
will use that over the locale encoding. I think we're ultimately better
off specifying a consistent encoding rather than trying to guess about
it.

I'm also thinking of all the bugs I've seen (and written) by assuming
open() in text mode defaults to UTF-8 encoding - as it does on the Linux
and Mac computers many open source developers use, but not on Windows,
nor in all Linux configurations.

So I'd recommend that backends running processes for which they know the
encoding should transcode it to UTF-8. I expect we can make standard
utility functions to wait for a subprocess to finish while reading,
transcoding, and repeating its output.
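One possible shape for the utility Thomas describes is sketched below. `run_transcoding` is a hypothetical name, not an existing library function. It waits for the subprocess while reading its output, transcodes from a known encoding to UTF-8, and repeats the output as it arrives.

```python
import codecs
import io
import subprocess
import sys


def run_transcoding(cmd, encoding, out=None):
    """Run cmd, copying its output to `out` re-encoded as UTF-8.

    `encoding` is the encoding the backend knows the tool uses (for
    example an OEM or ANSI code page on Windows); bytes that don't decode
    become U+FFFD instead of raising.
    """
    if out is None:
        out = sys.stdout.buffer
    # An incremental decoder handles multi-byte characters split across reads.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for chunk in proc.stdout:  # yields output line by line as it arrives
        out.write(decoder.decode(chunk).encode("utf-8"))
        out.flush()
    out.write(decoder.decode(b"", final=True).encode("utf-8"))
    proc.stdout.close()
    return proc.wait()


# Example: a child process that writes "h\u00e9llo" in cp1252.
demo_out = io.BytesIO()
demo_rc = run_transcoding(
    [sys.executable, "-c",
     "import sys; sys.stdout.buffer.write('h\\u00e9llo\\n'.encode('cp1252'))"],
    "cp1252",
    out=demo_out,
)
```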

I'm still not sure what the backend should do when it runs something for
which it doesn't know the output encoding. The possibilities are either:

- Take a best guess and transcode it to UTF-8, which may risk losing
some information, but keeps the output as valid UTF-8
- Pass through the raw bytes, ensuring that no information is lost, but
leaving it up to the frontend/user to deal with that.

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 09:56, Thomas Kluyver  wrote:
> I may have missed it, but has anyone proposed what it should do if it
> wants to send characters which can't be encoded in the locale encoding?

No, it's not been mentioned - the focus has been on running build
tools like a compiler. Best answer I can give is to use a
(backslash)replace error handler. I agree this is suboptimal, but see
below.
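To see what the candidate error handlers do, the sketch below applies each one to a Windows path containing characters outside CP1252, which is the case Thomas raises. `encode_for_stream` is a hypothetical wrapper around `str.encode`. Strict encoding raises, `replace` discards the characters, and `backslashreplace` keeps their code points as escapes.

```python
def encode_for_stream(text, encoding, errors="backslashreplace"):
    """Encode text for a byte stream whose encoding may lack some characters."""
    return text.encode(encoding, errors)


# A Windows path (natively UTF-16) containing characters outside cp1252.
path = "C:\\Users\\Jos\u00e9\\\u4e2d\u6587.c"

try:
    encode_for_stream(path, "cp1252", errors="strict")
except UnicodeEncodeError as exc:
    strict_error = exc  # the default handler raises, which would crash the build

replaced = encode_for_stream(path, "cp1252", errors="replace")  # information lost
escaped = encode_for_stream(path, "cp1252")  # code points kept as \uXXXX escapes
```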

> Paths on Windows are handled natively as UTF-16, as I understand it, so
> it's entirely possible for them to contain characters which can't be
> represented in, say, CP1252.

Agreed. In practice, the vast bulk of the issues reported for pip seem
to be to do with filename characters or localised messages using the
ANSI/OEM codepages, though. But I agree that in theory this is an
issue.

> Given this, and the workarounds Nick has pointed out are necessary for
> systems where the locale thinks it's ASCII, I still think that
> specifying "UTF-8" is a better option than trying to work with locale
> encodings. We're building a new spec for new tools in 2017, let's not
> prolong the pain of platform-dependent default encodings further.

However, if we do this then we have a situation where existing build
tools (compilers, etc) that we have to support still use platform
dependent encodings. That's a reality that we can't wish away. And the
majority of real-life issues reported on pip are with compilation
errors. So do we require backends that run these tools to ensure that
they transcode the output, or do we risk significant output
corruption, because (essentially) every high-bit character in the
compiler output will be replaced as it's invalid UTF-8?

I agree 100% that UTF-8 is in theory the right thing. My focus is on
the practical aspects of minimising the risks of repeating the sorts
of actual issues that we have seen in the past on pip, though, and
"don't require backends that run compilers to transcode the output"
seems to me to be the most likely route to achieve that.

Having said that, I won't be the one writing those backends - if
people like Steve are OK with transcoding (or dealing with pip issues
saying "I can't read the compiler output" being passed back to them as
backend issues) then I'm not going to argue against UTF-8.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Thomas Kluyver

On Tue, May 23, 2017, at 09:08 AM, Paul Moore wrote:
> I strongly
> prefer using the locale encoding as the assumed encoding for the
> output stream rather than UTF-8.

I may have missed it, but has anyone proposed what it should do if it
wants to send characters which can't be encoded in the locale encoding?
Paths on Windows are handled natively as UTF-16, as I understand it, so
it's entirely possible for them to contain characters which can't be
represented in, say, CP1252.

Given this, and the workarounds Nick has pointed out are necessary for
systems where the locale thinks it's ASCII, I still think that
specifying "UTF-8" is a better option than trying to work with locale
encodings. We're building a new spec for new tools in 2017, let's not
prolong the pain of platform-dependent default encodings further.

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-23 Thread Paul Moore

On 23 May 2017 at 05:11, Nick Coghlan  wrote:
> What we can then also do is to recommend that *front-ends* do the
> following when invoking their build backend CLI shims:
>
> 1. Implement the C locale -> UTF-8 based locale coercion defined in
> PEP 538 when launching the subprocess
> 2. Implement a similar coercion for Windows, where cp1252 being active
> in the parent process prompts a call to "chcp 65001" inside the
> subprocess before the build backend itself actually starts running

I'm a fairly strong -1 on doing "chcp 65001" on Windows. It puts the
backend into a position of running under a relatively non-standard
environment, and therefore runs the risk of provoking issues. If a
build tool has issues as a result of the changed codepage, who's
responsible for dealing with the bug? The backend, that manages the
tool, or the frontend, that set the codepage? One of the big issues we
have is that very few people have expertise in this area (encodings on
Windows) and so keeping the environment as "standard" as we can
ensures that we can make best use of the limited expertise we have.

I agree with Thomas that we're probably reaching a point of
diminishing returns. But on one point I remain in dispute - I strongly
prefer using the locale encoding as the assumed encoding for the
output stream rather than UTF-8. Also (although this is a quality of
implementation issue) I think that the frontend (i.e. the shim) should
*not* make any changes to the global environment that the backend runs
in.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan

On 23 May 2017 at 03:38, Steve Dower  wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.
>
> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

One of the key premises of PEP 517 is that there will be relatively
few front ends (pip, possibly easy_install, ???), but a relatively
large number of backends (one per build system - at least
distutils/setuptools, distutils2, flit, enscons, likely eventually
meson, waf, and yotta, and potentially even C/C++ build systems like
autotools, CMake, etc).

So it makes sense to put the implementation burden for important
aspects of the UX on the part that PyPA has the most influence over
(the front-end), rather than considering it reasonable for front-end
developers to point fingers and say "That UX failure in the tool we
provide isn't *our* fault, it's the fault of the build backend
developers for not complying with the interoperability specification
properly").

Once we make that core assumption about where the responsibility for
the end user experience resides, then the absolute *minimum*
behavioural requirements that can be placed on build backends are:

- respect the locale encoding
- emit informational messages on stdout
- emit error messages on stderr

What we can then also do is to recommend that *front-ends* do the
following when invoking their build backend CLI shims:

1. Implement the C locale -> UTF-8 based locale coercion defined in
PEP 538 when launching the subprocess
2. Implement a similar coercion for Windows, where cp1252 being active
in the parent process prompts a call to "chcp 65001" inside the
subprocess before the build backend itself actually starts running
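The POSIX half of item 1 could look roughly like the sketch below. This is only an approximation of PEP 538, assuming the target system provides a `C.UTF-8` locale. CPython's real implementation also tries other candidate locale names and checks which ones are available. `coerced_child_env` is a hypothetical helper.

```python
def coerced_child_env(env):
    """Return a copy of env with a legacy C locale coerced to C.UTF-8.

    Rough approximation of PEP 538: only LC_CTYPE is changed, and an
    explicit LC_ALL is left alone, since it would override LC_CTYPE anyway.
    """
    env = dict(env)
    if env.get("LC_ALL"):
        return env
    # The effective LC_CTYPE comes from LC_CTYPE, falling back to LANG.
    ctype = env.get("LC_CTYPE") or env.get("LANG") or "C"
    if ctype in ("C", "POSIX"):
        env["LC_CTYPE"] = "C.UTF-8"  # assumes this locale is installed
    return env
```

A frontend would then pass `env=coerced_child_env(os.environ)` when launching its shim. The Windows code-page half is deliberately not sketched, given Paul's objection to it elsewhere in this thread.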

That leaves build backend authors with the freedom to assume that they
*don't* need to worry about stream encoding issues, since giving them
access to properly configured streams is the front end's
responsibility.

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

In PEP 517's architecture, the front-end developers are also
responsible for the CLI that's running inside the backend subprocess.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver

On Mon, May 22, 2017, at 11:36 PM, Steve Dower wrote:
> IMHO, #2 is definitely the right way to go. Yes, the platform specific 
> code now has to worry about the encoding, but... the encoding is 
> platform specific? So... that seems exactly right? :) Maybe I'm still 
> missing something here, but I'm totally happy to leave it to Thomas to 
> decide (which I think he has, but I haven't gotten to looking at that PR 
> yet).

I think I broadly agree with this as well. My reservation is that the
build backend might be running a subprocess which produces output in an
*unknown* encoding, especially if it allows the package author or the
end user to configure a command to run. If it doesn't know the encoding,
I'd rather get the raw bytes from the subprocess in the log (e.g. dumped
to a file), rather than attempting to transcode them to UTF-8 - the
conversion risks losing information, and even if it doesn't, it makes it
harder to work out what was really meant.

I feel like we're spending a lot of energy on a point that's not really
central to the PEP, though. I think we've established that there's a
potential for bugs and mojibake whatever we put in the spec. So I'd like
to put something relatively simple and move on. I still stand by my PR,
which amounts to "backends try to make it UTF-8, frontends don't crash
if it isn't". I might be persuaded to add a recommendation that
frontends dump the bytes to a file if they're not UTF-8, so the user can
pull it apart if necessary.
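The frontend side of that recommendation might look like this sketch. `present_backend_output` is a hypothetical name. Valid UTF-8 passes straight through. Anything else is saved untouched to a temporary file, and a lossy copy is returned for display, so the frontend never crashes.

```python
import tempfile


def present_backend_output(raw):
    """Return (text, log_path) for raw bytes received from a backend.

    Valid UTF-8 is returned as-is with no log file. Otherwise the
    untouched bytes are saved for later inspection, and a lossy
    (U+FFFD-substituted) copy is returned for display.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        with tempfile.NamedTemporaryFile(
            mode="wb", prefix="backend-", suffix=".log", delete=False
        ) as log:
            log.write(raw)
        return raw.decode("utf-8", errors="replace"), log.name
```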

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Steve Dower


On 22May2017 1253, Paul Moore wrote:

> It seems to me there are 2 schools of thought:
>
> 1. There are likely to be fewer front ends than back ends, and so the
> front end(s) (basically, pip) should deal with the problem. Also,
> backends are more likely to be written by developers who are looking
> at very specific scenarios, and asking them to handle all the
> complexities of robust multilingual coding is raising the bar on
> writing a backend too high.
>
> 2. The backend is where the problem lies, and so the backend should
> address the issue. Furthermore, a well-established principle in
> dealing with encodings is to convert to strings right at the boundary
> of the application, and in this case the backend is the only code that
> has access to that boundary.
>
> (I tend towards (2), but I honestly can't say to what extent that's
> because it makes it "someone else's problem" for me ;-))


I also tend towards 2, and I assume I am one of the more likely people 
to write the part that invokes Microsoft's cl.exe/link.exe :)


Is the front end going to be directly invoking those tools? I would 
assume not, otherwise it won't be cross platform.


Since the shim belongs to the front end, I've essentially been ignoring 
it. The shim can invoke another part of the build tool, but that is not 
going to be cl.exe/link.exe either.


At some point there will be a script that runs the tools directly. I 
have been referring to that as the backend, and it is the part that 
should handle capturing and transcoding the output. Everything from 
there can be utf8:replace to prevent crashing, but we can't say "the 
frontend can handle all encodings", and shouldn't say "the frontend will 
only use bad encodings".


IMHO, #2 is definitely the right way to go. Yes, the platform specific 
code now has to worry about the encoding, but... the encoding is 
platform specific? So... that seems exactly right? :) Maybe I'm still 
missing something here, but I'm totally happy to leave it to Thomas to 
decide (which I think he has, but I haven't gotten to looking at that PR 
yet).


Cheers,
Steve

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore

On 22 May 2017 at 18:38, Steve Dower  wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.

s/expect/allow/

The paranoid in me suspects "expect" is also true, though :-)

> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

What you're referring to as the backend "tool" being a script, is what
the PEP refers to as a "shim" (as Nick pointed out to me) and is
considered part of the front end. The back end is a set of Python APIs
which are called by the front end (in any real life front end, via the
front end's shim script).

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

For front ends, "never crash" is essential. But "produce as readable
as possible data" is also a high priority. Consider for example a
Russian user with a series of directories named in Russian. If the
tools write an error using his local 8-bit encoding, and the front end
assumes UTF-8, then all of the high-bit characters in his directory
names would be replaced. Deciphering an error message like "File
???/?/?.c: unexpected EOF" is problematic... :-(
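Paul's example can be reproduced in a few lines. This sketch assumes the tool writes in cp1251, the Russian ANSI code page, and that the frontend decodes as UTF-8 with `errors="replace"`. The frontend doesn't crash, but every Cyrillic byte comes out as U+FFFD. What the user finally sees (for example `?`) depends on how the frontend displays it. Decoding with the right code page recovers the text exactly.

```python
original = "Привет/файл.c"  # a path under a Russian user's directory
raw = original.encode("cp1251")  # as written by a tool using the Russian ANSI code page

# A frontend that assumes UTF-8 with errors="replace" doesn't crash, but each
# Cyrillic byte here is invalid UTF-8 and turns into U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the tool actually used recovers the text exactly.
as_cp1251 = raw.decode("cp1251")
```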

The model assumes that most front-ends would call the backend via a
subprocess "shim" that was maintained by the front end project. But
the expectation here seems to be that the backend is allowed to write
directly to the stdio streams of its process (or at least, to let the
tools it calls do so). So the shim *cannot* control the encoding of
the data received by the frontend, and so the encoding has to be
agreed between backend and frontend. The basic question is how the
responsibility for dealing with data in an uncertain encoding is
allocated.

It seems to me there are 2 schools of thought:

1. There are likely to be fewer front ends than back ends, and so the
front end(s) (basically, pip) should deal with the problem. Also,
backends are more likely to be written by developers who are looking
at very specific scenarios, and asking them to handle all the
complexities of robust multilingual coding is raising the bar on
writing a backend too high.

2. The backend is where the problem lies, and so the backend should
address the issue. Furthermore, a well-established principle in
dealing with encodings is to convert to strings right at the boundary
of the application, and in this case the backend is the only code that
has access to that boundary.

(I tend towards (2), but I honestly can't say to what extent that's
because it makes it "someone else's problem" for me ;-))

As you say, the middle ground here is that front ends must never
crash, and back ends should (but aren't required to) produce output in
a specified encoding (I still prefer the locale encoding as that has
the best chance of avoiding the "???/?/?.c" issue). That's more or less
what pip has to deal with now (and not that far off (1)), and my
current attempt to address that situation is at
https://github.com/pypa/pip/pull/4486 for what it's worth.

A couple of final thoughts. I would expect that testing the handling
of encodings is likely to be an important issue (at least, I expect
there'll be bugs, and adding tests to make sure they get properly
fixed will be important). Handling tool output encoding in the backend
is likely to involve relatively low level interface functions, where
the inputs and outputs can be relatively easily mocked. So I would
expect backend unit testing of encoding handling would be relatively
straightforward. Conversely, testing front end handling of encoding
issues is very tricky - it's necessary to set up system state to
persuade the build tools to produce the data you want to test against
(it feels like integration testing rather than unit testing). Also,
fixing encoding issues in the backend decouples the fix from pip's
release cycle, which is probably a good thing (unless the backend is
not well maintained, but that's an issue in itself).
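The backend-side unit tests Paul describes could be as simple as the sketch below. `transcode_tool_output` is a hypothetical low-level function, and canned bytes stand in for real compiler output. The tests also illustrate Thomas's OEM vs ANSI point: the same byte means different characters in cp850 and cp1252.

```python
import unittest


def transcode_tool_output(raw, encoding):
    """Convert raw tool output in a known encoding to UTF-8 bytes."""
    return raw.decode(encoding, errors="replace").encode("utf-8")


class TranscodeToolOutputTests(unittest.TestCase):
    # Canned bytes stand in for real compiler output, so no compiler is needed.

    def test_oem_code_page(self):
        # Byte 0x82 is "e with acute" in the OEM code page cp850 ...
        self.assertEqual(transcode_tool_output(b"caf\x82", "cp850"), "caf\u00e9".encode("utf-8"))

    def test_ansi_code_page(self):
        # ... but a low single quotation mark in the ANSI code page cp1252.
        self.assertEqual(transcode_tool_output(b"caf\x82", "cp1252"), "caf\u201a".encode("utf-8"))


if __name__ == "__main__":
    unittest.main()
```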

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Steve Dower


On 22May2017 0803, Paul Moore wrote:

> On 22 May 2017 at 15:23, Nick Coghlan  wrote:
>
>> No, that's discussed here:
>> https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals
>>
>> Even though PEP 517 defines a Python API for build backends to
>> implement, it still expects installation tools to wrap a subprocess
>> call around the backend invocation.
>
> OK, but is it not acceptable for the child cmdline process (owned by
> pip) to capture the backend implementation's stdout using reassignment
> of sys.stdout? I assume, from your response, that it's *not*
> acceptable to do that - but that needs to be documented somewhere.
> Specifically, that the child cmdline is not allowed to do something
> like:
>
> import io
> import sys
> out = io.StringIO()
> sys.stdout = out
> build_backend.hook()
> sys.stdout = sys.__stdout__  # restore the real stream, then write UTF-8 bytes
> sys.stdout.buffer.write(out.getvalue().encode("utf-8"))
>
> (Which would otherwise be a very simple way to get guaranteed UTF-8 as
> the encoding across the process boundary - but it does so by imposing
> basically the rules I stated on the backend).


Okay, I think I get the problem now. We expect backends to let child 
subprocesses just spit out whatever *they* want onto the same stdout/stderr.


I'm really not a fan of forcing front ends to clean up that mess, and so 
I'd still suggest that the backend "tool" be a script to launch the 
actual tool and do the conversion to UTF-8.


Perhaps the middle ground is to specify encoding='utf-8', 
errors='anything but strict' for front-ends, and well-behaved backends 
should do the work to transcode when it is known to be necessary for the 
tools they run. (i.e. frontends do not crash, backends have a simple 
rule for avoiding loss of data).
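On the frontend side, Steve's middle ground might be sketched like this. `run_shim` is a hypothetical helper that launches the shim with the current interpreter. It decodes the output as UTF-8 with a non-strict error handler, so malformed bytes become U+FFFD instead of raising `UnicodeDecodeError`.

```python
import subprocess
import sys


def run_shim(args):
    """Run a backend shim and return (returncode, output text).

    Output is decoded as UTF-8 with errors="replace", so malformed
    bytes never crash the frontend.
    """
    proc = subprocess.run(
        [sys.executable] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return proc.returncode, proc.stdout.decode("utf-8", errors="replace")
```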


Cheers,
Steve

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore

On 22 May 2017 at 15:23, Nick Coghlan  wrote:
> No, that's discussed here:
> https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals
>
> Even though PEP 517 defines a Python API for build backends to
> implement, it still expects installation tools to wrap a subprocess
> call around the backend invocation.

OK, but is it not acceptable for the child cmdline process (owned by
pip) to capture the backend implementation's stdout using reassignment
of sys.stdout? I assume, from your response, that it's *not*
acceptable to do that - but that needs to be documented somewhere.
Specifically, that the child cmdline is not allowed to do something
like:

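import io
import sys
out = io.StringIO()
sys.stdout = out
build_backend.hook()
sys.stdout = sys.__stdout__  # restore the real stream, then write UTF-8 bytes
sys.stdout.buffer.write(out.getvalue().encode("utf-8"))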

(Which would otherwise be a very simple way to get guaranteed UTF-8 as
the encoding across the process boundary - but it does so by imposing
basically the rules I stated on the backend).

> That said, the whole "The build backend still runs in a subprocess"
> aspect should probably be separated out into its own section
> "Isolating build backends from frontend process state", rather than
> solely being covered in the "Comparison to PEP 516?" section, as it's
> a key aspect of the design - we expect each installation tool to
> provide its own CLI shim for calling build backends, rather than
> requiring all installation tools to use the same one.

Strong +1. And that section needs to be very clear on issues like
this, covering what the shim is allowed to do. As the point of the
shim is to protect the backend from frontend state, I'm OK with the
general principle that the shim must do "as little as possible" before
calling the hook - but "reset sys.stdout to protect against encoding
errors" could easily be seen as within the realm of acceptable
behaviour (as it stops hooks writing arbitrary Unicode to a standard
output that the shim knows is limited).
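A shim that "resets sys.stdout to protect against encoding errors" could do something like the sketch below. `call_hook_with_utf8_stdout` is a hypothetical name. It only affects text the hook prints through Python's `sys.stdout`. Tools the backend launches as child processes still write straight to the inherited file descriptor in whatever encoding they choose.

```python
import contextlib
import io


def call_hook_with_utf8_stdout(hook, binary_out):
    """Call hook with sys.stdout rebound to a UTF-8 text layer over binary_out.

    Only text printed through Python's sys.stdout is covered; any child
    process the hook launches still writes to the inherited file descriptor.
    """
    text_out = io.TextIOWrapper(
        binary_out, encoding="utf-8", errors="backslashreplace", newline="\n"
    )
    try:
        with contextlib.redirect_stdout(text_out):
            return hook()
    finally:
        text_out.flush()
        text_out.detach()  # keep binary_out open for the caller


captured = io.BytesIO()
call_hook_with_utf8_stdout(lambda: print("Kompiliere Stra\u00dfe.c"), captured)
```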

I'm happy enough with the idea that pip won't do anything silly in its
CLI shim, but we don't want to get into the "implementation as the
standard" situation where a backend is allowed to do anything that
pip's shim can cope with...

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan

On 22 May 2017 at 23:15, Paul Moore  wrote:
> On 22 May 2017 at 12:28, Thomas Kluyver  wrote:
>> What if it wants to send a character which can't be encoded in the
>> locale encoding? It's quite easy on Windows to end up with a character
>> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
>> 'replace'), then you've lost information even before it gets to the
>> install tool.
>>
>> It's 2017, I really don't want to go down the 'locale specified
>> encoding' route again. UTF-8 everywhere!
>
> Hang on. Can we take a step back here? I just re-read the PEP and
> remembered (!) that hooks are *in-process* Python entry points (I've
> been working with pip's current backend-as-subprocess model, and mixed
> up in my mind the original 2 proposals here). I think this encoding
> debate may be a red herring.

No, that's discussed here:
https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals

Even though PEP 517 defines a Python API for build backends to
implement, it still expects installation tools to wrap a subprocess
call around the backend invocation.

Frontends need to do that in order to protect *their own* process
state from bugs and design quirks in backend implementations:

- no monkeypatching of parent process modules
- no changes to the standard stream configuration
- no persistent locale changes
- no environment variable changes
- no manipulation of any other process global state
- calling sys.exit() won't cryptically crash the entire installation process
- memory leaks won't cryptically crash the entire installation process
- infinite loops won't *necessarily* crash the entire installation
process (if the build has a timeout on it)
- installation tools running with elevated privileges can readily run
the build process with reduced privileges
- installation tools can also readily run the build process in a
chroot or containerised environment

And in the context of this thread, it gives the frontend complete
control over the stream output from not only the backend itself, but
any child processes that it launches.
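Several of the isolation properties in Nick's list can be illustrated with a plain subprocess call and a timeout. In this sketch, `run_build_subprocess` is hypothetical, and running a code string stands in for invoking a real backend through a shim. A `sys.exit()` in the child becomes an ordinary return code, and an infinite loop becomes a timeout instead of a hung installer.

```python
import subprocess
import sys


def run_build_subprocess(code, timeout):
    """Run a (stand-in) backend invocation in a child interpreter.

    Returns the child's exit code, or None if it exceeded the timeout.
    sys.exit(), crashes, leaks and global-state changes stay in the child.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before re-raising
    return proc.returncode
```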

That said, the whole "The build backend still runs in a subprocess"
aspect should probably be separated out into its own section
"Isolating build backends from frontend process state", rather than
solely being covered in the "Comparison to PEP 516?" section, as it's
a key aspect of the design - we expect each installation tool to
provide its own CLI shim for calling build backends, rather than
requiring all installation tools to use the same one.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore

On 22 May 2017 at 12:28, Thomas Kluyver  wrote:
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.
>
> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

Hang on. Can we take a step back here? I just re-read the PEP and
remembered (!) that hooks are *in-process* Python entry points (I've
been working with pip's current backend-as-subprocess model, and mixed
up in my mind the original 2 proposals here). I think this encoding
debate may be a red herring.

If a hook is being called as a Python method call, then it can print
what it likes to stdout and stderr. And it's the backend's
responsibility to ensure that it never fails when printing - so the
*backend* has to deal with the fact that anything it wants to print
must be representable in sys.stdout.encoding, with the default (raise
an exception) error handling. Given this fact, and the fact that
sys.stdout and sys.stderr are *text* output streams, build frontends
like pip can reasonably just replace sys.std{out,err} (for example
with a StringIO object) to get hook output. There's no encoding issue
for frontends, they just capture the text sent to the stdio streams.
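The in-process capture described above could be sketched with `contextlib.redirect_stdout` and `contextlib.redirect_stderr` (the latter needs Python 3.5+). `call_hook_capturing` is a hypothetical helper. Note that this only sees writes that go through `sys.stdout`/`sys.stderr`, not raw file-descriptor writes or child-process output, which is what the rules that follow are about:

```python
import contextlib
import io


def call_hook_capturing(hook, *args, **kwargs):
    """Call an in-process `hook`, capturing anything it prints as text."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = hook(*args, **kwargs)
    return result, out.getvalue(), err.getvalue()
```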

The rules needed for *backends* are then:

1. Backends MUST NOT write to raw IO channels, all output MUST go via
sys.stdout and sys.stderr. Build frontends MAY redirect these streams
to post-process them, but are not required to do so. As a consequence:

  1a. Backends MUST be prepared to deal with the possibility that
those IO streams have the limitations of the platform IO streams
(e.g., limited subset of Unicode allowed, fails with an exception when
invalid characters are written).

  1b. Backends MUST capture and manage the output from any
subprocesses they spawn (so that they can follow the other rules).

  1c. Backends cannot assume that they can write output that the user
will see - frontends may suppress or modify any output passed on
stdout. Conversely, backends should not bypass the ability of
frontends to capture stdout, as frontends are responsible for user
interaction.

Some of those MUSTs could be replaced by SHOULD, if we want to allow
backends to write directly to the screen. But that is likely to
corrupt the UI of the frontend, so I'm inclined to say that we don't
allow that.
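As a rough illustration of rules 1a and 1b, a backend might relay a compiler's output as below. The `run_tool_and_relay` name, and the assumption that the backend knows the tool's output encoding, are hypothetical:

```python
import subprocess
import sys


def run_tool_and_relay(cmd, tool_encoding):
    """Run an external tool, capture its output, and relay it via sys.stdout."""
    # Rule 1b: capture the child's output instead of letting it hit the console
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    text = proc.stdout.decode(tool_encoding, errors="replace")
    # Rule 1a: sys.stdout may not represent every character, so replace the
    # ones it can't encode rather than raising (a StringIO reports None here)
    out_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    safe = text.encode(out_encoding, errors="replace").decode(out_encoding)
    sys.stdout.write(safe)
    return proc.returncode, safe
```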

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan

On 22 May 2017 at 21:28, Thomas Kluyver  wrote:
> On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
>> The only reservation I have is that the choice of UTF-8 means that on
>> Windows, build backends pretty much have to explicitly manage tool
>> output (as they are pretty much certain *not* to output in UTF-8).
>> Build backend writers that aren't aware of this issue (most likely
>> because their main platform is not Windows) could very easily choose
>> to just pass through the raw bytes, and as a result *all* non-ASCII
>> output would be garbled on non-UTF-8 systems.
>>
>> Would locale.getpreferredencoding() not be a better choice here? I
>> know it has issues in some situations on Unix, but are they worse than
>> the issues UTF-8 would cause on Windows? After all it's the encoding
>> used by subprocess.Popen in "universal newlines" mode...
>
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.

The counterargument is that there's plenty of text that *can* be
correctly encoded in cp1252 (especially in Europe and LATAM) that will
be rendered incorrectly if the installation tool attempts to interpret
it as UTF-8. CPython itself will also display explicitly UTF-8 encoded
text incorrectly on a Windows console in versions prior to 3.6.
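To make the counterargument concrete: text that cp1252 encodes perfectly well comes out damaged if the receiving side insists on UTF-8. A small sketch (the sample string is arbitrary):

```python
text = "Café £5"
raw = text.encode("cp1252")  # what a cp1252-speaking tool would emit

# The locale encoding round-trips cleanly...
assert raw.decode("cp1252") == text

# ...but a consumer that assumes UTF-8 can only substitute U+FFFD
misread = raw.decode("utf-8", errors="replace")
```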

> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

"UTF-8 everywhere" is fine for network services that only need to talk
to other network services, command line applications, and web
browsers, but even in 2017 it's still a problematic assumption on
client devices running Windows or Linux.

Rather than the locale specified encoding being broken in general, the
key recurring problem we've found with it on *nix systems relates to
the fact that glibc still defaults to ASCII in the C locale - "assume
ASCII really means UTF-8" is enough to solve that problem *without*
breaking compatibility with cp1252 and non-UTF-8 universal encodings.

The other recurring problem is cp1252 itself on Windows, which suffers
from the fact that there isn't a nice environment variable based way
to change the active code page when invoking a subprocess, and also
that cp65001 (the UTF-8 code page) isn't really properly supported in
Python 2.7 (although you can inject a custom search function to alias
it to utf-8 [1]).

Even in that case though, mandating "thou shalt treat the streams as
UTF-8" in the spec doesn't *solve* those problems - it just means
we're specifying a behaviour that we know will provide a poor
developer experience on Windows, rather than alerting tool developers
to the fact that this is something they're going to need to be aware
of.

Cheers,
Nick.

[1] http://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Nick Coghlan

On 22 May 2017 at 21:02, Paul Moore  wrote:
> On 22 May 2017 at 11:22, Thomas Kluyver  wrote:
>> I have made a PR against the PEP with my best take on the encoding
>> situation:
>> https://github.com/python/peps/pull/264/files
>
> LGTM.
>
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
>
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

+1 from me for locale.getpreferredencoding() as the default - not only
is it a more suitable default on Windows, it's also the best way to do
the right thing in GB.18030 locales, and as far as I'm aware, handling
that correctly is still a requirement for selling commercial software
into China (that's why I chose it as the main non-UTF-8 example
encoding in PEP 538).

If Python tools want to specifically detect the use of 7-bit ASCII and
override *that* to be UTF-8, then the relevant snippet is:

import codecs
import locale

def get_stream_encoding():
    nominal = locale.getpreferredencoding()
    # Treat the 7-bit ASCII default (e.g. glibc's C locale) as UTF-8
    if codecs.lookup(nominal).name == "ascii":
        return "utf-8"
    return nominal

That's effectively the same model that PEP 538 and 540 are proposing
be applied by default for the standard streams, so it would also
interoperate well with Python 3.7+.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver

On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
> 
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

What if it wants to send a character which can't be encoded in the
locale encoding? It's quite easy on Windows to end up with a character
that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
'replace'), then you've lost information even before it gets to the
install tool.

It's 2017, I really don't want to go down the 'locale specified
encoding' route again. UTF-8 everywhere!

One affordance I'd consider is a recommendation to install tools that if
captured output is not valid UTF-8, they dump the raw bytes to a file so
that no information is lost. I'm not sure if that recommendation needs
to be in the spec itself, though.
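One possible shape for that affordance, as a sketch only: try strict UTF-8 first, and if that fails keep the original bytes in a temporary file while still returning a displayable string. The function name and the `build-output-` file prefix are made up for illustration:

```python
import tempfile


def decode_or_dump(raw):
    """Decode captured build output as UTF-8, saving the raw bytes if invalid.

    Returns (text, path); path is None when the bytes were valid UTF-8.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        with tempfile.NamedTemporaryFile(
            prefix="build-output-", suffix=".log", delete=False
        ) as f:
            f.write(raw)  # nothing is lost: the original bytes are kept
        return raw.decode("utf-8", errors="replace"), f.name
```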

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore

On 22 May 2017 at 11:22, Thomas Kluyver  wrote:
> I have made a PR against the PEP with my best take on the encoding
> situation:
> https://github.com/python/peps/pull/264/files

LGTM.

The only reservation I have is that the choice of UTF-8 means that on
Windows, build backends pretty much have to explicitly manage tool
output (as they are pretty much certain *not* to output in UTF-8).
Build backend writers that aren't aware of this issue (most likely
because their main platform is not Windows) could very easily choose
to just pass through the raw bytes, and as a result *all* non-ASCII
output would be garbled on non-UTF-8 systems.

Would locale.getpreferredencoding() not be a better choice here? I
know it has issues in some situations on Unix, but are they worse than
the issues UTF-8 would cause on Windows? After all it's the encoding
used by subprocess.Popen in "universal newlines" mode...

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver

I have made a PR against the PEP with my best take on the encoding
situation:
https://github.com/python/peps/pull/264/files

On Mon, May 22, 2017, at 11:19 AM, Paul Moore wrote:
> On 22 May 2017 at 10:56, Thomas Kluyver  wrote:
> > On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> >> Require that build tools either send UTF-8 to the UI component, or write
> >> bytes to a file and call it a build output. I see no benefit in
> >> requiring both the build tool and the UI tool to guess what the text
> >> encoding is.
> >
> > I'm not proposing that the install tool should try to guess the
> > encoding, but I think a well written install tool shouldn't crash if the
> > build output doesn't match the encoding it expects. Even if the spec
> > says that the build output MUST be UTF-8 encoded, build tools can have
> > bugs, and you don't want the install to fail just because the log
> > isn't correctly encoded.
> >
> > Hence, I think a 'SHOULD' is appropriate for this part of the spec:
> >
> > - To install tool authors, it is clear that they can display the output
> > as UTF-8 so long as they don't crash if it's invalid.
> > - To build tool authors, it's clear that they can't pass the buck to
> > install tool authors if output gets jumbled because it's not UTF-8.
> 
> I'd say that it's not so much just "well written" install tools. I'd
> say that install tools MUST NOT crash if build tool output isn't in
> the expected encoding. On the other hand, the encoding agreement
> implies that if build tools *do* send data in the correct encoding
> then they are entitled to expect that it will be displayed accurately
> to the end user.
> 
> Output can be garbled in two ways:
> 
> 1. The build tool does not (or cannot) ensure that its output is in
> the standard-mandated encoding.
> 2. The install tool cannot display the full range of characters
> representable in the standard-mandated encoding.
> 
> Neither of these should cause a failure. Well written install tools
> should warn in the case of (1) - "I have been passed data that I don't
> understand, I'll do my best to display it but can't guarantee the
> output won't be garbled". In the case of (2), though, that's "as
> expected" - if your OS settings mean you can't display certain
> characters, you shouldn't be surprised if your install tool replaces
> them with a placeholder.
> 
> On an implementation note, this boils down to something like the
> following in the install tool:
> 
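> import warnings
>
> # Step 1: decode the raw bytes captured from the build tool
> # (build_output, STD_ENCODING and MY_OUTPUT_ENCODING are placeholders)
> try:
>     data = build_output.decode(STD_ENCODING)
> except UnicodeDecodeError:
>     warnings.warn("Data is not in expected encoding")
>     data = build_output.decode(STD_ENCODING, errors="replace")
>
> # Step 2: make sure every character can be written to our own output
> data = data.encode(MY_OUTPUT_ENCODING, errors="replace").decode(MY_OUTPUT_ENCODING)
>
> # We now have subprocess output that's safe to display if requested.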
> 
> As a side note, I find step 2 "sanitise my string to ensure it can be
> safely output" to be a pretty common operation - possibly because
> Python's standard IO streams raise exceptions on unicode errors - and
> I'm surprised there isn't a better way to spell it than the
> encode/decode pair above.
> 
> Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Paul Moore

On 22 May 2017 at 10:56, Thomas Kluyver  wrote:
> On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
>> Require that build tools either send UTF-8 to the UI component, or write
>> bytes to a file and call it a build output. I see no benefit in
>> requiring both the build tool and the UI tool to guess what the text
>> encoding is.
>
> I'm not proposing that the install tool should try to guess the
> encoding, but I think a well written install tool shouldn't crash if the
> build output doesn't match the encoding it expects. Even if the spec
> says that the build output MUST be UTF-8 encoded, build tools can have
> bugs, and you don't want the install to fail just because the log
> isn't correctly encoded.
>
> Hence, I think a 'SHOULD' is appropriate for this part of the spec:
>
> - To install tool authors, it is clear that they can display the output
> as UTF-8 so long as they don't crash if it's invalid.
> - To build tool authors, it's clear that they can't pass the buck to
> install tool authors if output gets jumbled because it's not UTF-8.

I'd say that it's not so much just "well written" install tools. I'd
say that install tools MUST NOT crash if build tool output isn't in
the expected encoding. On the other hand, the encoding agreement
implies that if build tools *do* send data in the correct encoding
then they are entitled to expect that it will be displayed accurately
to the end user.

Output can be garbled in two ways:

1. The build tool does not (or cannot) ensure that its output is in
the standard-mandated encoding.
2. The install tool cannot display the full range of characters
representable in the standard-mandated encoding.

Neither of these should cause a failure. Well written install tools
should warn in the case of (1) - "I have been passed data that I don't
understand, I'll do my best to display it but can't guarantee the
output won't be garbled". In the case of (2), though, that's "as
expected" - if your OS settings mean you can't display certain
characters, you shouldn't be surprised if your install tool replaces
them with a placeholder.

On an implementation note, this boils down to something like the
following in the install tool:

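import warnings

# Step 1: decode the raw bytes captured from the build tool
# (build_output, STD_ENCODING and MY_OUTPUT_ENCODING are placeholders)
try:
    data = build_output.decode(STD_ENCODING)
except UnicodeDecodeError:
    warnings.warn("Data is not in expected encoding")
    data = build_output.decode(STD_ENCODING, errors="replace")

# Step 2: make sure every character can be written to our own output
data = data.encode(MY_OUTPUT_ENCODING, errors="replace").decode(MY_OUTPUT_ENCODING)

# We now have subprocess output that's safe to display if requested.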

As a side note, I find step 2 "sanitise my string to ensure it can be
safely output" to be a pretty common operation - possibly because
Python's standard IO streams raise exceptions on unicode errors - and
I'm surprised there isn't a better way to spell it than the
encode/decode pair above.
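For what it's worth, the encode/decode pair can at least be wrapped in a small helper. This is a hypothetical convenience function, not something in the standard library:

```python
def sanitise_for(text, encoding, errors="replace"):
    """Return `text` with characters not representable in `encoding` replaced.

    Any error handler valid for encoding works, e.g. "replace" (gives "?")
    or "backslashreplace" (gives "\\xa3"-style escapes).
    """
    return text.encode(encoding, errors=errors).decode(encoding)
```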

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-22 Thread Thomas Kluyver

On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> Require that build tools either send UTF-8 to the UI component, or write 
> bytes to a file and call it a build output. I see no benefit in 
> requiring both the build tool and the UI tool to guess what the text 
> encoding is.

I'm not proposing that the install tool should try to guess the
encoding, but I think a well written install tool shouldn't crash if the
build output doesn't match the encoding it expects. Even if the spec
says that the build output MUST be UTF-8 encoded, build tools can have
bugs, and you don't want the install to fail just because the log
isn't correctly encoded.

Hence, I think a 'SHOULD' is appropriate for this part of the spec:

- To install tool authors, it is clear that they can display the output
as UTF-8 so long as they don't crash if it's invalid.
- To build tool authors, it's clear that they can't pass the buck to
install tool authors if output gets jumbled because it's not UTF-8.

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan

On 21 May 2017 at 02:36, Steve Dower  wrote:
> On 20May2017 0820, Nick Coghlan wrote:
>>
>> Good point regarding the fact that the Windows 16-bit APIs only come
>> into play for interactive sessions (even in 3.6+), while for PEP 517
>> we're specifically interested in the 8-bit pipes used to communicate
>> with build subprocesses launched by an installation tool.
>
>
> I need to catch up on the PEP (and thanks Brett for alerting me to the
> thread), but this comment in particular cements the mental diagram I have
> right now:
>
> (build UI) <--> (build tool) <--> (compiler)
> ( Python ) <--> (  Python  ) <--> (anything)
>
> I'll probably read the PEP closely and see that this is entirely incorrect,
> but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).
> * encoding for text between build tool and the compiler depends on the
> compiler

Alas, it isn't quite that simple. Let's take the current de facto standard case:

(user console/CI build log) <-> pip <-> setup.py (distutils/setuptools) <-> 3rd party tool

Key usability feature:

* when requested, informational messages from 3rd party tools SHOULD
be made available to the end user for debugging purposes

Ideal outcome:

* everything that makes it to the user console or CI build log is
readable by the end user

Essential requirement:

* encoding problems in informational messages emitted by 3rd party
tools MUST NOT cause the build to fail

Now, the easiest way to handle the essential requirement as the author
of an installation or build tool is to choose not to deal with it:
instead, you just treat the output from further downstream as opaque
binary data, and let the user console/CI build log layer deal with any
encoding problems as they see fit. You may end up with some build
failures that are a pain to debug because you're getting nonsense from
the build pipeline, but you won't fail your build *because* some
particular build tool emitted improperly encoded nonsense.

That all changes if we *require* UTF-8 on the link between the
installation tool (e.g. pip) and the build tool (e.g. setup.py). If we
do that:

* the installation tool can't just pass along build tool output to the
user console or CI build log any more, it has a nominal obligation to
try to interpret it as UTF-8
* the build tool (or build tool shim) can't just pass along 3rd party
tool output to the installation tool any more, it has a nominal
obligation to try to get it to emit UTF-8

Now, *particular* installation and build tools may want to strongly
encourage the use of UTF-8 in an effort to get closer to the ideal
outcome, but that isn't the key objective of PEP 517: the key
objective of PEP 517 is to make it easier to use *general purpose*
build systems that happen to be implemented in Python (like waf,
scons, and meson) to handle complex build scenarios, while also
allowing the use of simpler Python-only build systems (like flit) for
distribution of pure Python projects.

That said, the PEP *could* explicitly define a short list of
behaviours that we consider reasonable in an installation tool:

1. Treat the informational output from the build tool as an opaque binary stream
2. Treat the informational output from the build tool as a text stream
encoded using locale.getpreferredencoding(), and decode it using the
backslashreplace error handler
3. Treat the informational output from the build tool as a UTF-8
encoded text stream, and decode it using the backslashreplace error
handler
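The three options could be expressed in an installation tool roughly as follows. The `policy` names are invented for this sketch, and it needs Python 3.5+ because it relies on `backslashreplace` when decoding:

```python
import locale


def decode_build_output(raw, policy):
    """Interpret captured build output according to one of the three options."""
    if policy == "opaque":  # option 1: pass the bytes through untouched
        return raw
    if policy == "locale":  # option 2: locale encoding, undecodable bytes escaped
        return raw.decode(locale.getpreferredencoding(False), "backslashreplace")
    if policy == "utf-8":  # option 3: UTF-8, undecodable bytes escaped
        return raw.decode("utf-8", "backslashreplace")
    raise ValueError("unknown policy: %r" % (policy,))
```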

We'd just need to caveat the latter two options with the fact that
they'll give you a cryptic error message on Python 3.4 and earlier
(including Python 2):

>>> b"\xf0\x01\x02\x03".decode("utf-8", "backslashreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py27/Lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: don't know how to handle UnicodeDecodeError in error callback

I had to look that up on Stack Overflow myself, but what it's trying
to say is that until Python 3.5, "backslashreplace" only worked for
encoding, not for decoding.

That means that for earlier versions, you'd need to define your own
custom error handler as described in
http://stackoverflow.com/questions/25442954/how-should-i-decode-bytes-using-ascii-without-losing-any-junk-bytes-if-xmlch/25443356#25443356
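A sketch of such a handler, written to run on both Python 2.7 and 3. The handler name `backslashreplace-decode` is an arbitrary choice for this example, and this is not the exact code from the linked answer:

```python
import codecs


def _backslashreplace_decode(exc):
    """Error handler that escapes undecodable bytes as \\xNN when decoding."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = bytearray(exc.object[exc.start:exc.end])  # ints on Python 2 and 3
    return u"".join(u"\\x%02x" % b for b in bad), exc.end


codecs.register_error("backslashreplace-decode", _backslashreplace_decode)
```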

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Donald Stufft


> On May 20, 2017, at 4:05 PM, Paul Moore  wrote:
> 
> I'm a little concerned if we're going to end up with a proposal that
> means that distutils is in violation of the spec unless this issue is
> fixed. I'm not sure if that's where we're headed, but I wanted to be
> clear here - is PEP 517 intended to encompass distutils/setuptools, or
> are we treating them as a legacy case, that pip should special-case?


I don’t think distutils/setuptools are going to be compatible out of the box 
anyways, because its API is tied to setup.py. Whatever adapter is written to 
adapt it to PEP 517 can handle any semantic differences as well.

—
Donald Stufft




Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower


On 20May2017 1315, Paul Moore wrote:
> On 20 May 2017 at 17:36, Steve Dower  wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what the
>> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
>> the tool should read raw bytes from the compiler and encode them to UTF-8.
>
> Did you spot my point that Visual C produces output that's a mixture
> of OEM and ANSI codepages?

[SNIP]

Yes, and it's a perfect example of why the MSVC-specific wrapper should
be the one to deal with tool encodings. If you forward unencoded bytes
like this back to the UI, it will have to deal with the mixed encoding.

> I'd be very surprised if build tool developers got this sort of edge
> case correct without at least some guidance from the PEP on the sorts
> of things they need to consider. You suggest "read raw bytes and
> encode them to UTF-8" - but you don't encode bytes, you encode
> strings, so you still need to convert those bytes to a string first,
> and there's no encoding you can reliably use for this. You need to use
> "errors=replace" to ensure you can handle inconsistently encoded data
> without getting an exception.

The "read raw bytes and [transcode] them" comment was meant to be that 
sort of help. I didn't go as far as writing 
`output.decode(output_encoding, errors="replace").encode("utf-8", 
errors="replace")`, but that's basically what I meant to imply. The 
build tool developer is the *only* developer who can get this right, and 
if they can't, then they have to figure out the most appropriate way to 
work around the fact that they can't.
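Spelled out as a function, that transcoding one-liner looks roughly like this. `transcode_to_utf8` is a hypothetical name, and the caller is assumed to know (or choose) `tool_encoding`:

```python
def transcode_to_utf8(raw, tool_encoding):
    """Convert bytes from an external tool into UTF-8 for the frontend."""
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError
    text = raw.decode(tool_encoding, errors="replace")
    # errors="replace" mirrors the one-liner; it only matters for odd input
    # such as lone surrogates, which UTF-8 cannot encode
    return text.encode("utf-8", errors="replace")
```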


As for defining distutils as incompatible with the PEP, I'm okay with 
that. Updating distutils to use subprocess for launching tools rather 
than spawnv would be a very good start (and likely a good contribution 
for a new contributor), but allowing build tools to continue to be 
written badly is not worthwhile.


Cheers,
Steve


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore

On 20 May 2017 at 17:36, Steve Dower  wrote:
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what the
> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
> the tool should read raw bytes from the compiler and encode them to UTF-8.

Did you spot my point that Visual C produces output that's a mixture
of OEM and ANSI codepages?

The example I used was:

OEM code page 850, ANSI codepage 1252 (standard British English Windows)

Visual Studio 2015

cl a£b >output.file

The output uses CP850 (in the cl error message) and CP1252 (in the
link error) for the £ sign.

When run from the command line without redirection, the output is in a
consistent encoding. It's only when you redirect the output (I
redirected to a file, I assume a pipe would be the same) that you get
the problem.
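A quick way to see why no single encoding works for that output: the same £ sign is byte 0x9C in cp850 but 0xA3 in cp1252, so decoding the mixed bytes with either code page garbles one of them. The byte string below is a made-up stand-in for the redirected `cl` output, not a real capture:

```python
# Stand-in for redirected output: one "£" from the cl message (cp850)
# and one from the link message (cp1252).
mixed = "£".encode("cp850") + b" / " + "£".encode("cp1252")

as_oem = mixed.decode("cp850")  # "£ / ú": the second pound sign is misread
as_ansi = mixed.decode("cp1252")  # "œ / £": the first pound sign is misread
```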

I'd be very surprised if build tool developers got this sort of edge
case correct without at least some guidance from the PEP on the sorts
of things they need to consider. You suggest "read raw bytes and
encode them to UTF-8" - but you don't encode bytes, you encode
strings, so you still need to convert those bytes to a string first,
and there's no encoding you can reliably use for this. You need to use
"errors=replace" to ensure you can handle inconsistently encoded data
without getting an exception.

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore

On 20 May 2017 at 19:36, Steve Dower  wrote:
>
>> - As a lazy developer, I don't want to read stdout/stderr from a
>> subprocess only to spit it back to my own stdout/stderr. I'd much rather
>> just launch the subprocess and let it use the same stdout/stderr as my
>> build tool.
>
>
> One of the open issues against distutils is that it does this. We can allow
> it, but a well-defined tool should capture the output and pass it to the UI
> component rather than bypassing the UI component.

I'm a little concerned if we're going to end up with a proposal that
means that distutils is in violation of the spec unless this issue is
fixed. I'm not sure if that's where we're headed, but I wanted to be
clear here - is PEP 517 intended to encompass distutils/setuptools, or
are we treating them as a legacy case, that pip should special-case?

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower


On 20May2017 1011, Thomas Kluyver wrote:
> On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what
>> the encoding is. Since we don't live in a Python 3.6 world quite yet,
>> that means the tool should read raw bytes from the compiler and encode
>> them to UTF-8.
>
> I half agree, but:
> - Build tools may not 100% know what encoding output will be produced,
> especially if the developer can supply a custom command for the build
> tool to run.

In this case, the whole thing breaks down anyway. UI can't be expected
to reliably display text from an unknown encoding - at some point it has
to be forced into a known quantity, and IMHO the code closest to the
tool should do it.

> - It's possible for data on a pipe to be binary data with no meaning as
> text.

Sure, but it cannot be rendered unless you choose an encoding. All you
can do is dump it to a file (and let a file editor choose an encoding).

> - As a lazy developer, I don't want to read stdout/stderr from a
> subprocess only to spit it back to my own stdout/stderr. I'd much rather
> just launch the subprocess and let it use the same stdout/stderr as my
> build tool.

One of the open issues against distutils is that it does this. We can
allow it, but a well-defined tool should capture the output and pass it
to the UI component rather than bypassing the UI component.

> So I think it's most practical to recommend that build tools produce
> UTF-8 (if not sys.stdout.isatty()), but let build tool developers decide
> how far they go to comply with that.

Require that build tools either send UTF-8 to the UI component, or write
bytes to a file and call it a build output. I see no benefit in
requiring both the build tool and the UI tool to guess what the text
encoding is.

Cheers,
Steve

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Thomas Kluyver

On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
> I'll probably read the PEP closely and see that this is entirely 
> incorrect, but if it's right:
> 
> * encoding for text between the build UI and build tool should just be 
> specified once for all platforms (i.e. use UTF-8).

+1

> * encoding for text between build tool and the compiler depends on the 
> compiler
> 
> In general, since most subprocesses (at least on Windows) do not have 
> customizable encodings, the tool that launches them needs to know what 
> the encoding is. Since we don't live in a Python 3.6 world quite yet, 
> that means the tool should read raw bytes from the compiler and encode 
> them to UTF-8.

I half agree, but:
- Build tools may not 100% know what encoding output will be produced,
especially if the developer can supply a custom command for the build
tool to run.
- It's possible for data on a pipe to be binary data with no meaning as
text.
- As a lazy developer, I don't want to read stdout/stderr from a
subprocess only to spit it back to my own stdout/stderr. I'd much rather
just launch the subprocess and let it use the same stdout/stderr as my
build tool.

So I think it's most practical to recommend that build tools produce
UTF-8 (if not sys.stdout.isatty()), but let build tool developers decide
how far they go to comply with that.
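A rough sketch of how a Python build tool could follow the "UTF-8 unless stdout is a TTY" recommendation. It assumes Python 3.7+ for `TextIOWrapper.reconfigure`. `use_utf8_when_piped` and `demo` are hypothetical names.

```python
import io
import sys
from typing import TextIO


def use_utf8_when_piped(stream: TextIO = sys.stdout) -> None:
    """Switch a non-interactive text stream to UTF-8 (needs Python 3.7+)."""
    if not stream.isatty():
        # Pipes and files get UTF-8; a real console keeps whatever Python chose.
        stream.reconfigure(encoding="utf-8", errors="replace")


def demo() -> bytes:
    """Apply the switch to an in-memory 'pipe' that started out as Latin-1."""
    raw = io.BytesIO()
    fake_pipe = io.TextIOWrapper(raw, encoding="latin-1")
    use_utf8_when_piped(fake_pipe)
    fake_pipe.write("é")
    fake_pipe.flush()
    return raw.getvalue()
```

A build tool would call `use_utf8_when_piped()` (and the same for `sys.stderr`) once at startup, before writing any output.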

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower


On 20May2017 0820, Nick Coghlan wrote:

Good point regarding the fact that the Windows 16-bit APIs only come
into play for interactive sessions (even in 3.6+), while for PEP 517
we're specifically interested in the 8-bit pipes used to communicate
with build subprocesses launched by an installation tool.


I need to catch up on the PEP (and thanks Brett for alerting me to the 
thread), but this comment in particular cements the mental diagram I 
have right now:


(build UI) <--> (build tool) <--> (compiler)
( Python ) <--> (  Python  ) <--> (anything)

I'll probably read the PEP closely and see that this is entirely 
incorrect, but if it's right:


* encoding for text between the build UI and build tool should just be 
specified once for all platforms (i.e. use UTF-8).
* encoding for text between build tool and the compiler depends on the 
compiler


In general, since most subprocesses (at least on Windows) do not have 
customizable encodings, the tool that launches them needs to know what 
the encoding is. Since we don't live in a Python 3.6 world quite yet, 
that means the tool should read raw bytes from the compiler and encode 
them to UTF-8.
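A minimal sketch of the transcoding step described here. It assumes the build tool already knows which encoding its compiler writes (`cp1252` is used only for illustration) and has captured the compiler's output as raw bytes, for example via `subprocess.run(..., stdout=subprocess.PIPE).stdout`. `compiler_output_to_utf8` is a hypothetical helper name.

```python
def compiler_output_to_utf8(raw: bytes, compiler_encoding: str) -> bytes:
    """Re-encode raw compiler output as UTF-8 for the build UI.

    Undecodable bytes become U+FFFD instead of raising, so a misbehaving
    compiler cannot crash the build tool.
    """
    text = raw.decode(compiler_encoding, errors="replace")
    return text.encode("utf-8")
```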


The encoding between the tool and UI is essentially irrelevant - the UI 
is going to transform the data anyway for display, and the tool is going 
to have to transform it from the compilation tools, so the best we can 
do is pick the most likely encoding to avoid too many operations. UTF-8 
is probably that.


That's my 0.02AUD based on a vague memory of the PEP and this thread. As 
I get time today (at PyCon) to read up on it I may post amendments, but 
in general I'm +100 on "just pick an encoding and make the 
implementations transcode".


Cheers,
Steve


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Brett Cannon

On Fri, May 19, 2017, 09:20 Thomas Kluyver,  wrote:

> On Fri, May 19, 2017, at 05:17 PM, Paul Moore wrote:
> > On 19 May 2017 at 16:53, Daniel Holth  wrote:
> > > Congrats on getting 518 in.
> >
> > Agreed, by the way. That's a big step!
>
> Thanks both. It does feel like an achievement. :-)
>

As it should! Thanks for bringing the PEP to life!

-brett



Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan

Good point regarding the fact that the Windows 16-bit APIs only come
into play for interactive sessions (even in 3.6+), while for PEP 517
we're specifically interested in the 8-bit pipes used to communicate
with build subprocesses launched by an installation tool.

On 20 May 2017 at 19:11, Paul Moore  wrote:
> The bigger question, though, is to what extent we want to mandate that
> build tools that run external tools such as compilers take
> responsibility for the encoding of the output of those tools (rather
> than simply passing the output through to the output stream
> unmodified). And if we do want to, whether we want to allow an
> exception for setuptools/distutils.
>
> Also, a question regarding Unix - do we really want to mandate UTF-8
> even if the system locale is set to something else? Won't that mean
> that build tools have the same problem with compilers generating
> output in the encoding the tool wants that we already have on Windows?

Yeah, I think that problem was starting to occur to me, hence the
reference to handling RPM and DEB build environments.

At least for non-Windows systems, I see two possible recommendations:

1. We advise installation tools to use binary streams to communicate
with build tools, and treat the results as opaque binary data. If it
needs to be written out to the installation tool's own streams, then
use the binary level APIs for those interfaces to inject the build
tool output directly, rather than decoding and re-encoding it first.

2. We advise installation tools to adopt a PEP 538 style solution,
where they mostly just trust the result of
locale.getpreferredencoding() *unless*
"codecs.lookup(locale.getpreferredencoding()).name == 'ascii'". In the
latter case, we'd advise them to set LC_CTYPE (and potentially LANG)
appropriately for the running OS. Regardless of whether or not that
locale coercion was needed, we'd recommend setting "replace" or
"backslashreplace" when decoding the stream output from the
subprocess.

At the specification level, I think option 1 probably makes the most
sense - we'd be advising installation tools that they're free to kick
any mojibake problems further down the automation pipeline if they
don't want to worry about it. It's also the only one of the two
recommendations we can readily make cross platform.
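A short sketch of option 1. It assumes the installation tool launches the build step itself and copies the combined stdout/stderr to its own binary stream without decoding. `relay_build_output` is a hypothetical name, and the 4096-byte read size is arbitrary.

```python
import subprocess
import sys
from typing import BinaryIO, List, Optional


def relay_build_output(cmd: List[str], out: Optional[BinaryIO] = None) -> int:
    """Run a build step and copy its output to `out` as opaque bytes.

    Nothing is decoded or re-encoded, so whatever encoding (or mojibake) the
    build tool produced is passed further down the pipeline unchanged.
    """
    if out is None:
        out = sys.stdout.buffer  # the binary layer under the text stream
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as proc:
        # read1 returns whatever is already available, so output streams promptly.
        for chunk in iter(lambda: proc.stdout.read1(4096), b""):
            out.write(chunk)
            out.flush()
        return proc.wait()
```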

At a quality-of-implementation level, there's a lot of potential value
in option 2 (at least on non-Windows systems) - we just wouldn't
require or recommend it at the level of the interoperability
specifications.
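Option 2 might look roughly like this on a non-Windows system. The sketch makes several assumptions. The function names are hypothetical. `C.UTF-8` is assumed to exist on the target OS, although PEP 538 itself tries several candidate locale names. Only `LC_CTYPE` is set, even though `LANG` may also need adjusting.

```python
import codecs
import locale
from typing import Dict, Mapping, Optional


def is_ascii_locale(encoding: str) -> bool:
    """The check quoted above: does the locale encoding map to plain ASCII?"""
    return codecs.lookup(encoding).name == "ascii"


def build_tool_env(base: Mapping[str, str],
                   encoding: Optional[str] = None) -> Dict[str, str]:
    """Copy the environment, coercing LC_CTYPE only for an ASCII locale."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    env = dict(base)
    if is_ascii_locale(encoding):
        # Assumption: C.UTF-8 is available. An LC_ALL already present in `base`
        # would still override this setting.
        env["LC_CTYPE"] = "C.UTF-8"
    return env


def decode_build_output(raw: bytes, encoding: str) -> str:
    """Decode subprocess output without ever raising on undecodable bytes."""
    return raw.decode(encoding, errors="backslashreplace")
```

An install tool would pass `env=build_tool_env(os.environ)` to `subprocess.run` and then decode the captured bytes with `decode_build_output`.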

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore

On 20 May 2017 at 09:03, Thomas Kluyver  wrote:
> On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote:
>> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
>> build systems SHOULD emit UTF-8 encoded output
>> * on platforms with 16-bit standard streams (e.g. Windows), build
>> systems SHOULD emit UTF-16-LE encoded output
>
> I'm quite prepared to accept that I'm mistaken, but my understanding is
> that *standard streams* are 8-bit on Windows as well. The 16-bit thing
> that Python 3.6 does, as I understand it, is to bypass standard streams
> when it detects that they're connected to a console, and use a Windows
> API call to write text to the console directly as UTF-16.
>
> If so, when stdout/stderr are pipes, which I assume is how pip captures
> the output from build processes, there's no particular reason to send
> UTF-16 data just because it's Windows.

That's my understanding too. The standard streams are still byte
streams with an encoding. It's just that the underlying IO when the
final destination is the console, is done by the Windows Unicode APIs.
Because of this, when the output is the console the stream can accept
any unicode character and so an encoding of UTF8 is specified (and
yes, AIUI there is a translation Unicode string -> UTF-8 bytes ->
Unicode console API). For non-console output, though, the standard
streams are still byte streams and the platform behaviour is
respected, so we use the ANSI codepage (calling this the platform
standard glosses over the fact that there are two standard codepages,
ANSI and OEM, and tools don't always make the same choice when faced
with piped output). Long story short, UTF-16 is irrelevant here.

The docs for 3.6 say "Under Windows, if the stream is interactive
(that is, if its isatty() method returns True), the console codepage
is used, otherwise the ANSI code page". This is out of date (it was
true for 3.5 and earlier). In 3.6+ utf-8 is used for interactive
streams rather than the console codepage:

>py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)"
utf-8
>py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)" >$null
cp1252

The bigger question, though, is to what extent we want to mandate that
build tools that run external tools such as compilers take
responsibility for the encoding of the output of those tools (rather
than simply passing the output through to the output stream
unmodified). And if we do want to, whether we want to allow an
exception for setuptools/distutils.

Also, a question regarding Unix - do we really want to mandate UTF-8
even if the system locale is set to something else? Won't that mean
that build tools have the same problem with compilers generating
output in the encoding the tool wants that we already have on Windows?

My feeling is:

1. Build systems SHOULD emit output encoded in the preferred locale
encoding (normally UTF-8 on Unix, ANSI on Windows).
2. Build systems should ideally check the encoding used by external
tools that they run and transcode to the correct encoding if necessary
- but this is a quality of implementation matter.
3. Install tools MUST NOT fail if build tools produce output with the
wrong encoding, but MUST correctly reproduce build tool output if the
build tools do produce the right encoding.
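On the install-tool side, points 1 and 3 amount to two things: decode with the preferred locale encoding, and never let a mismatch raise. A minimal sketch, assuming the build output was captured as bytes. `show_build_output` is a hypothetical name.

```python
import locale


def show_build_output(raw: bytes) -> str:
    """Decode captured build output for display.

    Bytes in the preferred locale encoding come back exactly; anything else
    turns into replacement characters rather than a UnicodeDecodeError.
    """
    encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")
```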

My biggest concern with this is that I believe that Visual C produces
output in the OEM codepage even when output to a pipe. Actually I just
did some experiments (VS 2015), and it's even worse than that. The
compiler (cl) seems to use the OEM code page when writing to a pipe,
but the linker uses the ANSI code page. This means that a command like
"cl a£bc" produces output on (a piped) stdout that contains mixed
encodings. Given this situation, I think we have to simply give up and
take the view that the Visual C tools are simply broken in this
regard, and we shouldn't worry about them. So I'm inclined therefore
to drop point (2) from the 3 above.
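The mixed-encoding effect can be reproduced without a compiler. This sketch assumes a Western European Windows configuration, where the OEM codepage is cp850 and the ANSI codepage is cp1252; other locales pair different codepages. It simulates one line written by `cl` in OEM and one written by the linker in ANSI. The helper names are hypothetical.

```python
from typing import List

# Assumed codepages for a Western European Windows install.
OEM = "cp850"    # codepage cl was observed using on a pipe
ANSI = "cp1252"  # codepage the linker was observed using


def mixed_pipe_output(name: str) -> bytes:
    """Simulate compiler (OEM) and linker (ANSI) messages sharing one pipe."""
    return name.encode(OEM) + b"\n" + name.encode(ANSI) + b"\n"


def read_as(raw: bytes, encoding: str) -> List[str]:
    """Decode the whole stream with a single codepage, as a reader must."""
    return raw.decode(encoding, errors="replace").splitlines()
```

With `raw = mixed_pipe_output("a£bc")`, `read_as(raw, ANSI)` gives `['aœbc', 'a£bc']` and `read_as(raw, OEM)` gives `['a£bc', 'aúbc']`. Whichever single codepage is chosen, one line comes out wrong.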

Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Thomas Kluyver

On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote:
> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
> build systems SHOULD emit UTF-8 encoded output
> * on platforms with 16-bit standard streams (e.g. Windows), build
> systems SHOULD emit UTF-16-LE encoded output

I'm quite prepared to accept that I'm mistaken, but my understanding is
that *standard streams* are 8-bit on Windows as well. The 16-bit thing
that Python 3.6 does, as I understand it, is to bypass standard streams
when it detects that they're connected to a console, and use a Windows
API call to write text to the console directly as UTF-16.

If so, when stdout/stderr are pipes, which I assume is how pip captures
the output from build processes, there's no particular reason to send
UTF-16 data just because it's Windows.
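This point can be checked directly. The sketch below asks a child Python process which encoding it uses for stdout when that stdout is a pipe, which is the situation pip is in when it captures build output. `child_stdout_encoding_when_piped` is a hypothetical name. As reported later in this thread, Python 3.6 on Windows reports the ANSI codepage here (e.g. cp1252). Unix-like systems usually report the locale encoding unless `PYTHONIOENCODING` or UTF-8 mode overrides it. In neither case is it UTF-16.

```python
import subprocess
import sys


def child_stdout_encoding_when_piped() -> str:
    """Return the stdout encoding a child Python picks when stdout is a pipe."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
    )
    return result.stdout.decode("ascii").strip()  # codec names are ASCII
```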

Thomas


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan

On 20 May 2017 at 01:16, Thomas Kluyver  wrote:
> On Fri, May 19, 2017, at 03:41 PM, Paul Moore wrote:
>> Can we specify what encoding the informational text must be written
>> in?
>
> Sure, that makes sense. What about:
>
> All hooks are run with working directory set to the root of the source
> tree, and MAY print arbitrary informational text on stdout and stderr.
> This text SHOULD be UTF-8 encoded, but as building may invoke other
> processes, install tools MUST NOT fail if the data they receive is not
> valid UTF-8; though in this case the display of the output may be
> corrupted.
>
> Do we also want to recommend that install tools set
> PYTHONIOENCODING=utf-8 when invoking build tools? Or leave this up to
> the build tools?

Setting PYTHONIOENCODING=utf-8:strict would potentially fail the
"don't fail hard on misencoded output" requirement, and setting
anything else is dubious from a potential data loss or compatibility
point of view (as there's no "surrogateescape" error handler in Python
2).
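The risk with `PYTHONIOENCODING=utf-8:strict` can be shown with a child process that prints a lone surrogate, standing in for text that cannot be encoded cleanly. This sketch runs on Python 3, which, unlike Python 2, has error handlers such as `surrogateescape` and `backslashreplace`. The helper name is hypothetical.

```python
import os
import subprocess
import sys

# Child code that prints a lone surrogate, which strict UTF-8 cannot encode.
CHILD = "print('\\udcff')"


def run_child_with_ioencoding(value: str) -> subprocess.CompletedProcess:
    """Run CHILD with PYTHONIOENCODING set to `value`, capturing its output."""
    env = dict(os.environ, PYTHONIOENCODING=value)
    return subprocess.run([sys.executable, "-c", CHILD], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

With `"utf-8:strict"` the child dies with a `UnicodeEncodeError` and a non-zero exit status. With `"utf-8:backslashreplace"` it exits cleanly and prints the escaped form `\udcff`.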

For use cases like distro package building, we'd also like to inherit
the surrounding build environment, so explicitly requiring installation
tools to alter it at the Python level doesn't strike me as ideal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Nick Coghlan

On 20 May 2017 at 00:18, Thomas Kluyver  wrote:
> Hi,
>
> I'd like to make another push for PEP 517, which would make it possible
> to build wheels from a source tree with other build tools, without
> needing setup.py.
>
> https://www.python.org/dev/peps/pep-0517/
>
> Last time this was discussed we made a couple of minor changes to the
> PEP, but we didn't want to accept another packaging related PEP until
> PEP 518 was implemented in pip. I'm pleased to say that that
> implementation has just been merged:
> https://github.com/pypa/pip/pull/4144 .

Huzzah, and congratulations! :)

Regarding the encoding question, I agree with your recommendation with
one key amendment to account for the 16-bit console APIs on Windows:

* on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
build systems SHOULD emit UTF-8 encoded output
* on platforms with 16-bit standard streams (e.g. Windows), build
systems SHOULD emit UTF-16-LE encoded output
* on platforms that offer both, build systems SHOULD use the 16-bit
streams to match the default behaviour of CPython 3.6+
* install tools MUST NOT fail the build solely due to improperly
encoded output, but are otherwise free to handle the situation as they
see fit

Folks on Python 3.5 and earlier on Windows may still have problems
given that guidance (since that uses the 8-bit stream interfaces with
the Windows native encodings by default), but that's also a large part
of why CPython's behaviour on Windows was changed in 3.6 :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Thomas Kluyver

On Fri, May 19, 2017, at 05:17 PM, Paul Moore wrote:
> On 19 May 2017 at 16:53, Daniel Holth  wrote:
> > Congrats on getting 518 in.
> 
> Agreed, by the way. That's a big step!

Thanks both. It does feel like an achievement. :-)

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Paul Moore

On 19 May 2017 at 16:53, Daniel Holth  wrote:
> Congrats on getting 518 in.

Agreed, by the way. That's a big step!
Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Daniel Holth

Congrats on getting 518 in.

On Fri, May 19, 2017, 11:37 Thomas Kluyver  wrote:

> On Fri, May 19, 2017, at 04:31 PM, Paul Moore wrote:
> > For flit, would having the install tool set PYTHONIOENCODING help?
>
> If install tools were meant to set PYTHONIOENCODING, I probably wouldn't
> do anything else in flit's code. Python should then take care of
> ensuring that any output is UTF-8 encoded, and flit doesn't currently
> invoke any separate processes to do the build.
>
> Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Thomas Kluyver

On Fri, May 19, 2017, at 04:31 PM, Paul Moore wrote:
> For flit, would having the install tool set PYTHONIOENCODING help?

If install tools were meant to set PYTHONIOENCODING, I probably wouldn't
do anything else in flit's code. Python should then take care of
ensuring that any output is UTF-8 encoded, and flit doesn't currently
invoke any separate processes to do the build.
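For a pure-Python build tool like flit, the "install tool sets PYTHONIOENCODING" approach would look roughly like this from the install tool's side. The sketch assumes the hook runs as a subprocess. `run_hook_with_utf8_io` is a hypothetical name. The lenient decode is there for non-Python subprocesses, which ignore the variable.

```python
import os
import subprocess
import sys
from typing import List


def run_hook_with_utf8_io(cmd: List[str]) -> str:
    """Ask a Python build tool for UTF-8 output, then decode it leniently."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Usage: a child Python printing non-ASCII text comes back intact.
    print(run_hook_with_utf8_io([sys.executable, "-c", "print('caf\\xe9')"]))
```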

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Paul Moore

On 19 May 2017 at 16:16, Thomas Kluyver  wrote:
> On Fri, May 19, 2017, at 03:41 PM, Paul Moore wrote:
>> Can we specify what encoding the informational text must be written
>> in?
>
> Sure, that makes sense. What about:
>
> All hooks are run with working directory set to the root of the source
> tree, and MAY print arbitrary informational text on stdout and stderr.
> This text SHOULD be UTF-8 encoded, but as building may invoke other
> processes, install tools MUST NOT fail if the data they receive is not
> valid UTF-8; though in this case the display of the output may be
> corrupted.

Looks good, although whether UTF-8 is viable on Windows is something
I'll have to think about.

> Do we also want to recommend that install tools set
> PYTHONIOENCODING=utf-8 when invoking build tools? Or leave this up to
> the build tools?

Good question. At the moment, the only 2 cases I know of are
setuptools/distutils and flit. For setuptools, I'm pretty sure there's
no handling of subprocesses, it just fires them off and lets them
write to the console - so there's nothing to even ensure a consistent
encoding. We may have to allow for special casing with setuptools, as
I doubt anyone's going to put in the effort to add a transcoding layer
in there.

For flit, would having the install tool set PYTHONIOENCODING help?

I don't know immediately what I'd do if I were designing a brand new
build tool that called out to a 3rd party compiler. Let me think about
it.
Paul

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Thomas Kluyver

On Fri, May 19, 2017, at 03:41 PM, Paul Moore wrote:
> Can we specify what encoding the informational text must be written
> in?

Sure, that makes sense. What about:

All hooks are run with working directory set to the root of the source
tree, and MAY print arbitrary informational text on stdout and stderr.
This text SHOULD be UTF-8 encoded, but as building may invoke other
processes, install tools MUST NOT fail if the data they receive is not
valid UTF-8; though in this case the display of the output may be
corrupted.
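The "MUST NOT fail on invalid UTF-8" rule has a subtle side if install tools read hook output incrementally, for example to show progress instead of waiting for the process to exit. A multi-byte character can be split across two reads, and decoding each chunk separately with `errors="replace"` would corrupt perfectly valid output. The sketch below uses the standard library's incremental decoder to avoid this. `decode_stream` is a hypothetical name.

```python
import codecs
from typing import Iterable


def decode_stream(chunks: Iterable[bytes]) -> str:
    """Decode hook output that arrives in arbitrary byte chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any incomplete tail
    return "".join(parts)
```

Decoding the chunks `b"caf"`, `b"\xc3"`, `b"\xa9 ok"` one at a time would produce two U+FFFD replacement characters in place of "é". The incremental decoder returns "café ok".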

Do we also want to recommend that install tools set
PYTHONIOENCODING=utf-8 when invoking build tools? Or leave this up to
the build tools?

Thomas

Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-19 Thread Paul Moore

On 19 May 2017 at 15:18, Thomas Kluyver  wrote:
> Hi,
>
> I'd like to make another push for PEP 517, which would make it possible
> to build wheels from a source tree with other build tools, without
> needing setup.py.

A point that came up recently while dealing with a pip issue.

"""
All hooks are run with working directory set to the root of the source
tree, and MAY print arbitrary informational text on stdout and stderr.
"""

Can we specify what encoding the informational text must be written
in? At the moment pip has problems dealing with non-ASCII locales
because it captures the build output and then displays it on error.
This involves a decode/encode step (on Python 3) or printing arbitrary
bytes to stdout (on Python 2). And at the moment we get UnicodeErrors
if there's a mismatch. I've patched it to use errors=replace, but we
still risk mojibake.
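The trade-off described here is easy to see with UTF-8 output read as cp1252: strict decoding can raise, while `errors=replace` avoids the exception but can still produce mojibake. In this sketch, `decode_both_ways` is a hypothetical helper and cp1252 stands in for a wrongly guessed encoding.

```python
from typing import Optional, Tuple


def decode_both_ways(raw: bytes, assumed: str) -> Tuple[Optional[str], str]:
    """Return (strict result, or None if it raised; errors='replace' result)."""
    try:
        strict: Optional[str] = raw.decode(assumed)
    except UnicodeDecodeError:
        strict = None  # the UnicodeError case
    return strict, raw.decode(assumed, errors="replace")
```

UTF-8 "café" read as cp1252 decodes without error in both modes and yields "cafÃ©", so `replace` alone does nothing about mojibake. Bytes such as `b"\x81ok"` (0x81 is unassigned in cp1252) make the strict decode fail, while `replace` gives a U+FFFD followed by "ok".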

Ideally, we should specify an encoding that hooks will use for output
- but that's somewhat difficult as many build tools will want to do
things like run compilers which could do arbitrarily silly things. I
believe this is less of a problem on Unix (where there's a
well-managed convention), but on Windows there's an "OEM" codepage for
console programs, and an "ANSI" codepage for Windows programs - but
not all programs use the same one - some console programs such as
MinGW, I think, and Python itself if stdout is redirected (see
https://docs.python.org/3.6/library/sys.html#sys.stdout) use the ANSI
codepage.
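On Windows, a tool can at least find out which two codepages are in play. This sketch uses `ctypes` to call the Win32 functions `GetACP` and `GetOEMCP`, and returns None on other platforms. `windows_codepages` is a hypothetical name. Knowing both values still does not tell you which one a given console program will use on a pipe, which is the inconsistency described above.

```python
import ctypes
import sys
from typing import Optional, Tuple


def windows_codepages() -> Optional[Tuple[str, str]]:
    """Return (ANSI, OEM) codepage names such as ('cp1252', 'cp850'), or None."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32  # ctypes.windll exists only on Windows
    return "cp%d" % kernel32.GetACP(), "cp%d" % kernel32.GetOEMCP()
```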

So we may have to fudge the situation a bit. (Maybe something like
"Install tools MAY assume a specific encoding for the output, and MAY
produce corrupted output if the build tool does not use that encoding,
but install tools MUST NOT fail with an encoding error just because
the encodings don't match").

But I don't think we should leave the situation completely unspecified.

Paul.
