Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan
On 21 May 2017 at 02:36, Steve Dower wrote:
> On 20May2017 0820, Nick Coghlan wrote:
>>
>> Good point regarding the fact that the Windows 16-bit APIs only come
>> into play for interactive sessions (even in 3.6+), while for PEP 517
>> we're specifically interested in the 8-bit pipes used to communicate
>> with build subprocesses launched by an installation tool.
>
>
> I need to catch up on the PEP (and thanks Brett for alerting me to the
> thread), but this comment in particular cements the mental diagram I have
> right now:
>
> (build UI) <--> (build tool) <--> (compiler)
> ( Python ) <--> (  Python  ) <--> (anything)
>
> I'll probably read the PEP closely and see that this is entirely incorrect,
> but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).
> * encoding for text between build tool and the compiler depends on the
> compiler

Alas, it isn't quite that simple. Let's take the current de facto standard case:


(user console/CI build log) <-> pip <-> setup.py (distutils/setuptools) <-> 3rd party tool

Key usability feature:

* when requested, informational messages from 3rd party tools SHOULD
be made available to the end user for debugging purposes

Ideal outcome:

* everything that makes it to the user console or CI build log is
readable by the end user

Essential requirement:

* encoding problems in informational messages emitted by 3rd party
tools MUST NOT cause the build to fail

Now, the easiest way to handle the essential requirement as the author
of an installation or build tool is to choose not to deal with it:
instead, you just treat the output from further downstream as opaque
binary data, and let the user console/CI build log layer deal with any
encoding problems as they see fit. You may end up with some build
failures that are a pain to debug because you're getting nonsense from
the build pipeline, but you won't fail your build *because* some
particular build tool emitted improperly encoded nonsense.
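
A minimal sketch of that "opaque binary data" approach, assuming the
build tool's output is available as a binary file-like object.
`relay_bytes` is a hypothetical helper name used only for illustration;
it is not part of pip or of PEP 517:

```python
import io
import shutil


def relay_bytes(source, sink, chunk_size=8192):
    """Copy raw bytes from source to sink without ever decoding them."""
    # Both arguments are binary file-like objects (for example a
    # subprocess pipe and sys.stdout.buffer on Python 3), so badly
    # encoded output cannot raise UnicodeDecodeError here.
    shutil.copyfileobj(source, sink, chunk_size)


if __name__ == "__main__":
    noisy = io.BytesIO(b"compiling \xff\xfe not valid UTF-8\n")
    log = io.BytesIO()
    relay_bytes(noisy, log)
    print(repr(log.getvalue()))
```

Because nothing is decoded, whatever reads the log later is the only
layer that has to pick an encoding.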

That all changes if we *require* UTF-8 on the link between the
installation tool (e.g. pip) and the build tool (e.g. setup.py). If we
do that:

* the installation tool can't just pass along build tool output to the
user console or CI build log any more, it has a nominal obligation to
try to interpret it as UTF-8
* the build tool (or build tool shim) can't just pass along 3rd party
tool output to the installation tool any more, it has a nominal
obligation to try to get it to emit UTF-8

Now, *particular* installation and build tools may want to strongly
encourage the use of UTF-8 in an effort to get closer to the ideal
outcome, but that isn't the key objective of PEP 517: the key
objective of PEP 517 is to make it easier to use *general purpose*
build systems that happen to be implemented in Python (like waf,
scons, and meson) to handle complex build scenarios, while also
allowing the use of simpler Python-only build systems (like flit) for
distribution of pure Python projects.

That said, the PEP *could* explicitly define a short list of
behaviours that we consider reasonable in an installation tool:

1. Treat the informational output from the build tool as an opaque binary stream
2. Treat the informational output from the build tool as a text stream
encoded using locale.getpreferredencoding(), and decode it using the
backslashreplace error handler
3. Treat the informational output from the build tool as a UTF-8
encoded text stream, and decode it using the backslashreplace error
handler

We'd just need to caveat the latter two options with the fact that
they'll give you a cryptic error message on Python 3.4 and earlier
(including Python 2):

>>> b"\xf0\x01\x02\x03".decode("utf-8", "backslashreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py27/Lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: don't know how to handle UnicodeDecodeError in error callback

I had to look that up on Stack Overflow myself, but what it's trying
to say is that until Python 3.5, "backslashreplace" only worked for
encoding, not for decoding.

That means that for earlier versions, you'd need to define your own
custom error handler as described in
http://stackoverflow.com/questions/25442954/how-should-i-decode-bytes-using-ascii-without-losing-any-junk-bytes-if-xmlch/25443356#25443356
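
A minimal sketch of such a custom handler, assuming it is acceptable to
register it under a new name rather than override the built-in one. The
handler name "backslashreplace_decode" and the helper
`decode_build_output` are invented for this example:

```python
import codecs


def _escape_undecodable(exc):
    """Error handler: replace undecodable bytes with \\xNN escapes."""
    if not isinstance(exc, UnicodeDecodeError):
        # Encoding errors are already covered by the built-in
        # "backslashreplace" handler, so they are not handled here.
        raise exc
    bad = bytearray(exc.object[exc.start:exc.end])
    escaped = u"".join(u"\\x%02x" % byte for byte in bad)
    # Resume decoding right after the offending bytes.
    return escaped, exc.end


# "backslashreplace_decode" is a made-up name chosen for this sketch.
codecs.register_error("backslashreplace_decode", _escape_undecodable)


def decode_build_output(raw, encoding="utf-8"):
    """Decode tool output without ever raising UnicodeDecodeError."""
    return raw.decode(encoding, "backslashreplace_decode")


if __name__ == "__main__":
    print(repr(decode_build_output(b"\xf0\x01\x02\x03")))
```

Passing `locale.getpreferredencoding()` as `encoding` would give the
second behaviour in the list above, and the default gives the third.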

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Finally making distlib handle spaces

2017-05-20 Thread Nick Coghlan
On 21 May 2017 at 07:29, Radon Rosborough wrote:
>> The current behaviour fails, as you note, but it does so in a
>> "standard" way - shebang behaviour (and its limitations) is
>> well-known.
>
> I agree with this, but in my opinion the shebang is simply an
> implementation detail of virtualenv. I would like to quote @JDLH from
> [1]: "There is nothing about the value provided by virtualenv that
> demands it use the shebang." If the shebang were fundamentally
> necessary to provide the functionality of virtualenv, it would make
> sense for virtualenv to inherit the shebang's restrictions. But this
> is not the case, so I think that the shebang should be considered an
> implementation detail that the end user should not need to be aware
> of.

I agree with this way of looking at the problem, so my perspective would be:

1. Don't change anything on Windows (since that already uses the
custom 'py' dispatcher)
2. Don't change anything for cases where platform provided shebang
dispatch is trusted to be correct (i.e. no quoting of the shebang line
is needed)
3. Change the cases that currently quote the shebang line to instead
invoke a custom dispatch script running in /bin/sh

I also agree that distlib is the right level to implement the change -
this isn't about people wanting custom behaviour, it's about distlib's
default dispatch mechanism having been found not to work in certain
cases, so it makes sense to automatically switch to an alternative
that *does* work.

The custom dispatcher doesn't need to solve 100% of the currently
failing cases - it just needs to solve some of them, and provide a
foundation for future iterations on the design and implementation to
handle more.
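
One way such a /bin/sh dispatcher could look is sketched below. It uses
a well-known trick in which the same two header lines are valid for both
sh and Python; this is an illustration only and is not taken from any of
the patches under discussion. `make_sh_launcher` is a hypothetical name,
and paths containing backslashes are not handled:

```python
import shlex  # shlex.quote needs Python 3.3+


def make_sh_launcher(python_path, script_body):
    """Return script text that /bin/sh can start and Python can run."""
    quoted = shlex.quote(python_path)
    lines = [
        "#!/bin/sh",
        # For sh: '' followed by 'exec' is just the word exec, so the
        # shell replaces itself with the interpreter and never reads on.
        "'''exec' " + quoted + ' "$0" "$@"',
        # For Python: this closes the triple-quoted string opened above,
        # so both lines together are one harmless string expression.
        "' '''",
    ]
    return "\n".join(lines) + "\n" + script_body


if __name__ == "__main__":
    print(make_sh_launcher(
        "/private/tmp/path with spaces/venv/bin/python2.7",
        "import sys\nprint(sys.executable)\n",
    ))
```

The kernel only ever sees `#!/bin/sh`, so the interpreter path is parsed
by the shell's quoting rules rather than by the shebang mechanism.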

Cheers,
Nick.

P.S. I was originally going to ask "Can we use Python to implement the
dispatch instead?", but then realised there are actually lots of messy
complications with that around getting program metadata and
environmental details right that sh deals with natively. So /bin/sh is
a better way to go.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Donald Stufft

> On May 20, 2017, at 4:05 PM, Paul Moore wrote:
> 
> I'm a little concerned if we're going to end up with a proposal that
> means that distutils is in violation of the spec unless this issue is
> fixed. I'm not sure if that's where we're headed, but I wanted to be
> clear here - is PEP 517 intended to encompass distutils/setuptools, or
> are we treating them as a legacy case, that pip should special-case?


I don’t think distutils/setuptools are going to be compatible out of the box
anyways, because its API is tied to setup.py. Whatever adapter is written to
adapt it to PEP 517 can handle any semantic differences as well.

—
Donald Stufft





Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower

On 20May2017 1315, Paul Moore wrote:
> On 20 May 2017 at 17:36, Steve Dower wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what the
>> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
>> the tool should read raw bytes from the compiler and encode them to UTF-8.
>
> Did you spot my point that Visual C produces output that's a mixture
> of OEM and ANSI codepages?

[SNIP]

Yes, and it's a perfect example of why the MSVC-specific wrapper should
be the one to deal with tool encodings. If you forward unencoded bytes
like this back to the UI, it will have to deal with the mixed encoding.

> I'd be very surprised if build tool developers got this sort of edge
> case correct without at least some guidance from the PEP on the sorts
> of things they need to consider. You suggest "read raw bytes and
> encode them to UTF-8" - but you don't encode bytes, you encode
> strings, so you still need to convert those bytes to a string first,
> and there's no encoding you can reliably use for this. You need to use
> "errors=replace" to ensure you can handle inconsistently encoded data
> without getting an exception.


The "read raw bytes and [transcode] them" comment was meant to be that 
sort of help. I didn't go as far as writing 
`output.decode(output_encoding, errors="replace").encode("utf-8", 
errors="replace")`, but that's basically what I meant to imply. The 
build tool developer is the *only* developer who can get this right, and 
if they can't, then they have to figure out the most appropriate way to 
work around the fact that they can't.
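
Spelled out as a small sketch, assuming the wrapper already knows the
tool's output encoding. `transcode_to_utf8` and `run_tool` are
hypothetical helper names, not an existing API:

```python
import subprocess
import sys


def transcode_to_utf8(raw, tool_encoding):
    """Turn a tool's bytes into UTF-8 bytes without raising on bad input."""
    text = raw.decode(tool_encoding, errors="replace")
    return text.encode("utf-8", errors="replace")


def run_tool(argv, tool_encoding):
    """Run a tool, capture stdout and stderr together, return UTF-8 bytes."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    raw, _ = proc.communicate()
    return proc.returncode, transcode_to_utf8(raw, tool_encoding)


if __name__ == "__main__":
    # Stand-in for a compiler: a child Python that emits one cp1252 byte.
    child = "import sys; sys.stdout.buffer.write(b'a\\xa3b')"
    print(run_tool([sys.executable, "-c", child], "cp1252"))
```

Bytes that do not fit `tool_encoding` come out as U+FFFD replacement
characters, so the result is always valid UTF-8 even when the guess
about the tool's encoding is wrong.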


As for defining distutils as incompatible with the PEP, I'm okay with 
that. Updating distutils to use subprocess for launching tools rather 
than spawnv would be a very good start (and likely a good contribution 
for a new contributor), but allowing build tools to continue to be 
written badly is not worthwhile.


Cheers,
Steve



Re: [Distutils] Introducing PyPIContents

2017-05-20 Thread Thomas Kluyver
On Sat, May 20, 2017, at 07:29 PM, Luis Alejandro Martínez Faneyth wrote:
> It supports ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2']
> formats, and it extracts the data using any available.

Nice! If there are multiple of those formats present, does it get the
data from just one? Or does it get data from all of them and combine
it somehow?

> I wasn't aware of the fact that some modules may be on one platform
> and not in another. I guess there's room for improvement.

It probably doesn't matter for most cases, but since setup.py runs
arbitrary code, it's possible for it to install different modules in
different situations - or even select modules at random, if you really
want to confuse tools like this. ;-)

This is why my own efforts at indexing focused on wheels - you can be
sure of exactly what a wheel contains. My wheel-indexing tool 'wheeldex'
is here, if there's any code or ideas there that you can use:
https://github.com/takluyver/wheeldex
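
To illustrate why wheels are easy to index: a wheel is a zip archive, so
its module list can be read without running any code. The sketch below
is not how wheeldex works (I have not reproduced its code); it only
looks at `.py` files and ignores compiled extension modules.
`wheel_modules` is a hypothetical name:

```python
import io
import zipfile


def wheel_modules(wheel_file):
    """Return the importable module names found in a wheel archive."""
    modules = set()
    with zipfile.ZipFile(wheel_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Skip metadata and data directories; they are not importable.
            if parts[0].endswith((".dist-info", ".data")):
                continue
            if not name.endswith(".py"):
                continue
            parts[-1] = parts[-1][:-3]  # strip the ".py" suffix
            if parts[-1] == "__init__":
                parts.pop()  # a package is named after its directory
            if parts:
                modules.add(".".join(parts))
    return sorted(modules)


if __name__ == "__main__":
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("demo/__init__.py", "")
        archive.writestr("demo/core.py", "")
        archive.writestr("demo-1.0.dist-info/METADATA", "")
    demo.seek(0)
    print(wheel_modules(demo))
```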

> Thank you. I made this because I wanted to have an app that guessed
> Python dependencies from code by scanning module imports and then
> looking up the Index. That app is called Pip Sala Bim and you can
> check it out here:
>
> https://github.com/LuisAlejandro/pipsalabim

Neat, that's precisely one of the use cases I was thinking of for an
index. The other thing I'm interested in is providing an interface to
install modules by their import name rather than their PyPI name; I
think your index should work for that as well. I'll dig into the code of
both PyPIContents and Pip Sala Bim more soon.
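
A rough sketch of that import-name lookup, assuming the index is loaded
as a JSON object that maps each package name to an entry with a
"modules" list, as in the sample from the announcement. The function
name `modules_to_packages` is hypothetical:

```python
import json


def modules_to_packages(index):
    """Invert {package: {"modules": [...]}} into {module: [packages]}."""
    lookup = {}
    for package, entry in index.items():
        for module in entry.get("modules", []):
            # Several packages may ship the same module name.
            lookup.setdefault(module, []).append(package)
    return lookup


if __name__ == "__main__":
    sample = json.loads("""
    {"1337": {"cmdline": [], "modules": ["1337", "1337.1337"],
              "version": "1.0.0"}}
    """)
    print(modules_to_packages(sample))
```
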
Thanks,
Thomas


Re: [Distutils] Finally making distlib handle spaces

2017-05-20 Thread Radon Rosborough
> The current behaviour fails, as you note, but it does so in a
> "standard" way - shebang behaviour (and its limitations) is
> well-known.

I agree with this, but in my opinion the shebang is simply an
implementation detail of virtualenv. I would like to quote @JDLH from
[1]: "There is nothing about the value provided by virtualenv that
demands it use the shebang." If the shebang were fundamentally
necessary to provide the functionality of virtualenv, it would make
sense for virtualenv to inherit the shebang's restrictions. But this
is not the case, so I think that the shebang should be considered an
implementation detail that the end user should not need to be aware
of.

> At the moment, your proposal is just to use "an alternative" launch
> process. Without a specific proposal, it's impossible to judge
> whether the solution is better than the current situation.

We already have three specific patches which provide alternative
launch processes: [2], [3], and [4]. I feel like that should be
specific enough to start a discussion. In fact, Vinay specifically
requested a discussion about [2] be raised on distutils-sig [5]. The
only reason that no action has been taken is that nobody started that
discussion (until now).

> I would have thought that "#!/usr/bin/env sh" runs the risk of
> picking up a malicious sh executable injected into the user's PATH.

That's certainly a valid concern. Does this happen in the real world?
I feel like if you have a malicious sh executable on your PATH, you're
going to have a lot more problems than just from virtualenv. But this
is a good reason that we may want to restrict the patch to only take
effect when using the shebang directly would fail.

> Also, different systems use different sh implementations - so care
> would need to be taken to only code in the lowest common denominator
> syntax.

Can we assume POSIX compatibility? Even if not, we're not doing
anything complicated, only passing some arguments to a command. Surely
that can be done in pretty much any shell one can find.

> multiple proposals and bikeshedding

Although we have three implementations, my personal preference is for
[4]. This is because:

* It avoids the need for creating new files.
* It only takes effect when necessary (i.e. when the shebang won't
  work).
* The code is fairly clean.

> On Windows (where shebang processing is handled by the wrapper exe,
> and is well defined and robust) there should be no change to the
> current behaviour.

Agreed.

[1]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302019457
[2]: https://bitbucket.org/pypa/distlib/pull-requests/31
[3]: https://bitbucket.org/pypa/distlib/pull-requests/32
[4]: https://bitbucket.org/pypa/distlib/pull-requests/33
[5]: https://bitbucket.org/pypa/distlib/pull-requests/31#comment-29795586


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore
On 20 May 2017 at 17:36, Steve Dower wrote:
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what the
> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
> the tool should read raw bytes from the compiler and encode them to UTF-8.

Did you spot my point that Visual C produces output that's a mixture
of OEM and ANSI codepages?

The example I used was:

OEM code page 850, ANSI codepage 1252 (standard British English Windows)

Visual Studio 2015

cl a£b >output.file

The output uses CP850 (in the cl error message) and CP1252 (in the
link error) for the £ sign.

When run from the command line without redirection, the output is in a
consistent encoding. It's only when you redirect the output (I
redirected to a file, I assume a pipe would be the same) that you get
the problem.

I'd be very surprised if build tool developers got this sort of edge
case correct without at least some guidance from the PEP on the sorts
of things they need to consider. You suggest "read raw bytes and
encode them to UTF-8" - but you don't encode bytes, you encode
strings, so you still need to convert those bytes to a string first,
and there's no encoding you can reliably use for this. You need to use
"errors=replace" to ensure you can handle inconsistently encoded data
without getting an exception.
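
The effect can be reproduced without Visual C. The sketch below builds
a byte stream by hand that carries the same character in both code
pages, as an assumed stand-in for the redirected cl and link output;
the message text is made up:

```python
POUND = u"\u00a3"

# Assumption: one stream carrying the same character in both code pages,
# mimicking the cl (CP850) and link (CP1252) messages described above.
mixed = (b"cl: " + POUND.encode("cp850") + b"\n" +
         b"link: " + POUND.encode("cp1252") + b"\n")


def decode_lossy(raw, encoding):
    """Decode without raising; undecodable bytes become U+FFFD."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    for encoding in ("cp850", "cp1252", "utf-8"):
        print(encoding, repr(decode_lossy(mixed, encoding)))
```

Whichever single encoding is chosen, at most one of the two pound signs
survives, which is why a strict decode cannot be relied on here.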

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore
On 20 May 2017 at 19:36, Steve Dower wrote:
>
>> - As a lazy developer, I don't want to read stdout/stderr from a
>> subprocess only to spit it back to my own stdout/stderr. I'd much rather
>> just launch the subprocess and let it use the same stdout/stderr as my
>> build tool.
>
>
> One of the open issues against distutils is that it does this. We can allow
> it, but a well-defined tool should capture the output and pass it to the UI
> component rather than bypassing the UI component.

I'm a little concerned if we're going to end up with a proposal that
means that distutils is in violation of the spec unless this issue is
fixed. I'm not sure if that's where we're headed, but I wanted to be
clear here - is PEP 517 intended to encompass distutils/setuptools, or
are we treating them as a legacy case, that pip should special-case?

Paul


Re: [Distutils] Finally making distlib handle spaces

2017-05-20 Thread Paul Moore
On 20 May 2017 at 15:45, Radon Rosborough wrote:
> ===================================
> Justifications and counterarguments
> ===================================
[...]

While you make a good case countering the issues you quote, I think
there's an important issue you haven't addressed, and I'd like to hear
your comments on it:

The current behaviour fails, as you note, but it does so in a
"standard" way - shebang behaviour (and its limitations) is
well-known. At the moment, your proposal is just to use "an
alternative" launch process. Without a specific proposal, it's
impossible to judge whether the solution is better than the current
situation. And indeed, any proposed solution needs to demonstrate that
it doesn't have security vulnerabilities or other issues that make it
just as much a problem as the status quo (if not more).

For example, I would have thought that "#!/usr/bin/env sh" runs the
risk of picking up a malicious sh executable injected into the user's
PATH. Also, different systems use different sh implementations - so
care would need to be taken to only code in the lowest common
denominator syntax.

I suggest that the next step needs to be to propose a specific
implementation of the wrapper. The specifics can then be debated, but
you'd need a pretty solid proposal - otherwise the discussion will
descend into multiple proposals and bikeshedding, and is likely to
stall without achieving anything.

One other point, which I'd hope is obvious. On Windows (where shebang
processing is handled by the wrapper exe, and is well defined and
robust) there should be no change to the current behaviour.

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower

On 20May2017 1011, Thomas Kluyver wrote:
> On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what
>> the encoding is. Since we don't live in a Python 3.6 world quite yet,
>> that means the tool should read raw bytes from the compiler and encode
>> them to UTF-8.
>
> I half agree, but:
> - Build tools may not 100% know what encoding output will be produced,
> especially if the developer can supply a custom command for the build
> tool to run.

In this case, the whole thing breaks down anyway. UI can't be expected
to reliably display text from an unknown encoding - at some point it has
to be forced into a known quantity, and IMHO the code closest to the
tool should do it.

> - It's possible for data on a pipe to be binary data with no meaning as
> text.

Sure, but it cannot be rendered unless you choose an encoding. All you
can do is dump it to a file (and let a file editor choose an encoding).

> - As a lazy developer, I don't want to read stdout/stderr from a
> subprocess only to spit it back to my own stdout/stderr. I'd much rather
> just launch the subprocess and let it use the same stdout/stderr as my
> build tool.

One of the open issues against distutils is that it does this. We can
allow it, but a well-defined tool should capture the output and pass it
to the UI component rather than bypassing the UI component.

> So I think it's most practical to recommend that build tools produce
> UTF-8 (if not sys.stdout.isatty()), but let build tool developers decide
> how far they go to comply with that.


Require that build tools either send UTF-8 to the UI component, or write 
bytes to a file and call it a build output. I see no benefit in 
requiring both the build tool and the UI tool to guess what the text 
encoding is.


Cheers,
Steve


Re: [Distutils] Introducing PyPIContents

2017-05-20 Thread Luis Alejandro Martínez Faneyth
Hi Thomas,

2017-05-20 13:23 GMT-04:00 Thomas Kluyver:

> Hi Luis,
>
> Awesome, thanks for this :-). It was me posting before about indexing PyPI.
>
> I'm intrigued: how do you keep it up to date using Travis? When I looked
> into this, I was pretty sure you need to download every package to index
> it. Do you have some way to only download the new releases? Or is Travis
> able to download every package every day? Or have you found another way
> round it?
>

I divided the index processing alphabetically, so that each letter is
processed in a separate Travis job. I also set memory and time limits to
avoid abusing Travis. On the first run, each job downloads packages until
it reaches its maximum time limit of 40 minutes. On later runs, the
script only processes packages that have been updated since
the last run.
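
The scheme described above can be sketched roughly as follows. This is
not the PyPIContents code; the function names, the "0" shard for
non-letters and the shape of the `index` and `latest_versions` mappings
are all assumptions made for the example:

```python
import string
import time


def letter_shards(package_names):
    """Group package names by first character, one shard per CI job."""
    shards = {}
    for name in sorted(package_names):
        key = name[0].lower()
        if key not in string.ascii_lowercase:
            key = "0"  # digits and anything else share one shard
        shards.setdefault(key, []).append(name)
    return shards


def packages_to_process(shard, latest_versions, index, time_limit,
                        clock=time.time):
    """Yield stale packages from one shard until the time budget runs out."""
    deadline = clock() + time_limit
    for name in shard:
        if clock() >= deadline:
            break  # leave the rest for the next scheduled run
        known = index.get(name, {}).get("version")
        if known != latest_versions.get(name):
            yield name


if __name__ == "__main__":
    shards = letter_shards(["flit", "Flask", "1337"])
    latest = {"flit": "0.11", "Flask": "0.12"}
    index = {"flit": {"version": "0.11"}}
    print(list(packages_to_process(shards["f"], latest, index, 40 * 60)))
```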



> Does the index only include the latest version of each package, or does it
> also include older versions? The wifi on the train I'm on at the moment
> isn't fast enough to download 60 MB to find out. ;-)
>

It only includes the current versions.



>
> Does your indexing tool prefer to use wheels or sdists? Is it capable of
> using either for packages which don't have both available? Do you do
> anything to cope with modules which may be included for one platform but
> not another?
>

It supports ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2']
formats, and it extracts the data using any available.

I wasn't aware of the fact that some modules may be on one platform and not
on another. I guess there's room for improvement.



>
> I'm excited to see someone actually doing this!
>

Thank you. I made this because I wanted to have an app that guessed Python
dependencies from code by scanning module imports and then looking up the
Index. That app is called Pip Sala Bim and you can check it out here:

https://github.com/LuisAlejandro/pipsalabim



>
>
> Thomas
>
>
> On Sat, May 20, 2017, at 03:01 AM, Luis Alejandro Martínez Faneyth wrote:
>
> Hi everyone,
>
> I'm new to this list but I've been reading some threads in the archive.
>
> Around February, an idea about indexing modules from PyPI packages was
> brought up. I've been working on something similar for quite a while.
>
> PyPIContents is an index of PyPI packages that lists their modules and
> command line scripts in JSON format, like this:
>
>
> [
> ...
>
> "1337": {
> "cmdline": [],
> "modules": [
> "1337",
> "1337.1337"
> ],
> "version": "1.0.0"
> },
>
> ...
>
> ]
>
>
> You can check it out here:
>
> https://github.com/LuisAlejandro/pypicontents
>
> And some use cases:
>
> https://github.com/LuisAlejandro/pypicontents#use-cases
>
> The actual index lives here, it's around 60 MB:
>
> https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.json
>
> It is updated daily with the help of Travis:
>
> https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml
>
> Anyway, I hope it is useful and I'll be around for any comments or questions.
>
> Cheers!
>
>
>
> Luis Alejandro Martínez Faneyth
> Blog: http://huntingbears.com.ve
> Github: http://github.com/LuisAlejandro
> Twitter: http://twitter.com/LuisAlejandro
>
> CODE IS POETRY
>


-- 
Luis Alejandro Martínez Faneyth
Blog: http://huntingbears.com.ve
Github: http://github.com/LuisAlejandro
Twitter: http://twitter.com/LuisAlejandro

CODE IS POETRY


Re: [Distutils] Introducing PyPIContents

2017-05-20 Thread Thomas Kluyver
Hi Luis,

Awesome, thanks for this :-). It was me posting before about
indexing PyPI.

I'm intrigued: how do you keep it up to date using Travis? When I
looked into this, I was pretty sure you need to download every package
to index it. Do you have some way to only download the new releases? Or
is Travis able to download every package every day? Or have you found
another way round it?

Does the index only include the latest version of each package, or does
it also include older versions? The wifi on the train I'm on at the
moment isn't fast enough to download 60 MB to find out. ;-)

Does your indexing tool prefer to use wheels or sdists? Is it capable of
using either for packages which don't have both available? Do you do
anything to cope with modules which may be included for one platform but
not another?

I'm excited to see someone actually doing this!

Thomas


On Sat, May 20, 2017, at 03:01 AM, Luis Alejandro Martínez Faneyth wrote:
> Hi everyone,
>
> I'm new to this list but I've been reading some threads in the
> archive.
>
> Around February, an idea about indexing modules from PyPI packages was
> brought up. I've been working on something similar for quite a while.
>
> PyPIContents is an index of PyPI packages that lists their modules and
> command line scripts in JSON format, like this:
>
> [
> ...
>
> "1337": {
> "cmdline": [],
> "modules": [
> "1337",
> "1337.1337"
> ],
> "version": "1.0.0"
> },
>
> ...
>
> ]
>
> You can check it out here:
>
> https://github.com/LuisAlejandro/pypicontents
>
> And some use cases:
>
> https://github.com/LuisAlejandro/pypicontents#use-cases
>
> The actual index lives here, it's around 60 MB:
>
> https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.json
>
> It is updated daily with the help of Travis:
>
> https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml
>
> Anyway, I hope it is useful and I'll be around for any comments or
> questions.
>
> Cheers!
> 
> 
> 
> Luis Alejandro Martínez Faneyth
> Blog: http://huntingbears.com.ve
> Github: http://github.com/LuisAlejandro
> Twitter: http://twitter.com/LuisAlejandro
> 
> CODE IS POETRY


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Thomas Kluyver
On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
> I'll probably read the PEP closely and see that this is entirely 
> incorrect, but if it's right:
> 
> * encoding for text between the build UI and build tool should just be 
> specified once for all platforms (i.e. use UTF-8).

+1

> * encoding for text between build tool and the compiler depends on the 
> compiler
> 
> In general, since most subprocesses (at least on Windows) do not have 
> customizable encodings, the tool that launches them needs to know what 
> the encoding is. Since we don't live in a Python 3.6 world quite yet, 
> that means the tool should read raw bytes from the compiler and encode 
> them to UTF-8.

I half agree, but:
- Build tools may not 100% know what encoding output will be produced,
especially if the developer can supply a custom command for the build
tool to run.
- It's possible for data on a pipe to be binary data with no meaning as
text.
- As a lazy developer, I don't want to read stdout/stderr from a
subprocess only to spit it back to my own stdout/stderr. I'd much rather
just launch the subprocess and let it use the same stdout/stderr as my
build tool.

So I think it's most practical to recommend that build tools produce
UTF-8 (if not sys.stdout.isatty()), but let build tool developers decide
how far they go to comply with that.
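
As a sketch of that recommendation on Python 3, assuming the build tool
writes through a text stream with a `.buffer` attribute.
`output_encoding` and `emit` are hypothetical names:

```python
import io
import locale
import sys


def output_encoding(stream):
    """Pick UTF-8 for pipes and files, the console's encoding for a tty."""
    if not stream.isatty():
        return "utf-8"
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


def emit(message, stream=None):
    """Write text to the stream's binary buffer in the chosen encoding."""
    stream = sys.stdout if stream is None else stream
    data = message.encode(output_encoding(stream), errors="backslashreplace")
    stream.flush()               # keep ordering with earlier text writes
    stream.buffer.write(data)    # Python 3 text streams expose .buffer
    stream.buffer.flush()


if __name__ == "__main__":
    emit(u"building caf\u00e9 extension\n")
```

Using "backslashreplace" on the encoding side means a console that
cannot show a character gets an escape instead of an exception.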

Thomas


[Distutils] Finally making distlib handle spaces

2017-05-20 Thread Radon Rosborough
Hi all,

Since at least 2011, virtualenv has not supported spaces in paths.
This has bitten many people, including myself, and caused numerous
issues over the years [1] [2] [3] [4] [5] [6] [7].

However, as was discussed in [8], the issue lies not with virtualenv
but with distlib, via pip. It would be possible for pip to use the
existing distlib interface to hack around the problem, but I believe
the current behavior of distlib is erroneous when it comes to spaces
in paths. I therefore believe it would be more appropriate to fix the
problem in distlib.

Two separate patches [9] [10] that solve the problem in distlib were
posted in January by Harald Nordgren. However, they were declined
pending a discussion on distutils-sig [11]. As far as I can tell, no
such discussion was ever started. However, the issue remains, and we
have a clear solution proposal to consider, so I'd like to kick it off
now. In the remainder of this email, I'll explain the problem and
surrounding context in detail, and why I think the solution proposed
by Harald (or some variation) is a good path forward for distlib. I
look forward to hearing your thoughts on the matter.

==========================
The behavior of virtualenv
==========================

The following is written for:

$ python --version
Python 2.7.13
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
$ virtualenv --version
15.1.0

Creating a virtualenv is done as follows:

$ virtualenv venv
New python executable in /private/tmp/path with spaces/venv/bin/python2.7
Also creating executable in /private/tmp/path with spaces/venv/bin/python
Installing setuptools, pip, wheel...done.

This creates a directory structure looking as follows under the venv
directory:

├── bin
│  ├── activate
│  ├── activate.csh
│  ├── activate.fish
│  ├── activate_this.py
│  ├── easy_install
│  ├── easy_install-2.7
│  ├── pip
│  ├── pip2
│  ├── pip2.7
│  ├── python -> ./bin/python2.7
│  ├── python-config
│  ├── python2 -> ./bin/python2.7
│  ├── python2.7
│  └── wheel
├── include
│  └── ...
├── lib
│  └── ...
└── ...

The idea is that one can call the pip and python executables
inside the virtualenv, instead of the system ones. Like so:

$ venv/bin/python --version
Python 2.7.13
$ venv/bin/pip --version
zsh: venv/bin/pip: bad interpreter: "/private/tmp/path: no such file or directory

Unfortunately, as you can see, pip doesn't work at all! Why does this
happen? While the python executable is a native binary, pip is
actually just a Python script, which is specified to be run by the
accompanying virtualenv python executable. Here are the contents:

#!"/private/tmp/path with spaces/venv/bin/python2.7"

# -*- coding: utf-8 -*-
import re
import sys

from pip import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

The issue is that the python binary is specified using a shebang,
which is known [12] [13] to be fragile, OS-dependent, and error-prone
regarding paths that are long or that contain spaces or non-ASCII
characters. In particular, the quoting of the shebang does not work on
most operating systems, including macOS, which is what I ran this test
on.
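
To illustrate why the quoted shebang fails, here is a minimal Python
sketch of how a Unix-like kernel typically reads a "#!" line. It is a
simplified model rather than the actual kernel logic: the function name
split_shebang is made up for illustration, and real systems differ in
details such as length limits and how the rest of the line is passed on.

```python
def split_shebang(line):
    """Simplified model of how a Unix-like kernel reads a '#!' line.

    Assumption: the interpreter path ends at the first whitespace and
    quote characters get no special treatment.
    """
    if not line.startswith("#!"):
        raise ValueError("not a shebang line")
    parts = line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("empty shebang line")
    interpreter = parts[0]
    # Whatever follows the first whitespace is treated as an argument.
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


shebang = '#!"/private/tmp/path with spaces/venv/bin/python2.7"'
interpreter, argument = split_shebang(shebang)
print(interpreter)  # prints "/private/tmp/path (including the leading quote)
```

Under this model the interpreter comes out as the same truncated path
that zsh reports in the "bad interpreter" message above.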

==
virtualenv and distlib
==

The issue is complicated by the fact that there are several different
libraries at play. To perform the installation of pip and wheel into
the virtualenv, virtualenv calls into pip [14]. The 'pip install'
command then uses the subroutine library 'wheel.py', which generates
the stub scripts using distlib's ScriptMaker [15]. It is actually
distlib which generates the shebang, although this can be overridden
by setting the 'executable' property of the ScriptMaker object [16]
[17]. Any patch to fix the virtualenv problem would therefore need to
be in either pip (as a consumer of the shebang-generation interface)
or distlib (as the provider of that interface). The problem cannot be
addressed by virtualenv without doing something like using pip/distlib
to generate the scripts and then fixing them after the fact (this has
been proposed [26] [29], but I consider it a hack).

==
Proposed solutions
==

There has been extended discussion about this issue, especially in
[2]. Essentially, the solutions proposed fall into four categories:

(1) Don't change anything; end users can work around the issue. For
example, they can place their virtualenvs in a different directory
than their project, or change their username to avoid having
spaces or non-ASCII characters.

(2) Don't fix the bug, but add a warning to virtualenv. If we
absolutely can't fix the bug (which I strongly believe not to be
the case), then this would be the next best thing to do. See [27]
[30] [31].

(3) Att

[Distutils] Introducing PyPIContents

2017-05-20 Thread Luis Alejandro Martínez Faneyth
Hi everyone,

I'm new to this list but I've been reading some threads in the archive.

Around February, an idea about indexing modules from PyPI packages was
brought up. I've been working on something similar for quite a while.

PyPIContents is an index of PyPI packages that lists their modules and
command line scripts in JSON format, like this:


{
    ...

    "1337": {
        "cmdline": [],
        "modules": [
            "1337",
            "1337.1337"
        ],
        "version": "1.0.0"
    },

    ...

}
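
As a rough sketch of how such an index might be queried, the snippet
below looks up which packages list a given module. It assumes the index
is a JSON object keyed by package name, as in the excerpt above; the
SAMPLE data and the packages_providing helper are made up for
illustration and are not part of PyPIContents.

```python
import json

# Hypothetical sample shaped like the excerpt above, not real index data.
SAMPLE = """
{
    "1337": {
        "cmdline": [],
        "modules": ["1337", "1337.1337"],
        "version": "1.0.0"
    }
}
"""


def packages_providing(index, module):
    """Return the sorted names of packages whose entry lists `module`."""
    return sorted(
        name for name, info in index.items()
        if module in info.get("modules", [])
    )


index = json.loads(SAMPLE)
print(packages_providing(index, "1337.1337"))  # prints ['1337']
```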


You can check it out here:

https://github.com/LuisAlejandro/pypicontents

And some use cases:

https://github.com/LuisAlejandro/pypicontents#use-cases

The actual index lives here; it's around 60MB:

https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.json

It is updated daily with the help of Travis:

https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml

Anyway, I hope it is useful, and I'll be around for any comments or questions.

Cheers!


Luis Alejandro Martínez Faneyth
Blog: http://huntingbears.com.ve
Github: http://github.com/LuisAlejandro
Twitter: http://twitter.com/LuisAlejandro

CODE IS POETRY
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Steve Dower

On 20May2017 0820, Nick Coghlan wrote:

> Good point regarding the fact that the Windows 16-bit APIs only come
> into play for interactive sessions (even in 3.6+), while for PEP 517
> we're specifically interested in the 8-bit pipes used to communicate
> with build subprocesses launched by an installation tool.


I need to catch up on the PEP (and thanks Brett for alerting me to the 
thread), but this comment in particular cements the mental diagram I 
have right now:


(build UI) <--> (build tool) <--> (compiler)
( Python ) <--> (  Python  ) <--> (anything)

I'll probably read the PEP closely and see that this is entirely 
incorrect, but if it's right:


* encoding for text between the build UI and build tool should just be 
specified once for all platforms (i.e. use UTF-8).
* encoding for text between build tool and the compiler depends on the 
compiler


In general, since most subprocesses (at least on Windows) do not have
customizable encodings, the tool that launches them needs to know what
the encoding is. Since we don't live in a Python 3.6 world quite yet,
that means the tool should read raw bytes from the compiler and
transcode them to UTF-8.
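
A minimal sketch of that idea in Python 3, assuming the build tool
already knows (or guesses) the compiler's output encoding. The helper
name run_and_transcode is hypothetical, and the child process here is
just a Python one-liner standing in for a real compiler.

```python
import subprocess
import sys


def run_and_transcode(cmd, source_encoding):
    """Run cmd, read its raw output bytes, and re-encode them as UTF-8.

    errors="replace" keeps a misencoded byte from aborting the build.
    """
    raw = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ).stdout
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


# Stand-in "compiler" that emits a Latin-1 encoded byte (0xE9, e-acute).
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'caf\\xe9\\n')",
]
utf8_bytes = run_and_transcode(child, "latin-1")
print(utf8_bytes)  # prints b'caf\xc3\xa9\n'
```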


The encoding between the tool and UI is essentially irrelevant - the UI 
is going to transform the data anyway for display, and the tool is going 
to have to transform it from the compilation tools, so the best we can 
do is pick the most likely encoding to avoid too many operations. UTF-8 
is probably that.


That's my 0.02AUD based on a vague memory of the PEP and this thread. As 
I get time today (at PyCon) to read up on it I may post amendments, but 
in general I'm +100 on "just pick an encoding and make the 
implementations transcode".


Cheers,
Steve



Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Brett Cannon
On Fri, May 19, 2017, 09:20 Thomas Kluyver,  wrote:

> On Fri, May 19, 2017, at 05:17 PM, Paul Moore wrote:
> > On 19 May 2017 at 16:53, Daniel Holth  wrote:
> > > Congrats on getting 518 in.
> >
> > Agreed, by the way. That's a big step!
>
> Thanks both. It does feel like an achievement. :-)
>

As it should! Thanks for bringing the PEP to life!

-brett




Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan
Good point regarding the fact that the Windows 16-bit APIs only come
into play for interactive sessions (even in 3.6+), while for PEP 517
we're specifically interested in the 8-bit pipes used to communicate
with build subprocesses launched by an installation tool.

On 20 May 2017 at 19:11, Paul Moore  wrote:
> The bigger question, though, is to what extent we want to mandate that
> build tools that run external tools such as compilers take
> responsibility for the encoding of the output of those tools (rather
> than simply passing the output through to the output stream
> unmodified). And if we do want to, whether we want to allow an
> exception for setuptools/distutils.
>
> Also, a question regarding Unix - do we really want to mandate UTF-8
> even if the system locale is set to something else? Won't that mean
> that build tools have the same problem with compilers generating
> output in the encoding the tool wants that we already have on Windows?

Yeah, I think that problem was starting to occur to me, hence the
reference to handling RPM and DEB build environments.

At least for non-Windows systems, I see two possible recommendations:

1. We advise installation tools to use binary streams to communicate
with build tools, and treat the results as opaque binary data. If it
needs to be written out to the installation tool's own streams, then
use the binary level APIs for those interfaces to inject the build
tool output directly, rather than decoding and re-encoding it first.

2. We advise installation tools to adopt a PEP 538 style solution,
where they mostly just trust the result of
locale.getpreferredencoding() *unless*
"codecs.lookup(locale.getpreferredencoding()).name == 'ascii'". In the
latter case, we'd advise them to set LC_CTYPE (and potentially LANG)
appropriately for the running OS. Regardless of whether or not that
locale coercion was needed, we'd recommend setting "replace" or
"backslashreplace" when decoding the stream output from the
subprocess.
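
A rough sketch of what option 2 could look like in Python 3. The helper
names are hypothetical, and "C.UTF-8" is an assumption: that locale is
not available on every system, so a real tool would need to pick a
target locale suited to the running OS.

```python
import codecs
import locale
import os


def child_environment(environ=None, preferred=None):
    """Return (env, encoding) to use for a build subprocess.

    Trust the preferred encoding unless it resolves to ASCII, in which
    case coerce LC_CTYPE for the child (PEP 538 style).
    """
    env = dict(os.environ if environ is None else environ)
    encoding = preferred or locale.getpreferredencoding()
    if codecs.lookup(encoding).name == "ascii":
        env["LC_CTYPE"] = "C.UTF-8"  # assumption: this locale exists
        encoding = "utf-8"
    return env, encoding


def decode_output(raw, encoding):
    """Decode subprocess output without ever raising on bad bytes."""
    return raw.decode(encoding, errors="backslashreplace")


env, encoding = child_environment({}, preferred="ANSI_X3.4-1968")
print(encoding)                           # prints utf-8
print(decode_output(b"caf\xe9", encoding))  # prints caf\xe9
```

Note that "backslashreplace" is only accepted for decoding on Python 3.5
and later; on older versions "replace" is the available choice.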

At the specification level, I think option 1 probably makes the most
sense - we'd be advising installation tools that they're free to kick
any mojibake problems further down the automation pipeline if they
don't want to worry about it. It's also the only one of the two
recommendations we can readily make cross platform.
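
For comparison, option 1 might be sketched as below in Python 3: the
installation tool never decodes the build tool's output and copies the
bytes straight to a binary sink. The function name relay_build_output
is hypothetical, and the child process is a Python one-liner standing
in for a build tool.

```python
import io
import subprocess
import sys


def relay_build_output(cmd, sink=None):
    """Copy a subprocess's output to a binary sink without decoding it."""
    sink = sys.stdout.buffer if sink is None else sink
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        # Read fixed-size chunks until EOF; bytes are passed on untouched.
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            sink.write(chunk)
    sink.flush()
    return proc.returncode


# Stand-in build tool emitting bytes that are not valid UTF-8.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xff\\xfe ok')",
]
captured = io.BytesIO()
returncode = relay_build_output(child, captured)
print(returncode, captured.getvalue())  # prints 0 b'\xff\xfe ok'
```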

At a quality-of-implementation level, there's a lot of potential value
in option 2 (at least on non-Windows systems) - we just wouldn't
require or recommend it at the level of the interoperability
specifications.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Paul Moore
On 20 May 2017 at 09:03, Thomas Kluyver  wrote:
> On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote:
>> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
>> build systems SHOULD emit UTF-8 encoded output
>> * on platforms with 16-bit standard streams (e.g. Windows), build
>> systems SHOULD emit UTF-16-LE encoded output
>
> I'm quite prepared to accept that I'm mistaken, but my understanding is
> that *standard streams* are 8-bit on Windows as well. The 16-bit thing
> that Python 3.6 does, as I understand it, is to bypass standard streams
> when it detects that they're connected to a console, and use a Windows
> API call to write text to the console directly as UTF-16.
>
> If so, when stdout/stderr are pipes, which I assume is how pip captures
> the output from build processes, there's no particular reason to send
> UTF-16 data just because it's Windows.

That's my understanding too. The standard streams are still byte
streams with an encoding. It's just that the underlying IO, when the
final destination is the console, is done by the Windows Unicode APIs.
Because of this, when the output is the console the stream can accept
any Unicode character, and so an encoding of UTF-8 is specified (and
yes, AIUI there is a translation Unicode string -> UTF-8 bytes ->
Unicode console API). For non-console output, though, the standard
streams are still byte streams and the platform behaviour is
respected, so we use the ANSI codepage (calling this the platform
standard glosses over the fact that there are two standard codepages,
ANSI and OEM, and tools don't always make the same choice when faced
with piped output). Long story short, UTF-16 is irrelevant here.

The docs for 3.6 say "Under Windows, if the stream is interactive
(that is, if its isatty() method returns True), the console codepage
is used, otherwise the ANSI code page". This is out of date (it was
true for 3.5 and earlier). In 3.6+ utf-8 is used for interactive
streams rather than the console codepage:

>py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)"
utf-8
>py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)" >$null
cp1252

The bigger question, though, is to what extent we want to mandate that
build tools that run external tools such as compilers take
responsibility for the encoding of the output of those tools (rather
than simply passing the output through to the output stream
unmodified). And if we do want to, whether we want to allow an
exception for setuptools/distutils.

Also, a question regarding Unix - do we really want to mandate UTF-8
even if the system locale is set to something else? Won't that mean
that build tools have the same problem with compilers generating
output in the encoding the tool wants that we already have on Windows?

My feeling is:

1. Build systems SHOULD emit output encoded in the preferred locale
encoding (normally UTF-8 on Unix, ANSI on Windows).
2. Build systems should ideally check the encoding used by external
tools that they run and transcode to the correct encoding if necessary
- but this is a quality of implementation matter.
3. Install tools MUST NOT fail if build tools produce output with the
wrong encoding, but MUST correctly reproduce build tool output if the
build tools do produce the right encoding.

My biggest concern with this is that I believe that Visual C produces
output in the OEM codepage even when output to a pipe. Actually I just
did some experiments (VS 2015), and it's even worse than that. The
compiler (cl) seems to use the OEM code page when writing to a pipe,
but the linker uses the ANSI code page. This means that a command like
"cl a£bc" produces output on (a piped) stdout that contains mixed
encodings. Given this situation, I think we have to simply give up and
take the view that the Visual C tools are simply broken in this
regard, and we shouldn't worry about them. So I'm inclined therefore
to drop point (2) from the 3 above.
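
To make the mixed-encoding problem concrete, here is a small Python 3
sketch. It assumes a system where the OEM code page is cp850 and the
ANSI code page is cp1252 (typical for Western European Windows setups;
other systems will differ), and simply concatenates the two encodings
of the same character, which is a stand-in for what the piped output
would contain.

```python
# Assumption: OEM code page is cp850, ANSI code page is cp1252.
oem_bytes = "£".encode("cp850")    # what the compiler would emit
ansi_bytes = "£".encode("cp1252")  # what the linker would emit

mixed = oem_bytes + ansi_bytes     # both end up on the same pipe
print(mixed)                   # prints b'\x9c\xa3'
print(mixed.decode("cp1252"))  # prints œ£
print(mixed.decode("cp850"))   # prints £ú
```

Whichever single code page is used to decode the stream, one of the two
pound signs comes out wrong.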

Paul


Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Thomas Kluyver
On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote:
> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
> build systems SHOULD emit UTF-8 encoded output
> * on platforms with 16-bit standard streams (e.g. Windows), build
> systems SHOULD emit UTF-16-LE encoded output

I'm quite prepared to accept that I'm mistaken, but my understanding is
that *standard streams* are 8-bit on Windows as well. The 16-bit thing
that Python 3.6 does, as I understand it, is to bypass standard streams
when it detects that they're connected to a console, and use a Windows
API call to write text to the console directly as UTF-16.

If so, when stdout/stderr are pipes, which I assume is how pip captures
the output from build processes, there's no particular reason to send
UTF-16 data just because it's Windows.

Thomas



Re: [Distutils] PEP 517 - specifying build system in pyproject.toml

2017-05-20 Thread Nick Coghlan
On 20 May 2017 at 01:16, Thomas Kluyver  wrote:
> On Fri, May 19, 2017, at 03:41 PM, Paul Moore wrote:
>> Can we specify what encoding the informational text must be written
>> in?
>
> Sure, that makes sense. What about:
>
> All hooks are run with working directory set to the root of the source
> tree, and MAY print arbitrary informational text on stdout and stderr.
> This text SHOULD be UTF-8 encoded, but as building may invoke other
> processes, install tools MUST NOT fail if the data they receive is not
> valid UTF-8; though in this case the display of the output may be
> corrupted.
>
> Do we also want to recommend that install tools set
> PYTHONIOENCODING=utf-8 when invoking build tools? Or leave this up to
> the build tools?

Setting PYTHONIOENCODING=utf-8:strict would potentially fail the
"don't fail hard on misencoded output" requirement, and setting
anything else is dubious from a potential data loss or compatibility
point of view (as there's no "surrogateescape" error handler in Python
2).
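
As a small illustration of the first point (Python 3 only, with helper
names that are made up for this sketch), the snippet below runs a child
Python process that prints a lone surrogate, once with a strict error
handler and once with a lenient one.

```python
import os
import subprocess
import sys

# Child code: print a string that cannot be encoded as strict UTF-8.
CHILD = "print('\\udcff')"


def run_child(io_encoding):
    """Run the child with PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


strict = run_child("utf-8:strict")
lenient = run_child("utf-8:backslashreplace")
print(strict.returncode != 0)   # prints True (UnicodeEncodeError in child)
print(lenient.returncode == 0)  # prints True
```

With the strict handler the child dies on an encoding error, which is
the hard failure the requirement is meant to rule out.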

For use cases like distro package building, we'd also like to inherit
the surrounding build environment, so explicitly requiring installation
tools to alter it at the Python level doesn't strike me as ideal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia