Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 21 May 2017 at 02:36, Steve Dower wrote:
> On 20May2017 0820, Nick Coghlan wrote:
>>
>> Good point regarding the fact that the Windows 16-bit APIs only come
>> into play for interactive sessions (even in 3.6+), while for PEP 517
>> we're specifically interested in the 8-bit pipes used to communicate
>> with build subprocesses launched by an installation tool.
>
>
> I need to catch up on the PEP (and thanks Brett for alerting me to the
> thread), but this comment in particular cements the mental diagram I have
> right now:
>
> (build UI) <--> (build tool) <--> (compiler)
> ( Python ) <--> ( Python ) <--> (anything)
>
> I'll probably read the PEP closely and see that this is entirely incorrect,
> but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).
> * encoding for text between build tool and the compiler depends on the
> compiler

Alas, it isn't quite that simple.

Let's take the current de facto standard case:

(user console/CI build log) <-> pip <-> setup.py (distutils/setuptools) <-> 3rd party tool

Key usability feature:

* when requested, informational messages from 3rd party tools SHOULD be made available to the end user for debugging purposes

Ideal outcome:

* everything that makes it to the user console or CI build log is readable by the end user

Essential requirement:

* encoding problems in informational messages emitted by 3rd party tools MUST NOT cause the build to fail

Now, the easiest way to handle the essential requirement as the author of an installation or build tool is to choose not to deal with it: instead, you just treat the output from further downstream as opaque binary data, and let the user console/CI build log layer deal with any encoding problems as they see fit.

You may end up with some build failures that are a pain to debug because you're getting nonsense from the build pipeline, but you won't fail your build *because* some particular build tool emitted improperly encoded nonsense.

That all changes if we *require* UTF-8 on the link between the installation tool (e.g. pip) and the build tool (e.g. setup.py). If we do that:

* the installation tool can't just pass along build tool output to the user console or CI build log any more, it has a nominal obligation to try to interpret it as UTF-8
* the build tool (or build tool shim) can't just pass along 3rd party tool output to the installation tool any more, it has a nominal obligation to try to get it to emit UTF-8

Now, *particular* installation and build tools may want to strongly encourage the use of UTF-8 in an effort to get closer to the ideal outcome, but that isn't the key objective of PEP 517: the key objective of PEP 517 is to make it easier to use *general purpose* build systems that happen to be implemented in Python (like waf, scons, and meson) to handle complex build scenarios, while also allowing the use of simpler Python-only build systems (like flit) for distribution of pure Python projects.

That said, the PEP *could* explicitly define a short list of behaviours that we consider reasonable in an installation tool:

1. Treat the informational output from the build tool as an opaque binary stream
2. Treat the informational output from the build tool as a text stream encoded using locale.getpreferredencoding(), and decode it using the backslashreplace error handler
3. Treat the informational output from the build tool as a UTF-8 encoded text stream, and decode it using the backslashreplace error handler

We'd just need to caveat the latter two options with the fact that they'll give you a cryptic error message on Python 3.4 and earlier (including Python 2):

    >>> b"\xf0\x01\x02\x03".decode("utf-8", "backslashreplace")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ncoghlan/devel/py27/Lib/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    TypeError: don't know how to handle UnicodeDecodeError in error callback

I had to look that up on Stack Overflow myself, but what it's trying to say is that until Python 3.5, "backslashreplace" only worked for encoding, not for decoding. That means that for earlier versions, you'd need to define your own custom error handler as described in http://stackoverflow.com/questions/25442954/how-should-i-decode-bytes-using-ascii-without-losing-any-junk-bytes-if-xmlch/25443356#25443356

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
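For readers who want to try options 2 and 3 from the message above on interpreters older than 3.5, the following is a minimal sketch of a custom decoding error handler. The handler name "backslashreplace_compat" and the helper decode_build_output are hypothetical names chosen for this illustration; they do not come from the PEP or from pip.

```python
import codecs


def _backslashreplace_decode(exc):
    """Escape undecodable bytes as \\xNN instead of raising."""
    if isinstance(exc, UnicodeDecodeError):
        bad = bytearray(exc.object[exc.start:exc.end])
        escaped = u"".join(u"\\x%02x" % byte for byte in bad)
        # Resume decoding immediately after the offending bytes.
        return escaped, exc.end
    # For encoding errors, defer to the built-in handler.
    return codecs.backslashreplace_errors(exc)


# Hypothetical handler name, registered separately so that the built-in
# "backslashreplace" handler is left untouched.
codecs.register_error("backslashreplace_compat", _backslashreplace_decode)


def decode_build_output(data, encoding="utf-8"):
    """Decode build tool output without ever raising UnicodeDecodeError."""
    return data.decode(encoding, "backslashreplace_compat")
```

On Python 3.5 and later, the built-in "backslashreplace" handler should give the same result for decoding, so the custom handler would only be needed when older interpreters have to be supported.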
Re: [Distutils] Finally making distlib handle spaces
On 21 May 2017 at 07:29, Radon Rosborough wrote:
>> The current behaviour fails, as you note, but it does so in a
>> "standard" way - shebang behaviour (and its limitations) is
>> well-known.
>
> I agree with this, but in my opinion the shebang is simply an
> implementation detail of virtualenv. I would like to quote @JDLH from
> [1]: "There is nothing about the value provided by virtualenv that
> demands it use the shebang." If the shebang were fundamentally
> necessary to provide the functionality of virtualenv, it would make
> sense for virtualenv to inherit the shebang's restrictions. But this
> is not the case, so I think that the shebang should be considered an
> implementation detail that the end user should not need to be aware
> of.

I agree with this way of looking at the problem, so my perspective would be:

1. Don't change anything on Windows (since that already uses the custom 'py' dispatcher)
2. Don't change anything for cases where platform provided shebang dispatch is trusted to be correct (i.e. no quoting of the shebang line is needed)
3. Change the cases that currently quote the shebang line to instead invoke a custom dispatch script running in /bin/sh

I also agree that distlib is the right level to implement the change - this isn't about people wanting custom behaviour, it's about distlib's default dispatch mechanism having been found not to work in certain cases, so it makes sense to automatically switch to an alternative that *does* work.

The custom dispatcher doesn't need to solve 100% of the currently failing cases - it just needs to solve some of them, and provide a foundation for future iterations on the design and implementation to handle more.

Cheers,
Nick.

P.S. I was originally going to ask "Can we use Python to implement the dispatch instead?", but then realised there are actually lots of messy complications with that around getting program metadata and environmental details right that sh deals with natively. So /bin/sh is a better way to go.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
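The three-point plan above can be sketched roughly as follows. This is only an illustration of the idea, not the code from any of the distlib pull requests: the function names and the 127-byte limit are assumptions, and the header relies on a commonly used sh/Python polyglot trick in which /bin/sh sees an `exec` command while Python sees a discarded string literal.

```python
import shlex

# Assumption: many Linux kernels truncate the shebang line at about 127 bytes.
MAX_SHEBANG_LENGTH = 127


def simple_shebang_ok(executable):
    """Return True if a plain '#!<path>' line is likely to work unchanged."""
    line = "#!" + executable
    too_long = len(line.encode("utf-8")) > MAX_SHEBANG_LENGTH
    return " " not in executable and not too_long


def script_header(executable):
    """Build the first lines of a generated stub script (hypothetical helper)."""
    if simple_shebang_ok(executable):
        # Point 2: trust the platform's own shebang dispatch.
        return "#!" + executable + "\n"
    # Point 3: let /bin/sh exec the interpreter. To sh, the second line is
    # the command `exec <quoted path> "$0" "$@"`; to Python, the second and
    # third lines together form one triple-quoted string that is ignored.
    quoted = shlex.quote(executable)
    return "#!/bin/sh\n'''exec' " + quoted + ' "$0" "$@"\n' + "' '''\n"
```

Paths containing backslashes or other unusual characters would need extra care with this trick, and Windows (point 1) is deliberately left out of the sketch.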
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
> On May 20, 2017, at 4:05 PM, Paul Moore wrote:
>
> I'm a little concerned if we're going to end up with a proposal that
> means that distutils is in violation of the spec unless this issue is
> fixed. I'm not sure if that's where we're headed, but I wanted to be
> clear here - is PEP 517 intended to encompass distutils/setuptools, or
> are we treating them as a legacy case, that pip should special-case?

I don’t think distutils/setuptools are going to be compatible out of the box anyway, because their API is tied to setup.py. Whatever adapter is written to adapt them to PEP 517 can handle any semantic differences as well.

-- Donald Stufft
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20May2017 1315, Paul Moore wrote:
> On 20 May 2017 at 17:36, Steve Dower wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what the
>> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
>> the tool should read raw bytes from the compiler and encode them to UTF-8.
>
> Did you spot my point that Visual C produces output that's a mixture
> of OEM and ANSI codepages? [SNIP]

Yes, and it's a perfect example of why the MSVC-specific wrapper should be the one to deal with tool encodings. If you forward unencoded bytes like this back to the UI, it will have to deal with the mixed encoding.

> I'd be very surprised if build tool developers got this sort of edge
> case correct without at least some guidance from the PEP on the sorts
> of things they need to consider. You suggest "read raw bytes and
> encode them to UTF-8" - but you don't encode bytes, you encode
> strings, so you still need to convert those bytes to a string first,
> and there's no encoding you can reliably use for this. You need to use
> "errors=replace" to ensure you can handle inconsistently encoded data
> without getting an exception.

The "read raw bytes and [transcode] them" comment was meant to be that sort of help. I didn't go as far as writing `output.decode(output_encoding, errors="replace").encode("utf-8", errors="replace")`, but that's basically what I meant to imply.

The build tool developer is the *only* developer who can get this right, and if they can't, then they have to figure out the most appropriate way to work around the fact that they can't.

As for defining distutils as incompatible with the PEP, I'm okay with that. Updating distutils to use subprocess for launching tools rather than spawnv would be a very good start (and likely a good contribution for a new contributor), but allowing build tools to continue to be written badly is not worthwhile.

Cheers,
Steve
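The transcoding expression Steve quotes above could be wrapped up roughly like this. It is a sketch under the assumption that the build tool already knows (or has chosen) a source encoding for the tool it launches; the names transcode_to_utf8, run_tool and DEMO_ARGV are illustrative and not part of any proposal in this thread.

```python
import subprocess
import sys


def transcode_to_utf8(raw, source_encoding):
    """Decode raw tool output leniently, then re-encode it as UTF-8."""
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8", errors="replace")


def run_tool(argv, source_encoding):
    """Run a tool, capture stdout and stderr together, return UTF-8 bytes."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    raw, _ = proc.communicate()
    return proc.returncode, transcode_to_utf8(raw, source_encoding)


# Stand-in for a real compiler command line, used only for demonstration.
DEMO_ARGV = [sys.executable, "-c", "print('hi')"]
```

Because both steps use errors="replace", inconsistently encoded output degrades to replacement characters rather than raising an exception, which matches the "MUST NOT cause the build to fail" requirement discussed earlier in the thread.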
Re: [Distutils] Introducing PyPIContents
On Sat, May 20, 2017, at 07:29 PM, Luis Alejandro Martínez Faneyth wrote:
> It supports ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2']
> formats, and it extracts the data using any available.

Nice! If there are multiple of those formats present, does it get the data from just one? Or does it get data from all of them and combine it somehow?

> I wasn't aware of the fact that some modules may be on one platform
> and not in another. I guess there's room for improvement.

It probably doesn't matter for most cases, but since setup.py runs arbitrary code, it's possible for it to install different modules in different situations - or even select modules at random, if you really want to confuse tools like this. ;-)

This is why my own efforts at indexing focused on wheels - you can be sure of exactly what a wheel contains. My wheel-indexing tool 'wheeldex' is here, if there's any code or ideas there that you can use:
https://github.com/takluyver/wheeldex

> Thank you. I made this because I wanted to have an app that guessed
> python dependencies from code by scanning module imports and then
> looking up the Index. That app is called Pip Sala Bim and you can
> check it out here:
>
> https://github.com/LuisAlejandro/pipsalabim

Neat, that's precisely one of the use cases I was thinking of for an index. The other thing I'm interested in is providing an interface to install modules by their import name rather than their PyPI name; I think your index should work for that as well.

I'll dig into the code of both PyPIContents and Pip Sala Bim more soon.

Thanks,
Thomas
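To illustrate why a wheel is easy to index: it is a zip archive, so its contents can be listed without running any code from the package. The sketch below is not taken from wheeldex or PyPIContents. The function name is hypothetical, it treats every top-level directory as a package, and it ignores compiled extension modules, all of which are simplifying assumptions.

```python
import io
import zipfile


def wheel_top_level_modules(wheel_file):
    """Return the sorted top-level import names found in a wheel archive."""
    names = set()
    with zipfile.ZipFile(wheel_file) as zf:
        for member in zf.namelist():
            first = member.split("/", 1)[0]
            if first.endswith((".dist-info", ".data")):
                continue  # metadata and data directories, not importable code
            if "/" in member:
                names.add(first)       # assumed to be a package directory
            elif first.endswith(".py"):
                names.add(first[:-3])  # a single-file module
    return sorted(names)


def make_demo_wheel():
    """Build a tiny in-memory archive laid out like a wheel (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr("demo/core.py", "")
        zf.writestr("single.py", "")
        zf.writestr("demo-1.0.dist-info/METADATA", "")
    buf.seek(0)
    return buf
```

Calling wheel_top_level_modules(make_demo_wheel()) lists the package and the single-file module while skipping the .dist-info directory.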
Re: [Distutils] Finally making distlib handle spaces
> The current behaviour fails, as you note, but it does so in a
> "standard" way - shebang behaviour (and its limitations) is
> well-known.

I agree with this, but in my opinion the shebang is simply an implementation detail of virtualenv. I would like to quote @JDLH from [1]: "There is nothing about the value provided by virtualenv that demands it use the shebang." If the shebang were fundamentally necessary to provide the functionality of virtualenv, it would make sense for virtualenv to inherit the shebang's restrictions. But this is not the case, so I think that the shebang should be considered an implementation detail that the end user should not need to be aware of.

> At the moment, your proposal is just to use "an alternative" launch
> process. Without a specific proposal, it's impossible to judge
> whether the solution is better than the current situation.

We already have three specific patches which provide alternative launch processes: [2], [3], and [4]. I feel like that should be specific enough to start a discussion. In fact, Vinay specifically requested a discussion about [2] be raised on distutils-sig [5]. The only reason that no action has been taken is that nobody started that discussion (until now).

> I would have thought that "#!/usr/bin/env sh" runs the risk of
> picking up a malicious sh executable injected into the user's PATH.

That's certainly a valid concern. Does this happen in the real world? I feel like if you have a malicious sh executable on your PATH, you're going to have a lot more problems than just from virtualenv. But this is a good reason that we may want to restrict the patch to only take effect when using the shebang directly would fail.

> Also, different systems use different sh implementations - so care
> would need to be taken to only code in the lowest common denominator
> syntax.

Can we assume POSIX compatibility? Even if not, we're not doing anything complicated, only passing some arguments to a command. Surely that can be done in pretty much any shell one can find.

> multiple proposals and bikeshedding

Although we have three implementations, my personal preference is for [4]. This is because:

* It avoids the need for creating new files.
* It only takes effect when necessary (i.e. when the shebang won't work).
* The code is fairly clean.

> On Windows (where shebang processing is handled by the wrapper exe,
> and is well defined and robust) there should be no change to the
> current behaviour.

Agreed.

[1]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302019457
[2]: https://bitbucket.org/pypa/distlib/pull-requests/31
[3]: https://bitbucket.org/pypa/distlib/pull-requests/32
[4]: https://bitbucket.org/pypa/distlib/pull-requests/33
[5]: https://bitbucket.org/pypa/distlib/pull-requests/31#comment-29795586
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20 May 2017 at 17:36, Steve Dower wrote:
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what the
> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
> the tool should read raw bytes from the compiler and encode them to UTF-8.

Did you spot my point that Visual C produces output that's a mixture of OEM and ANSI codepages? The example I used was:

    OEM code page 850, ANSI codepage 1252 (standard British English Windows)
    Visual Studio 2015
    cl a£b >output.file

The output uses CP850 (in the cl error message) and CP1252 (in the link error) for the £ sign. When run from the command line without redirection, the output is in a consistent encoding. It's only when you redirect the output (I redirected to a file, I assume a pipe would be the same) that you get the problem.

I'd be very surprised if build tool developers got this sort of edge case correct without at least some guidance from the PEP on the sorts of things they need to consider. You suggest "read raw bytes and encode them to UTF-8" - but you don't encode bytes, you encode strings, so you still need to convert those bytes to a string first, and there's no encoding you can reliably use for this. You need to use "errors=replace" to ensure you can handle inconsistently encoded data without getting an exception.

Paul
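The mixed-codepage problem Paul describes can be reproduced without a compiler. The message texts below are invented stand-ins, not real cl or link output; only the two code pages and the £ sign come from the email. The sketch shows that no single decoding recovers both lines, and that errors="replace" at least avoids an exception.

```python
POUND = u"\u00a3"

# Hypothetical messages: the first encoded as OEM (CP850), the second as
# ANSI (CP1252), mimicking the mixture seen in the redirected output.
CL_LINE = (u"cl: cannot open a" + POUND + u"b").encode("cp850")
LINK_LINE = (u"link: cannot open a" + POUND + u"b").encode("cp1252")
MIXED = CL_LINE + b"\n" + LINK_LINE


def decode_all_ways(raw):
    """Decode the same bytes three ways, never raising on bad input."""
    return {
        "cp850": raw.decode("cp850", errors="replace"),
        "cp1252": raw.decode("cp1252", errors="replace"),
        "utf-8": raw.decode("utf-8", errors="replace"),
    }


RESULTS = decode_all_ways(MIXED)
```

Decoding as CP850 gets the first line right and shows a different character in the second; decoding as CP1252 does the opposite; decoding as UTF-8 replaces both pound signs with U+FFFD. In every case the call completes, which is the property the "errors=replace" advice is after.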
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20 May 2017 at 19:36, Steve Dower wrote:
>
>> - As a lazy developer, I don't want to read stdout/stderr from a
>> subprocess only to spit it back to my own stdout/stderr. I'd much rather
>> just launch the subprocess and let it use the same stdout/stderr as my
>> build tool.
>
>
> One of the open issues against distutils is that it does this. We can allow
> it, but a well-defined tool should capture the output and pass it to the UI
> component rather than bypassing the UI component.

I'm a little concerned if we're going to end up with a proposal that means that distutils is in violation of the spec unless this issue is fixed. I'm not sure if that's where we're headed, but I wanted to be clear here - is PEP 517 intended to encompass distutils/setuptools, or are we treating them as a legacy case, that pip should special-case?

Paul
Re: [Distutils] Finally making distlib handle spaces
On 20 May 2017 at 15:45, Radon Rosborough wrote:
> ===
> Justifications and counterarguments
> ===
[...]

While you make a good case countering the issues you quote, I think there's an important issue you haven't addressed, and I'd like to hear your comments on it:

The current behaviour fails, as you note, but it does so in a "standard" way - shebang behaviour (and its limitations) is well-known. At the moment, your proposal is just to use "an alternative" launch process. Without a specific proposal, it's impossible to judge whether the solution is better than the current situation. And indeed, any proposed solution needs to demonstrate that it doesn't have security vulnerabilities or other issues that make it just as much a problem as the status quo (if not more).

For example, I would have thought that "#!/usr/bin/env sh" runs the risk of picking up a malicious sh executable injected into the user's PATH. Also, different systems use different sh implementations - so care would need to be taken to only code in the lowest common denominator syntax.

I suggest that the next step needs to be to propose a specific implementation of the wrapper. The specifics can then be debated, but you'd need a pretty solid proposal - otherwise the discussion will descend into multiple proposals and bikeshedding, and is likely to stall without achieving anything.

One other point, which I'd hope is obvious. On Windows (where shebang processing is handled by the wrapper exe, and is well defined and robust) there should be no change to the current behaviour.

Paul
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20May2017 1011, Thomas Kluyver wrote:
> On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
>> In general, since most subprocesses (at least on Windows) do not have
>> customizable encodings, the tool that launches them needs to know what
>> the encoding is. Since we don't live in a Python 3.6 world quite yet,
>> that means the tool should read raw bytes from the compiler and encode
>> them to UTF-8.
>
> I half agree, but:
>
> - Build tools may not 100% know what encoding output will be produced,
> especially if the developer can supply a custom command for the build
> tool to run.

In this case, the whole thing breaks down anyway. UI can't be expected to reliably display text from an unknown encoding - at some point it has to be forced into a known quantity, and IMHO the code closest to the tool should do it.

> - It's possible for data on a pipe to be binary data with no meaning
> as text.

Sure, but it cannot be rendered unless you choose an encoding. All you can do is dump it to a file (and let a file editor choose an encoding).

> - As a lazy developer, I don't want to read stdout/stderr from a
> subprocess only to spit it back to my own stdout/stderr. I'd much
> rather just launch the subprocess and let it use the same
> stdout/stderr as my build tool.

One of the open issues against distutils is that it does this. We can allow it, but a well-defined tool should capture the output and pass it to the UI component rather than bypassing the UI component.

> So I think it's most practical to recommend that build tools produce
> UTF-8 (if not sys.stdout.isatty()), but let build tool developers
> decide how far they go to comply with that.

Require that build tools either send UTF-8 to the UI component, or write bytes to a file and call it a build output. I see no benefit in requiring both the build tool and the UI tool to guess what the text encoding is.

Cheers,
Steve
Re: [Distutils] Introducing PyPIContents
Hi Thomas,

2017-05-20 13:23 GMT-04:00 Thomas Kluyver:

> Hi Luis,
>
> Awesome, thanks for this :-). It was me posting before about indexing PyPI.
>
> I'm intrigued: how do you keep it up to date using Travis? When I looked
> into this, I was pretty sure you need to download every package to index
> it. Do you have some way to only download the new releases? Or is Travis
> able to download every package every day? Or have you found another way
> round it?
>

I divided the index processing alphabetically, so that each letter is processed in a separate Travis job. I also set memory and time limits to avoid abusing Travis. On the first run, the script has to download every package until it reaches the maximum time limit for each job, which is 40 minutes. On later runs, it only processes packages that have been updated since the last run.

> Does the index only include the latest version of each package, or does it
> also include older versions? The wifi on the train I'm on at the moment
> isn't fast enough to download 60 MB to find out. ;-)
>

It only includes the current versions.

>
> Does your indexing tool prefer to use wheels or sdists? Is it capable of
> using either for packages which don't have both available? Do you do
> anything to cope with modules which may be included for one platform but
> not another?
>

It supports ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2'] formats, and it extracts the data using any available. I wasn't aware of the fact that some modules may be on one platform and not in another. I guess there's room for improvement.

>
> I'm excited to see someone actually doing this!
>

Thank you. I made this because I wanted to have an app that guessed Python dependencies from code by scanning module imports and then looking up the Index.
That app is called Pip Sala Bim and you can check it out here:

https://github.com/LuisAlejandro/pipsalabim

--
Luis Alejandro Martínez Faneyth
Blog: http://huntingbears.com.ve
Github: http://github.com/LuisAlejandro
Twitter: http://twitter.com/LuisAlejandro

CODE IS POETRY
Re: [Distutils] Introducing PyPIContents
Hi Luis,

Awesome, thanks for this :-). It was me posting before about indexing PyPI.

I'm intrigued: how do you keep it up to date using Travis? When I looked into this, I was pretty sure you need to download every package to index it. Do you have some way to only download the new releases? Or is Travis able to download every package every day? Or have you found another way round it?

Does the index only include the latest version of each package, or does it also include older versions? The wifi on the train I'm on at the moment isn't fast enough to download 60 MB to find out. ;-)

Does your indexing tool prefer to use wheels or sdists? Is it capable of using either for packages which don't have both available? Do you do anything to cope with modules which may be included for one platform but not another?

I'm excited to see someone actually doing this!

Thomas

On Sat, May 20, 2017, at 03:01 AM, Luis Alejandro Martínez Faneyth wrote:
> Hi everyone,
>
> I'm new to this list but I've been reading some threads in the
> archive.
>
> Around february, an idea about indexing modules from PyPI packages was
> brought up. I've been working on something similar for quite a while.
>
> PyPIContents is an index of PyPI packages that lists its modules and
> command line scripts in JSON format, like this:
>
> [
>     ...
>     "1337": {
>         "cmdline": [],
>         "modules": [
>             "1337",
>             "1337.1337"
>         ],
>         "version": "1.0.0"
>     },
>     ...
> ]
>
> You can check it out here:
>
> https://github.com/LuisAlejandro/pypicontents
>
> And some use cases:
>
> https://github.com/LuisAlejandro/pypicontents#use-cases
>
> The actual index lives here, its around 60MB:
>
> https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.json
>
> Is updated daily with the help of Travis:
>
> https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml
>
> Anyway, I hope is useful and I'll be around for any comments or
> questions.
>
> Cheers!
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Sat, May 20, 2017, at 05:36 PM, Steve Dower wrote:
> I'll probably read the PEP closely and see that this is entirely
> incorrect, but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).

+1

> * encoding for text between build tool and the compiler depends on the
> compiler
>
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what
> the encoding is. Since we don't live in a Python 3.6 world quite yet,
> that means the tool should read raw bytes from the compiler and encode
> them to UTF-8.

I half agree, but:

- Build tools may not 100% know what encoding output will be produced, especially if the developer can supply a custom command for the build tool to run.
- It's possible for data on a pipe to be binary data with no meaning as text.
- As a lazy developer, I don't want to read stdout/stderr from a subprocess only to spit it back to my own stdout/stderr. I'd much rather just launch the subprocess and let it use the same stdout/stderr as my build tool.

So I think it's most practical to recommend that build tools produce UTF-8 (if not sys.stdout.isatty()), but let build tool developers decide how far they go to comply with that.

Thomas
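Thomas's closing recommendation (UTF-8 whenever stdout is not a terminal) might look like the following in a build tool. This is a sketch of one possible reading, not wording from the PEP; the helper names output_encoding and emit are assumptions made for the example.

```python
import io
import locale
import sys


def output_encoding(stream):
    """Choose the encoding for build output written to the given stream."""
    if not stream.isatty():
        # Piped or redirected: the consumer is assumed to expect UTF-8.
        return "utf-8"
    # Interactive: follow the terminal's own encoding where it is known.
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


def emit(text, stream=None):
    """Write text as bytes, escaping anything the encoding cannot represent."""
    stream = sys.stdout if stream is None else stream
    data = text.encode(output_encoding(stream), errors="backslashreplace")
    # Text streams expose the underlying byte stream as .buffer (Python 3).
    getattr(stream, "buffer", stream).write(data)


# Demonstration against an in-memory byte stream, which is never a tty.
DEMO_STREAM = io.BytesIO()
emit(u"caf\u00e9", DEMO_STREAM)
```

Using backslashreplace on the encoding side works on old and new Python versions alike, so characters the terminal cannot display are escaped instead of aborting the build.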
[Distutils] Finally making distlib handle spaces
Hi all,

Since at least 2011, virtualenv has not supported spaces in paths. This has bitten many people, including myself, and caused numerous issues over the years [1] [2] [3] [4] [5] [6] [7]. However, as was discussed in [8], the issue lies not with virtualenv but with distlib, via pip. It would be possible for pip to use the existing distlib interface to hack around the problem, but I believe the current behavior of distlib is erroneous when it comes to spaces in paths. I therefore believe it would be more appropriate to fix the problem in distlib.

Two separate patches [9] [10] that solve the problem in distlib were posted in January by Harald Nordgren. However, they were declined pending a discussion on distutils-sig [11]. As far as I can tell, no such discussion was ever started. However, the issue remains, and we have a clear solution proposal to consider, so I'd like to kick it off now.

In the remainder of this email, I'll explain the problem and surrounding context in detail, and why I think the solution proposed by Harald (or some variation) is a good path forward for distlib. I look forward to hearing your thoughts on the matter.

== The behavior of virtualenv ==

The following is written for:

    $ python --version
    Python 2.7.13
    $ pip --version
    pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
    $ virtualenv --version
    15.1.0

Creating a virtualenv is done as follows:

    $ virtualenv venv
    New python executable in /private/tmp/path with spaces/venv/bin/python2.7
    Also creating executable in /private/tmp/path with spaces/venv/bin/python
    Installing setuptools, pip, wheel...done.

This creates a directory structure looking as follows under the venv directory:

    ├── bin
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── activate_this.py
    │   ├── easy_install
    │   ├── easy_install-2.7
    │   ├── pip
    │   ├── pip2
    │   ├── pip2.7
    │   ├── python -> ./bin/python2.7
    │   ├── python-config
    │   ├── python2 -> ./bin/python2.7
    │   ├── python2.7
    │   └── wheel
    ├── include
    │   └── ...
    ├── lib
    │   └── ...
    └── ...

The idea is that one can call the pip and python executables inside the virtualenv, instead of the system ones. Like so:

    $ venv/bin/python --version
    Python 2.7.13
    $ venv/bin/pip --version
    zsh: venv/bin/pip: bad interpreter: "/private/tmp/path: no such file or directory

Unfortunately, as you can see, pip doesn't work at all! Why does this happen? While the python executable is a native binary, pip is actually just a Python script, which is specified to be run by the accompanying virtualenv python executable. Here are the contents:

    #!"/private/tmp/path with spaces/venv/bin/python2.7"

    # -*- coding: utf-8 -*-
    import re
    import sys

    from pip import main

    if __name__ == '__main__':
        sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
        sys.exit(main())

The issue is that the python binary is specified using a shebang, which is known [12] [13] to be fragile, OS-dependent, and error-prone regarding paths that are long or that contain spaces or non-ASCII characters. In particular, the quoting of the shebang does not work on most operating systems, including macOS, which is what I ran this test on.

== virtualenv and distlib ==

The issue is complicated by the fact that there are several different libraries at play. To perform the installation of pip and wheel into the virtualenv, virtualenv calls into pip [14]. The 'pip install' command then uses the subroutine library 'wheel.py', which generates the stub scripts using distlib's ScriptMaker [15]. It is actually distlib which generates the shebang, although this can be overridden by setting the 'executable' property of the ScriptMaker object [16] [17].

Any patch to fix the virtualenv problem would therefore need to be in either pip (as a consumer of the shebang-generation interface) or distlib (as the provider of that interface).
The problem cannot be addressed by virtualenv without doing something like using pip/distlib to generate the scripts and then fixing them after the fact (this has been proposed [26] [29], but I consider it a hack). == Proposed solutions == There has been extended discussion about this issue, especially in [2]. Essentially, the solutions proposed fall into four categories: (1) Don't change anything; end users can work around the issue. For example, they can place their virtualenvs in a different directory than their project, or change their username to avoid having spaces or non-ASCII characters. (2) Don't fix the bug, but add a warning to virtualenv. If we absolutely can't fix the bug (which I strongly believe not to be the case), then this would be the next best thing to do. See [27] [30] [31]. (3) Att
[Distutils] Introducing PyPIContents
Hi everyone,

I'm new to this list but I've been reading some threads in the archive. Around February, an idea about indexing modules from PyPI packages was brought up. I've been working on something similar for quite a while.

PyPIContents is an index of PyPI packages that lists each package's modules and command line scripts in JSON format, like this:

    [ ...
      "1337": {
        "cmdline": [],
        "modules": [ "1337", "1337.1337" ],
        "version": "1.0.0"
      },
      ... ]

You can check it out here: https://github.com/LuisAlejandro/pypicontents

And some use cases: https://github.com/LuisAlejandro/pypicontents#use-cases

The actual index lives here, it's around 60MB: https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.json

It is updated daily with the help of Travis: https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml

Anyway, I hope it is useful and I'll be around for any comments or questions.

Cheers!

Luis Alejandro Martínez Faneyth
Blog: http://huntingbears.com.ve
Github: http://github.com/LuisAlejandro
Twitter: http://twitter.com/LuisAlejandro
CODE IS POETRY

___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20May2017 0820, Nick Coghlan wrote:
> Good point regarding the fact that the Windows 16-bit APIs only come
> into play for interactive sessions (even in 3.6+), while for PEP 517
> we're specifically interested in the 8-bit pipes used to communicate
> with build subprocesses launched by an installation tool.

I need to catch up on the PEP (and thanks Brett for alerting me to the thread), but this comment in particular cements the mental diagram I have right now:

(build UI) <--> (build tool) <--> (compiler)
( Python ) <--> ( Python ) <--> (anything)

I'll probably read the PEP closely and see that this is entirely incorrect, but if it's right:

* encoding for text between the build UI and build tool should just be specified once for all platforms (i.e. use UTF-8).
* encoding for text between build tool and the compiler depends on the compiler

In general, since most subprocesses (at least on Windows) do not have customizable encodings, the tool that launches them needs to know what the encoding is. Since we don't live in a Python 3.6 world quite yet, that means the tool should read raw bytes from the compiler and encode them to UTF-8.

The encoding between the tool and UI is essentially irrelevant - the UI is going to transform the data anyway for display, and the tool is going to have to transform it from the compilation tools, so the best we can do is pick the most likely encoding to avoid too many operations. UTF-8 is probably that.

That's my 0.02AUD based on a vague memory of the PEP and this thread. As I get time today (at PyCon) to read up on it I may post amendments, but in general I'm +100 on "just pick an encoding and make the implementations transcode".

Cheers,
Steve
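As a rough illustration of the "read raw bytes and transcode" approach described in the message above, here is a minimal Python sketch. The helper names (transcode_to_utf8, run_and_transcode) and the idea of passing the tool's encoding in explicitly are assumptions made for this example; neither the PEP nor this thread defines such an API.

```python
import subprocess
import sys


def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode tool output from source_encoding to UTF-8.

    Undecodable bytes become U+FFFD instead of raising, so badly
    encoded output from the tool cannot abort the build.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


def run_and_transcode(cmd, source_encoding: str) -> bytes:
    """Run cmd, capture stdout and stderr as raw bytes, return UTF-8."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return transcode_to_utf8(proc.stdout, source_encoding)


if __name__ == "__main__":
    # Hypothetical stand-in for a compiler: a child process that writes
    # "caf" followed by the single cp1252 byte for e-acute.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(b'caf\\xe9')",
    ]
    print(run_and_transcode(child, "cp1252"))
```

The caller still has to know (or guess) the compiler's encoding, which is exactly the part that varies per tool.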
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Fri, May 19, 2017, 09:20 Thomas Kluyver wrote:
> On Fri, May 19, 2017, at 05:17 PM, Paul Moore wrote:
> > On 19 May 2017 at 16:53, Daniel Holth wrote:
> > > Congrats on getting 518 in.
> >
> > Agreed, by the way. That's a big step!
>
> Thanks both. It does feel like an achievement. :-)

As it should! Thanks for bringing the PEP to life!

-brett
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
Good point regarding the fact that the Windows 16-bit APIs only come into play for interactive sessions (even in 3.6+), while for PEP 517 we're specifically interested in the 8-bit pipes used to communicate with build subprocesses launched by an installation tool. On 20 May 2017 at 19:11, Paul Moore wrote: > The bigger question, though, is to what extent we want to mandate that > build tools that run external tools such as compilers take > responsibility for the encoding of the output of those tools (rather > than simply passing the output through to the output stream > unmodified). And if we do want to, whether we want to allow an > exception for setuptools/distutils. > > Also, a question regarding Unix - do we really want to mandate UTF-8 > even if the system locale is set to something else? Won't that mean > that build tools have the same problem with compilers generating > output in the encoding the tool wants that we already have on Windows? Yeah, I think that problem was starting to occur to me, hence the reference to handling RPM and DEB build environments. At least for non-Windows systems, I see two possible recommendations: 1. We advise installation tools to use binary streams to communicate with build tools, and treat the results as opaque binary data. If it needs to be written out to the installation tool's own streams, then use the binary level APIs for those interfaces to inject the build tool output directly, rather than decoding and re-encoding it first. 2. We advise installation tools to adopt a PEP 538 style solution, where they mostly just trust the result of locale.getpreferredencoding() *unless* "codecs.lookup(locale.getpreferredencoding()).name == 'ascii'". In the latter case, we'd advise them to set LC_CTYPE (and potentially LANG) appropriately for the running OS. Regardless of whether or not that locale coercion was needed, we'd recommend setting "replace" or "backslashreplace" when decoding the stream output from the subprocess. 
At the specification level, I think option 1 probably makes the most sense - we'd be advising installation tools that they're free to kick any mojibake problems further down the automation pipeline if they don't want to worry about it. It's also the only one of the two recommendations we can readily make cross platform.

At a quality-of-implementation level, there's a lot of potential value in option 2 (at least on non-Windows systems) - we just wouldn't require or recommend it at the level of the interoperability specifications.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
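The two recommendations in the message above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names (relay_opaque, decode_leniently, locale_is_ascii) are invented for the example, and the locale coercion step of option 2 (setting LC_CTYPE for the child) is deliberately left out because it is platform specific.

```python
import codecs
import locale
import sys


def relay_opaque(raw: bytes, out=None) -> None:
    """Option 1: pass build tool output through as opaque bytes.

    Writes to the binary layer of stdout by default, so nothing is
    decoded or re-encoded along the way.
    """
    out = sys.stdout.buffer if out is None else out
    out.write(raw)
    out.flush()


def locale_is_ascii() -> bool:
    """The PEP 538 style check quoted in the message above."""
    return codecs.lookup(locale.getpreferredencoding(False)).name == "ascii"


def decode_leniently(raw: bytes, encoding=None) -> str:
    """Option 2 (decoding half): trust the locale, never raise.

    Bytes that do not decode are rendered as \\xNN escapes.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="backslashreplace")
```

Option 1 needs no encoding knowledge at all, which is why it is the only one of the two that translates directly to every platform.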
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20 May 2017 at 09:03, Thomas Kluyver wrote: > On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote: >> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X), >> build systems SHOULD emit UTF-8 encoded output >> * on platforms with 16-bit standard streams (e.g. Windows), build >> systems SHOULD emit UTF-16-LE encoded output > > I'm quite prepared to accept that I'm mistaken, but my understanding is > that *standard streams* are 8-bit on Windows as well. The 16-bit thing > that Python 3.6 does, as I understand it, is to bypass standard streams > when it detects that they're connected to a console, and use a Windows > API call to write text to the console directly as UTF-16. > > If so, when stdout/stderr are pipes, which I assume is how pip captures > the output from build processes, there's no particular reason to send > UTF-16 data just because it's Windows. That's my understanding too. The standard streams are still byte streams with an encoding. It's just that the underlying IO when the final destination is the console, is done by the Windows Unicode APIs. Because of this, when the output is the console the stream can accept any unicode character and so an encoding of UTF8 is specified (and yes, AIUI there is a translation Unicode string -> UTF-8 bytes -> Unicode console API). For non-console output, though, the standard streams are still byte streams and the platform behaviour is respected, so we use the ANSI codepage (calling this the platform standard glosses over the fact that there are two standard codepages, ANSI and OEM, and tools don't always make the same choice when faced with piped output). Long story short, UTF-16 is irrelevant here. The docs for 3.6 say "Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page". This is out of date (it was true for 3.5 and earlier). 
In 3.6+ utf-8 is used for interactive streams rather than the console codepage: >py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)" utf-8 >py -c "import sys; print(sys.stdout.encoding, file=sys.stderr)" >$null cp1252 The bigger question, though, is to what extent we want to mandate that build tools that run external tools such as compilers take responsibility for the encoding of the output of those tools (rather than simply passing the output through to the output stream unmodified). And if we do want to, whether we want to allow an exception for setuptools/distutils. Also, a question regarding Unix - do we really want to mandate UTF-8 even if the system locale is set to something else? Won't that mean that build tools have the same problem with compilers generating output in the encoding the tool wants that we already have on Windows? My feeling is: 1. Build systems SHOULD emit output encoded in the preferred locale encoding (normally UTF-8 on Unix, ANSI on Windows). 2. Build systems should ideally check the encoding used by external tools that they run and transcode to the correct encoding if necessary - but this is a quality of implementation matter. 3. Install tools MUST NOT fail if build tools produce output with the wrong encoding, but MUST correctly reproduce build tool output if the build tools do produce the right encoding. My biggest concern with this is that I believe that Visual C produces output in the OEM codepage even when output to a pipe. Actually I just did some experiments (VS 2015), and it's even worse than that. The compiler (cl) seems to use the OEM code page when writing to a pipe, but the linker uses the ANSI code page. This means that a command like "cl a£bc" produces output on (a piped) stdout that contains mixed encodings. Given this situation, I think we have to simply give up and take the view that the Visual C tools are simply broken in this regard, and we shouldn't worry about them. 
So I'm inclined therefore to drop point (2) from the 3 above. Paul
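To make the mixed-encoding problem described above concrete, here is a small Python sketch. It assumes an OEM code page of cp437 and an ANSI code page of cp1252 (one common Windows pairing; other systems differ), and the "cl:" and "link:" prefixes are hypothetical stand-ins for real tool output.

```python
# Assumed code pages for this illustration only.
OEM = "cp437"
ANSI = "cp1252"

POUND = "\u00a3"  # the pound sign from the "a£bc" example

# Hypothetical compiler output (OEM) followed by linker output (ANSI).
compiler_part = ("cl: a" + POUND + "bc\n").encode(OEM)
linker_part = ("link: a" + POUND + "bc\n").encode(ANSI)
mixed = compiler_part + linker_part


def decode_all(raw: bytes, encoding: str) -> str:
    """Decode the whole stream with one encoding, never raising."""
    return raw.decode(encoding, errors="replace")


as_ansi = decode_all(mixed, ANSI)  # linker line correct, compiler line wrong
as_oem = decode_all(mixed, OEM)    # compiler line correct, linker line wrong

# ascii() keeps the demonstration printable on any console.
print(ascii(as_ansi))
print(ascii(as_oem))
```

Whichever single encoding the consumer picks, one of the two lines comes out wrong, so no transcoding step in the build tool can repair the stream without knowing which tool wrote which bytes.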
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Sat, May 20, 2017, at 07:54 AM, Nick Coghlan wrote:
> * on platforms with 8-bit standard streams (e.g. Linux, Mac OS X),
> build systems SHOULD emit UTF-8 encoded output
> * on platforms with 16-bit standard streams (e.g. Windows), build
> systems SHOULD emit UTF-16-LE encoded output

I'm quite prepared to accept that I'm mistaken, but my understanding is that *standard streams* are 8-bit on Windows as well. The 16-bit thing that Python 3.6 does, as I understand it, is to bypass standard streams when it detects that they're connected to a console, and use a Windows API call to write text to the console directly as UTF-16.

If so, when stdout/stderr are pipes, which I assume is how pip captures the output from build processes, there's no particular reason to send UTF-16 data just because it's Windows.

Thomas
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 20 May 2017 at 01:16, Thomas Kluyver wrote:
> On Fri, May 19, 2017, at 03:41 PM, Paul Moore wrote:
>> Can we specify what encoding the informational text must be written
>> in?
>
> Sure, that makes sense. What about:
>
> All hooks are run with working directory set to the root of the source
> tree, and MAY print arbitrary informational text on stdout and stderr.
> This text SHOULD be UTF-8 encoded, but as building may invoke other
> processes, install tools MUST NOT fail if the data they receive is not
> valid UTF-8; though in this case the display of the output may be
> corrupted.
>
> Do we also want to recommend that install tools set
> PYTHONIOENCODING=utf-8 when invoking build tools? Or leave this up to
> the build tools?

Setting PYTHONIOENCODING=utf-8:strict would potentially fail the "don't fail hard on misencoded output" requirement, and setting anything else is dubious from a potential data loss or compatibility point of view (as there's no "surrogateescape" error handler in Python 2).

For use cases like distro package building, we'd also like to inherit the surrounding build environment, so explicitly requiring installation tools to alter it at the Python level doesn't strike me as ideal.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
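The trade-off between error handlers mentioned above can be seen with a short Python 3 sketch. It only models the encode step that a build tool's stdout would perform under each handler; it does not launch a subprocess or set PYTHONIOENCODING, and the file name and try_encode helper are made up for the example.

```python
# A hypothetical file name in Latin-1 bytes; not valid UTF-8.
raw_name = b"caf\xe9.c"

# Python 3 smuggles the bad byte into text as a lone surrogate.
text = raw_name.decode("utf-8", errors="surrogateescape")


def try_encode(value: str, errors: str):
    """Encode to UTF-8 with the given handler; None means it failed."""
    try:
        return value.encode("utf-8", errors=errors)
    except UnicodeEncodeError:
        return None


results = {
    handler: try_encode(text, handler)
    for handler in ("strict", "replace", "backslashreplace", "surrogateescape")
}

for handler, encoded in results.items():
    print(handler, encoded)
```

With "strict" the write fails outright, "replace" and "backslashreplace" succeed but no longer reproduce the original bytes, and "surrogateescape" round-trips them but is not available on Python 2, which is the compatibility concern raised in the message.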