Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 23 May 2017 at 03:38, Steve Dower wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.
>
> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

One of the key premises of PEP 517 is that there will be relatively few
front ends (pip, possibly easy_install, ???), but a relatively large
number of backends (one per build system - at least
distutils/setuptools, distutils2, flit, enscons, likely eventually
meson, waf, and yotta, and potentially even C/C++ build systems like
autotools, CMake, etc).

So it makes sense to put the implementation burden for important
aspects of the UX on the part that PyPA has the most influence over
(the front-end), rather than considering it reasonable for front-end
developers to point fingers and say "That UX failure in the tool we
provide isn't *our* fault, it's the fault of the build backend
developers for not complying with the interoperability specification
properly".

Once we make that core assumption about where the responsibility for
the end user experience resides, then the absolute *minimum*
behavioural requirements that can be placed on build backends are:

- respect the locale encoding
- emit informational messages on stdout
- emit error messages on stderr

What we can then also do is to recommend that *front-ends* do the
following when invoking their build backend CLI shims:

1. Implement the C locale -> UTF-8 based locale coercion defined in
   PEP 538 when launching the subprocess
2. Implement a similar coercion for Windows, where cp1252 being active
   in the parent process prompts a call to "chcp 65001" inside the
   subprocess before the build backend itself actually starts running

That leaves build backend authors with the freedom to assume that they
*don't* need to worry about stream encoding issues, since giving them
access to properly configured streams is the front end's
responsibility.

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

In PEP 517's architecture, the front-end developers are also
responsible for the CLI that's running inside the backend subprocess.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
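A rough sketch of the first recommendation above, as a front end might apply it when building the environment for the backend subprocess. This is a simplification loosely modelled on PEP 538, not its actual algorithm: the function name `coerced_env` is hypothetical, and the sketch simply assumes a "C.UTF-8" locale exists on the target system instead of probing for an available one.

```python
def coerced_env(environ):
    """Return a copy of `environ` suitable for launching a build backend.

    Simplified sketch: if the effective LC_CTYPE setting is the plain
    C/POSIX locale, request a UTF-8 based C locale instead.
    Assumption: "C.UTF-8" is available on the target system.
    """
    env = dict(environ)
    # An explicit LC_ALL overrides everything else, so leave it alone.
    if env.get("LC_ALL"):
        return env
    # Without LC_ALL, LC_CTYPE takes precedence over LANG.
    effective = env.get("LC_CTYPE") or env.get("LANG") or ""
    if effective in ("", "C", "POSIX"):
        env["LC_CTYPE"] = "C.UTF-8"
    return env


# Hypothetical usage in a front end:
#     subprocess.run(shim_command, env=coerced_env(os.environ))
```

Locales that already name an encoding (for example a cp1252-style or GB 18030 locale) pass through untouched, which is the point of coercing only the C locale.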
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Mon, May 22, 2017, at 11:36 PM, Steve Dower wrote:
> IMHO, #2 is definitely the right way to go. Yes, the platform specific
> code now has to worry about the encoding, but... the encoding is
> platform specific? So... that seems exactly right? :) Maybe I'm still
> missing something here, but I'm totally happy to leave it to Thomas to
> decide (which I think he has, but I haven't gotten to looking at that PR
> yet).

I think I broadly agree with this as well. My reservation is that the
build backend might be running a subprocess which produces output in an
*unknown* encoding, especially if it allows the package author or the
end user to configure a command to run. If it doesn't know the
encoding, I'd rather get the raw bytes from the subprocess in the log
(e.g. dumped to a file), rather than attempting to transcode them to
UTF-8 - the conversion risks losing information, and even if it
doesn't, it makes it harder to work out what was really meant.

I feel like we're spending a lot of energy on a point that's not really
central to the PEP, though. I think we've established that there's a
potential for bugs and mojibake whatever we put in the spec. So I'd
like to put something relatively simple and move on.

I still stand by my PR, which amounts to "backends try to make it
UTF-8, frontends don't crash if it isn't". I might be persuaded to add
a recommendation that frontends dump the bytes to a file if they're not
UTF-8, so the user can pull it apart if necessary.

Thomas
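The "don't crash, and dump the bytes if they're not UTF-8" idea could look something like the following in a front end. This is only a sketch: the helper name `decode_or_dump` and the dump file naming are made up for illustration and are not part of the PR or the PEP.

```python
import os
import tempfile


def decode_or_dump(raw, dump_dir=None):
    """Decode captured build output as UTF-8 without ever raising.

    Returns (text, dump_path). dump_path is None when `raw` was valid
    UTF-8; otherwise the untouched bytes are written to a new file so
    that no information is lost, and `text` is a lossy decoding.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        # Hypothetical naming scheme for the dump file.
        fd, path = tempfile.mkstemp(
            prefix="build-output-", suffix=".bin", dir=dump_dir
        )
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        return raw.decode("utf-8", errors="replace"), path
```

The front end can then show the lossy text straight away and tell the user where the raw log ended up.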
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22May2017 1253, Paul Moore wrote:
> It seems to me there are 2 schools of thought:
>
> 1. There are likely to be fewer front ends than back ends, and so the
> front end(s) (basically, pip) should deal with the problem. Also,
> backends are more likely to be written by developers who are looking
> at very specific scenarios, and asking them to handle all the
> complexities of robust multilingual coding is raising the bar on
> writing a backend too high.
> 2. The backend is where the problem lies, and so the backend should
> address the issue. Furthermore, a well-established principle in
> dealing with encodings is to convert to strings right at the boundary
> of the application, and in this case the backend is the only code that
> has access to that boundary.
>
> (I tend towards (2), but I honestly can't say to what extent that's
> because it makes it "someone else's problem" for me ;-))

I also tend towards 2, and I assume I am one of the more likely people
to write the part that invokes Microsoft's cl.exe/link.exe :)

Is the front end going to be directly invoking those tools? I would
assume not, otherwise it won't be cross platform.

Since the shim belongs to the front end, I've essentially been ignoring
it. The shim can invoke another part of the build tool, but that is not
going to be cl.exe/link.exe either.

At some point there will be a script that runs the tools directly. I
have been referring to that as the backend, and it is the part that
should handle capturing and transcoding the output. Everything from
there can be utf8:replace to prevent crashing, but we can't say "the
frontend can handle all encodings", and shouldn't say "the frontend
will only use bad encodings".

IMHO, #2 is definitely the right way to go. Yes, the platform specific
code now has to worry about the encoding, but... the encoding is
platform specific? So... that seems exactly right? :) Maybe I'm still
missing something here, but I'm totally happy to leave it to Thomas to
decide (which I think he has, but I haven't gotten to looking at that
PR yet).

Cheers,
Steve
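For the "capture and transcode at the point where the tool is run" position, a minimal sketch might be the following. The function name `transcode_to_utf8` is hypothetical, and falling back to the locale's preferred encoding is an assumption about what a tool such as cl.exe emits, not something established in this thread.

```python
import locale


def transcode_to_utf8(raw, tool_encoding=None):
    """Re-encode raw tool output as UTF-8 bytes.

    `tool_encoding` is the encoding the tool is believed to use. When
    it is not given, the locale's preferred encoding is assumed.
    Undecodable bytes become U+FFFD, so this never raises on bad input.
    """
    if tool_encoding is None:
        tool_encoding = locale.getpreferredencoding(False)
    text = raw.decode(tool_encoding, errors="replace")
    return text.encode("utf-8")
```

Everything downstream of this call can then treat the stream as UTF-8 with a replacement error handler.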
[Distutils] setuptools 33.1.1 - 35 issue with source package install from pypi
https://stackoverflow.com/questions/44120045/something-is-breaking-my-package-deployment
has full info on what I am seeing, but it suffices to say:

I use setuptools in my setup.py
I iterate a requirements.txt file to satisfy dependencies.
I publish as sdist to PyPI

My package used to work fine as it currently is on PyPI. However, as of
late I've been seeing the python setup.py invocation (from pip install
python-symphony) fail with a couple of different failure scenarios.

One is it attempting to install what looks like a binary wheel (that
doesn't exist on PyPI or anywhere) that ends up being a bunch of blank
stubs.

The other is a situation where the package looks 100% correct in
site-packages, but none of the module components are registered when
you try to use the imported module.

pip install --isolated --no-cache-dir python-symphony seems to avoid
the issue, but it also tends to do weird stuff like break bpython by
messing up the pygments install.

Curious if anyone is familiar with some sort of bug that I am stumbling
into here.

-Matt
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 18:38, Steve Dower wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.

s/expect/allow/

The paranoid in me suspects "expect" is also true, though :-)

> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

What you're referring to as the backend "tool" being a script, is what
the PEP refers to as a "shim" (as Nick pointed out to me) and is
considered part of the front end. The back end is a set of Python APIs
which are called by the front end (in any real life front end, via the
front end's shim script).

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

For front ends, "never crash" is essential. But "produce as readable as
possible data" is also a high priority. Consider for example a Russian
user with a series of directories named in Russian. If the tools write
an error using his local 8-bit encoding, and the front end assumes
UTF-8, then all of the high-bit characters in his directory names would
be replaced. Deciphering an error message like "File ???/?/?.c:
unexpected EOF" is problematic... :-(

The model assumes that most front-ends would call the backend via a
subprocess "shim" that was maintained by the front end project. But the
expectation here seems to be that the backend is allowed to write
directly to the stdio streams of its process (or at least, to let the
tools it calls do so). So the shim *cannot* control the encoding of the
data received by the frontend, and so the encoding has to be agreed
between backend and frontend.

The basic question is how the responsibility for dealing with data in
an uncertain encoding is allocated. It seems to me there are 2 schools
of thought:

1. There are likely to be fewer front ends than back ends, and so the
   front end(s) (basically, pip) should deal with the problem. Also,
   backends are more likely to be written by developers who are looking
   at very specific scenarios, and asking them to handle all the
   complexities of robust multilingual coding is raising the bar on
   writing a backend too high.
2. The backend is where the problem lies, and so the backend should
   address the issue. Furthermore, a well-established principle in
   dealing with encodings is to convert to strings right at the
   boundary of the application, and in this case the backend is the
   only code that has access to that boundary.

(I tend towards (2), but I honestly can't say to what extent that's
because it makes it "someone else's problem" for me ;-))

As you say, the middle ground here is that front ends must never crash,
and back ends should (but aren't required to) produce output in a
specified encoding (I still prefer the locale encoding as that has the
best chance of avoiding the "???/?/?.c" issue above). That's more or
less what pip has to deal with now (and not that far off (1)), and my
current attempt to address that situation is at
https://github.com/pypa/pip/pull/4486 for what it's worth.

A couple of final thoughts. I would expect that testing the handling of
encodings is likely to be an important issue (at least, I expect
there'll be bugs, and adding tests to make sure they get properly fixed
will be important). Handling tool output encoding in the backend is
likely to involve relatively low level interface functions, where the
inputs and outputs can be relatively easily mocked. So I would expect
backend unit testing of encoding handling would be relatively
straightforward. Conversely, testing front end handling of encoding
issues is very tricky - it's necessary to set up system state to
persuade the build tools to produce the data you want to test against
(it feels like integration testing rather than unit testing).

Also, fixing encoding issues in the backend decouples the fix from
pip's release cycle, which is probably a good thing (unless the backend
is not well maintained, but that's an issue in itself).

Paul
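The Russian directory example can be reproduced in a few lines. The path and the choice of cp1251 as the "local 8-bit encoding" are made up for illustration; any single-byte Cyrillic code page shows the same effect.

```python
# Hypothetical path; cp1251 stands in for the user's local 8-bit encoding.
path = "Проекты/мой/файл.c"
message = "File %s: unexpected EOF" % path

# What the build tool actually writes to its output stream.
raw = message.encode("cp1251")

# A front end that assumes UTF-8 but must not crash.
as_utf8 = raw.decode("utf-8", errors="replace")

# A front end that uses the agreed (locale) encoding.
as_locale = raw.decode("cp1251")

print(as_utf8)    # ASCII parts survive, Cyrillic letters become U+FFFD
print(as_locale)  # the original message, intact
```

Because the inputs here are plain bytes, this kind of check also fits the "easily mocked" unit testing described above.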
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 15:23, Nick Coghlan wrote:
> No, that's discussed here:
> https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals
>
> Even though PEP 517 defines a Python API for build backends to
> implement, it still expects installation tools to wrap a subprocess
> call around the backend invocation.

OK, but is it not acceptable for the child cmdline process (owned by
pip) to capture the backend implementation's stdout using reassignment
of sys.stdout? I assume, from your response, that it's *not* acceptable
to do that - but that needs to be documented somewhere. Specifically,
that the child cmdline is not allowed to do something like:

    import io, sys

    out = io.StringIO()
    sys.stdout = out
    build_backend.hook()
    sys.stdout = sys.__stdout__
    sys.stdout.buffer.write(out.getvalue().encode("UTF-8"))

(Which would otherwise be a very simple way to get guaranteed UTF-8 as
the encoding across the process boundary - but it does so by imposing
basically the rules I stated on the backend).

> That said, the whole "The build backend still runs in a subprocess"
> aspect should probably be separated out into its own section
> "Isolating build backends from frontend process state", rather than
> solely being covered in the "Comparison to PEP 516?" section, as it's
> a key aspect of the design - we expect each installation tool to
> provide its own CLI shim for calling build backends, rather than
> requiring all installation tools to use the same one.

Strong +1. And that section needs to be very clear on issues like this,
covering what the shim is allowed to do. As the point of the shim is to
protect the backend from frontend state, I'm OK with the general
principle that the shim must do "as little as possible" before calling
the hook - but "reset sys.stdout to protect against encoding errors"
could easily be seen as within the realm of acceptable behaviour (as it
stops hooks writing arbitrary Unicode to a standard output that the
shim knows is limited).

I'm happy enough with the idea that pip won't do anything silly in its
CLI shim, but we don't want to get into the "implementation as the
standard" situation where a backend is allowed to do anything that
pip's shim can cope with...

Paul
[Distutils] Last call for PEP 516 champions
Hi folks,

The restarted discussion on PEP 517 meant I realised that we hadn't
officially decided between its Python API based approach and PEP 516's
approach of using the backend CLI as the standardised interface (akin
to the current setup.py approach).

My current intention is to reject PEP 516's CLI standardisation
approach on the following grounds:

- PEP 517 makes a convincing case for the benefits of the Python API
  based approach within the Python ecosystem
- the difficulties encountered in evolving the setup.py CLI over time
  lend significant weight to the notion that a Python level API will be
  easier to update without breaking backwards compatibility
- PEP 517 still advises front-ends to isolate back-end invocation
  behind a subprocess boundary due to all of the other practical
  benefits that brings, it just makes the specifics of that invocation
  an implementation detail of the front-end tool
- third party tools that want an implementation language independent
  CLI abstraction over the Python ecosystem (including builds) can just
  use pip itself (or another standards-compliant frontend)

This isn't particularly urgent, but I also don't see a lot of reason
for an extended discussion, so if I don't hear a convincing
counterargument in the meantime, I'll mark the PEP as rejected at the
beginning of June :)

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 23:15, Paul Moore wrote:
> On 22 May 2017 at 12:28, Thomas Kluyver wrote:
>> What if it wants to send a character which can't be encoded in the
>> locale encoding? It's quite easy on Windows to end up with a character
>> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
>> 'replace'), then you've lost information even before it gets to the
>> install tool.
>>
>> It's 2017, I really don't want to go down the 'locale specified
>> encoding' route again. UTF-8 everywhere!
>
> Hang on. Can we take a step back here? I just re-read the PEP and
> remembered (!) that hooks are *in-process* Python entry points (I've
> been working with pip's current backend-as-subprocess model, and mixed
> up in my mind the original 2 proposals here). I think this encoding
> debate may be a red herring.

No, that's discussed here:
https://www.python.org/dev/peps/pep-0517/#comparison-to-competing-proposals

Even though PEP 517 defines a Python API for build backends to
implement, it still expects installation tools to wrap a subprocess
call around the backend invocation.

Frontends need to do that in order to protect *their own* process state
from bugs and design quirks in backend implementations:

- no monkeypatching of parent process modules
- no changes to the standard stream configuration
- no persistent locale changes
- no environment variable changes
- no manipulation of any other process global state
- calling sys.exit() won't cryptically crash the entire installation
  process
- memory leaks won't cryptically crash the entire installation process
- infinite loops won't *necessarily* crash the entire installation
  process (if the build has a timeout on it)
- installation tools running with elevated privileges can readily run
  the build process with reduced privileges
- installation tools can also readily run the build process in a chroot
  or containerised environment

And in the context of this thread, it gives the frontend complete
control over the stream output from not only the backend itself, but
any child processes that it launches.

That said, the whole "The build backend still runs in a subprocess"
aspect should probably be separated out into its own section "Isolating
build backends from frontend process state", rather than solely being
covered in the "Comparison to PEP 516?" section, as it's a key aspect
of the design - we expect each installation tool to provide its own CLI
shim for calling build backends, rather than requiring all installation
tools to use the same one.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
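A minimal sketch of the isolation described above, seen from the front end's side. The helper name `run_isolated` is hypothetical, and the `code` string merely stands in for a real CLI shim, which is not specified in this thread.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh interpreter and capture its raw output.

    Returns (returncode, stdout_bytes, stderr_bytes). A sys.exit() in
    the child only ends the child, and a hung child is killed after
    `timeout` seconds (subprocess.TimeoutExpired is raised then).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# A "backend" that calls sys.exit() does not take the caller down.
rc, out, err = run_isolated("import sys; print('building'); sys.exit(3)")
print(rc, out)
```

Note that the output arrives as bytes, which is exactly why the encoding agreement discussed in this thread matters.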
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 12:28, Thomas Kluyver wrote:
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.
>
> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

Hang on. Can we take a step back here? I just re-read the PEP and
remembered (!) that hooks are *in-process* Python entry points (I've
been working with pip's current backend-as-subprocess model, and mixed
up in my mind the original 2 proposals here). I think this encoding
debate may be a red herring.

If a hook is being called as a Python method call, then it can print
what it likes to stdout and stderr. And it's the backend's
responsibility to ensure that it never fails when printing - so the
*backend* has to deal with the fact that anything it wants to print
must be representable in sys.stdout.encoding, with the default (raise
an exception) error handling.

Given this fact, and the fact that sys.stdout and sys.stderr are *text*
output streams, build frontends like pip can reasonably just replace
sys.std{out,err} (for example with a StringIO object) to get hook
output. There's no encoding issue for frontends, they just capture the
text sent to the stdio streams.

The rules needed for *backends* are then:

1. Backends MUST NOT write to raw IO channels, all output MUST go via
   sys.stdout and sys.stderr. Build frontends MAY redirect these
   streams to post-process them, but are not required to do so.

As a consequence:

1a. Backends MUST be prepared to deal with the possibility that those
    IO streams have the limitations of the platform IO streams (e.g.,
    limited subset of Unicode allowed, fails with an exception when
    invalid characters are written).
1b. Backends MUST capture and manage the output from any subprocesses
    they spawn (so that they can follow the other rules).
1c. Backends cannot assume that they can write output that the user
    will see - frontends may suppress or modify any output passed on
    stdout. Conversely, backends should not bypass the ability of
    frontends to capture stdout, as frontends are responsible for user
    interaction.

Some of those MUSTs could be replaced by SHOULD, if we want to allow
backends to write directly to the screen. But that is likely to corrupt
the UI of the frontend, so I'm inclined to say that we don't allow
that.

Paul
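Under the in-process model described above, capturing hook output as text could be sketched like this. The helper name `call_hook_captured` is hypothetical; `contextlib.redirect_stdout` and `redirect_stderr` are standard library context managers that temporarily swap sys.stdout and sys.stderr.

```python
import contextlib
import io


def call_hook_captured(hook, *args, **kwargs):
    """Call `hook` and return (result, stdout_text, stderr_text).

    Only output written through sys.stdout / sys.stderr is captured.
    Anything a subprocess writes straight to the underlying file
    descriptors bypasses this, which is why rule 1b asks backends to
    capture subprocess output themselves.
    """
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = hook(*args, **kwargs)
    return result, out.getvalue(), err.getvalue()
```

Since the captured values are str rather than bytes, the front end has no decoding step at all in this model.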
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 21:28, Thomas Kluyver wrote:
> On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
>> The only reservation I have is that the choice of UTF-8 means that on
>> Windows, build backends pretty much have to explicitly manage tool
>> output (as they are pretty much certain *not* to output in UTF-8).
>> Build backend writers that aren't aware of this issue (most likely
>> because their main platform is not Windows) could very easily choose
>> to just pass through the raw bytes, and as a result *all* non-ASCII
>> output would be garbled on non-UTF-8 systems.
>>
>> Would locale.getpreferredencoding() not be a better choice here? I
>> know it has issues in some situations on Unix, but are they worse than
>> the issues UTF-8 would cause on Windows? After all it's the encoding
>> used by subprocess.Popen in "universal newlines" mode...
>
> What if it wants to send a character which can't be encoded in the
> locale encoding? It's quite easy on Windows to end up with a character
> that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
> 'replace'), then you've lost information even before it gets to the
> install tool.

The counterargument is that there's plenty of text that *can* be
correctly encoded in cp1252 (especially in Europe and LATAM) that will
be rendered incorrectly if the installation tool attempts to interpret
it as UTF-8.

CPython itself will also display explicitly UTF-8 encoded text
incorrectly on a Windows console in versions prior to 3.6.

> It's 2017, I really don't want to go down the 'locale specified
> encoding' route again. UTF-8 everywhere!

"UTF-8 everywhere" is fine for network services that only need to talk
to other network services, command line applications, and web browsers,
but even in 2017 it's still a problematic assumption on client devices
running Windows or Linux.

Rather than the locale specified encoding being broken in general, the
key recurring problem we've found with it on *nix systems relates to
the fact that glibc still defaults to ASCII in the C locale - "assume
ASCII really means UTF-8" is enough to solve that problem *without*
breaking compatibility with cp1252 and non-UTF-8 universal encodings.

The other recurring problem is cp1252 itself on Windows, which suffers
from the fact that there isn't a nice environment variable based way to
change the active code page when invoking a subprocess, and also that
cp65001 (the UTF-8 code page) isn't really properly supported in Python
2.7 (although you can inject a custom search function to alias it to
utf-8 [1]).

Even in that case though, mandating "thou shalt treat the streams as
UTF-8" in the spec doesn't *solve* those problems - it just means we're
specifying a behaviour that we know will provide a poor developer
experience on Windows, rather than alerting tool developers to the fact
that this is something they're going to need to be aware of.

Cheers,
Nick.

[1] http://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 21:02, Paul Moore wrote:
> On 22 May 2017 at 11:22, Thomas Kluyver wrote:
>> I have made a PR against the PEP with my best take on the encoding
>> situation:
>> https://github.com/python/peps/pull/264/files
>
> LGTM.
>
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
>
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

+1 from me for locale.getpreferredencoding() as the default - not only
is it a more suitable default on Windows, it's also the best way to do
the right thing in GB 18030 locales, and as far as I'm aware, handling
that correctly is still a requirement for selling commercial software
into China (that's why I chose it as the main non-UTF-8 example
encoding in PEP 538).

If Python tools want to specifically detect the use of 7-bit ASCII and
override *that* to be UTF-8, then the relevant snippet is:

    import codecs
    import locale

    def get_stream_encoding():
        nominal = locale.getpreferredencoding()
        # codecs.lookup() normalises aliases such as "ANSI_X3.4-1968"
        if codecs.lookup(nominal).name == "ascii":
            return "utf-8"
        return nominal

That's effectively the same model that PEP 538 and 540 are proposing be
applied by default for the standard streams, so it would also
interoperate well with Python 3.7+.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
>
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

What if it wants to send a character which can't be encoded in the
locale encoding? It's quite easy on Windows to end up with a character
that you can't encode as cp1252. If the build tool uses
.encode(loc_enc, 'replace'), then you've lost information even before
it gets to the install tool.

It's 2017, I really don't want to go down the 'locale specified
encoding' route again. UTF-8 everywhere!

One affordance I'd consider is a recommendation to install tools that
if captured output is not valid UTF-8, they dump the raw bytes to a
file so that no information is lost. I'm not sure if that
recommendation needs to be in the spec itself, though.

Thomas
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 11:22, Thomas Kluyver wrote:
> I have made a PR against the PEP with my best take on the encoding
> situation:
> https://github.com/python/peps/pull/264/files

LGTM.

The only reservation I have is that the choice of UTF-8 means that on
Windows, build backends pretty much have to explicitly manage tool
output (as they are pretty much certain *not* to output in UTF-8).
Build backend writers that aren't aware of this issue (most likely
because their main platform is not Windows) could very easily choose to
just pass through the raw bytes, and as a result *all* non-ASCII output
would be garbled on non-UTF-8 systems.

Would locale.getpreferredencoding() not be a better choice here? I know
it has issues in some situations on Unix, but are they worse than the
issues UTF-8 would cause on Windows? After all it's the encoding used
by subprocess.Popen in "universal newlines" mode...

Paul
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
I have made a PR against the PEP with my best take on the encoding
situation:
https://github.com/python/peps/pull/264/files
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On 22 May 2017 at 10:56, Thomas Kluyver wrote:
> On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
>> Require that build tools either send UTF-8 to the UI component, or write
>> bytes to a file and call it a build output. I see no benefit in
>> requiring both the build tool and the UI tool to guess what the text
>> encoding is.
>
> I'm not proposing that the install tool should try to guess the
> encoding, but I think a well written install tool shouldn't crash if the
> build output doesn't match the encoding it expects. Even if the spec
> says that the build output MUST be UTF-8 encoded, build tools can have
> bugs, and you don't want the install to fail just because the log
> isn't correctly encoded.
>
> Hence, I think a 'SHOULD' is appropriate for this part of the spec:
>
> - To install tool authors, it is clear that they can display the output
>   as UTF-8 so long as they don't crash if it's invalid.
> - To build tool authors, it's clear that they can't pass the buck to
>   install tool authors if output gets jumbled because it's not UTF-8.

I'd say that it's not so much just "well written" install tools. I'd
say that install tools MUST NOT crash if build tool output isn't in
the expected encoding. On the other hand, the encoding agreement
implies that if build tools *do* send data in the correct encoding
then they are entitled to expect that it will be displayed accurately
to the end user.

Output can be garbled in two ways:

1. The build tool does not (or cannot) ensure that its output is in
   the standard-mandated encoding.
2. The install tool cannot display the full range of characters
   representable in the standard-mandated encoding.

Neither of these should cause a failure. Well written install tools
should warn in the case of (1) - "I have been passed data that I don't
understand, I'll do my best to display it but can't guarantee the
output won't be garbled". In the case of (2), though, that's "as
expected" - if your OS settings mean you can't display certain
characters, you shouldn't be surprised if your install tool replaces
them with a placeholder.

On an implementation note, this boils down to something like the
following in the install tool:

    # Step 1
    try:
        data = decode build output using STD_ENCODING
    except UnicodeDecodeError:
        warn "Data is not in expected encoding"
        data = decode using STD_ENCODING with errors=REPLACEMENT

    # Step 2
    data = data.encode(MY_OUTPUT_ENCODING, errors=REPLACEMENT).decode(MY_OUTPUT_ENCODING)

    # We now have subprocess output that's safe to display if requested.

As a side note, I find step 2 "sanitise my string to ensure it can be
safely output" to be a pretty common operation - possibly because
Python's standard IO streams raise exceptions on unicode errors - and
I'm surprised there isn't a better way to spell it than the
encode/decode pair above.

Paul
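For readers who want something runnable, the two steps in the message above could be sketched in Python roughly as follows. This is only an illustration, not code from pip or from the PEP: the function name `sanitise_build_output` is hypothetical, UTF-8 is assumed as the standard-mandated encoding, and `errors="replace"` is one possible choice of replacement handler.

```python
import warnings

# Assumption: the spec settles on UTF-8 as the standard encoding.
STD_ENCODING = "utf-8"


def sanitise_build_output(raw: bytes, output_encoding: str) -> str:
    """Return text that can be written to a stream using output_encoding.

    Hypothetical helper, named here for illustration only.
    """
    # Step 1: decode strictly first, so a mismatch can be reported
    # instead of being silently papered over.
    try:
        text = raw.decode(STD_ENCODING)
    except UnicodeDecodeError:
        warnings.warn("Build output is not in the expected encoding")
        text = raw.decode(STD_ENCODING, errors="replace")

    # Step 2: round-trip through the display encoding so that any
    # character the display cannot represent becomes a placeholder.
    return text.encode(output_encoding, errors="replace").decode(output_encoding)


if __name__ == "__main__":
    # Valid UTF-8 shown on an ASCII-only display: the accented
    # character is replaced rather than raising on output.
    print(sanitise_build_output("café".encode("utf-8"), "ascii"))
```

Other handlers such as `"backslashreplace"` would keep the offending bytes visible in the log instead of substituting a placeholder; which one suits an install tool is a design choice the thread leaves open.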
Re: [Distutils] PEP 517 - specifying build system in pyproject.toml
On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> Require that build tools either send UTF-8 to the UI component, or write
> bytes to a file and call it a build output. I see no benefit in
> requiring both the build tool and the UI tool to guess what the text
> encoding is.

I'm not proposing that the install tool should try to guess the
encoding, but I think a well written install tool shouldn't crash if the
build output doesn't match the encoding it expects. Even if the spec
says that the build output MUST be UTF-8 encoded, build tools can have
bugs, and you don't want the install to fail just because the log
isn't correctly encoded.

Hence, I think a 'SHOULD' is appropriate for this part of the spec:

- To install tool authors, it is clear that they can display the output
  as UTF-8 so long as they don't crash if it's invalid.
- To build tool authors, it's clear that they can't pass the buck to
  install tool authors if output gets jumbled because it's not UTF-8.

Thomas
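As a rough illustration of the "display as UTF-8, but don't crash" behaviour described above, an install tool could ask `subprocess.run` to decode the child's output leniently. The child command below is a made-up stand-in for a buggy build tool that emits a Latin-1 byte, and the choice of `errors="replace"` is an assumption, not something the thread settles on.

```python
import subprocess
import sys

# Hypothetical stand-in for a build tool whose output is not valid UTF-8:
# it writes the single Latin-1 byte 0xE9 ("é") to stdout.
child_code = "import sys; sys.stdout.buffer.write(b'building caf\\xe9\\n')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    encoding="utf-8",   # the encoding the install tool expects
    errors="replace",   # never raise on bytes that do not decode
    check=True,
)

# The invalid byte arrives as U+FFFD instead of aborting the install.
log_text = result.stdout
print(ascii(log_text))
```

With the default `errors="strict"`, the same call would raise `UnicodeDecodeError` while reading the output, which is the failure mode the message argues an install tool should avoid.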