Re: [Python-Dev] Python 2.7 patch levels turning two digit
On 24 Jun 2014 07:29, "Donald Stufft" wrote:
>
> On Jun 23, 2014, at 5:22 PM, Barry Warsaw wrote:
>
> > On Jun 23, 2014, at 05:15 PM, Donald Stufft wrote:
> >
> >> Normally when I see someone suggest that switching compilers
> >> in 2.7.x is likely to be less work than releasing a 2.8 It normally
> >> appears to me they haven’t looked at the impact on the packaging
> >> tooling.
> >
> > Just to be clear, releasing a Python 2.8 has enormous impact outside of just
> > the amount of work to do it. It's an exceedingly bad idea.
>
> Can you clarify?
>
> Also FWIW I’m not really married to the 2.8 thing, it’s mostly that, on Windows, the X.Y release
> prior to the ABI thing in 3.x _was_ the ABI so all the tooling builds on that. So you need to
> either
>
> 1) Stick with the old Compiler

This is what we're going with. Steve is working on making that more manageable from the Visual Studio side, and there are some folks in the numeric/scientific community looking at improving the usability of the MinGW toolchain for the purpose of building Python 2.7 C extensions.

> 2) Release 2.8

Impractical for the various reasons Barry listed.

> 3) Do all the work to fix all the tooling to cope with the fact that X.Y isn’t the ABI on 2.x anymore

Impractical for the various reasons you listed.

> I don’t think a reasonable option is:
>
> 4) Just switch compilers and leave it on someone else’s doorsteps to fix the entire packaging
> tool chain to cope.

Agreed. We discussed this option in detail when the Stackless folks asked about it a while ago, and the conclusion was that the risk of obscure breakage was just too high.

Cheers,
Nick.
>
> -
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On 25 Jun 2014 07:05, "Ethan Furman" wrote:
>
> On 06/24/2014 12:54 PM, Ned Deily wrote:
>>
>> Yes, we are committed to maintaining
>> Python 2.7 for multiple years but that doesn't mean we have to fix every
>> open issue or even most open issues. Any or all of the above costs may
>> apply to any changes we make. For many of our users, the best
>> maintenance policy for Python 2.7 would be the least change possible.
>
> +1
>
> We need to keep 2.7 running, but we don't need to kill ourselves doing it. If a bug has been there for a while, the affected users are probably working around it by now. ;)

Aye, in this case, I'm in the "officially deprecate the feature" camp. Don't actively try to break it further; just slap a warning in the docs to say it is no longer a supported configuration.

In my own personal case, I not only wasn't aware that there was still an option to turn off the Unicode support, but I also wouldn't really class a build with it turned off as still being Python. As Jim noted, there are quite a lot of APIs that don't make sense if there's no Unicode type available.

Cheers,
Nick.

> --
> ~Ethan~
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On 26 Jun 2014 01:13, "Serhiy Storchaka" wrote:
>
> 25.06.14 16:29, Victor Stinner wrote:
>>
>> 2014-06-25 14:58 GMT+02:00 Serhiy Storchaka :
>>>
>>> Other benefit: patches exposed several bugs in code (mainly errors in
>>> backporting from 3.x).
>>
>> Oh, interesting. Do you have examples of such bugs?
>
> In posixpath branches for unicode and str should be reversed.
> In multiprocessing .encode('utf-8') is applied on utf-8 encoded str (this is unicode string in Python 3). And there is similar error in at least one other place. Tests for bytearray actually test bytes, not bytearray. That is what I remember.

OK, *that* sounds like an excellent reason to keep the Unicode-disabled builds functional, and to make sure they stay that way with a buildbot: to help make sure we're not accidentally running afoul of the implicit interoperability between str and unicode when backporting fixes from Python 3.

Helping to ensure correct handling of str values makes this capability something of benefit to *all* Python 2 users, not just those that turn off the Unicode support. It also makes it a potentially useful testing tool when assessing str/unicode handling in general.

Regards,
Nick.
Re: [Python-Dev] Binary CPython distribution for Linux
On 27 Jun 2014 17:33, "Bohuslav Kabrda" wrote:
>
> It's not true that 2.7 wasn't released until few weeks ago. It was released few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software Collections (RHSCL) 1.0, that contain Python 2.7 and Python 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and 3.3) has been released few weeks ago and is supported on RHEL 6 and 7. Also, these collections now have their community rebuilds at [2], so you can just download them without needing to talk to Red Hat at all. But yeah, these are all RPMs, so you have to be root to install them.

Indeed, while there are still some rough edges, software collections look like the best approach to doing maintainable system installs of Python runtimes other than the system Python into Fedora/RHEL/CentOS et al (and I say that while wearing both my upstream and downstream hats).

Collections solve this problem in a general (rather than CPython-specific) way, since they can be used to get upgraded versions of language runtimes, databases, web servers, etc, all without risking the stability of the OS itself. I hope to see someone put together collections for PyPy and PyPy3 as well.

The approaches used for runtime isolation of software collections should also be applicable to Debian systems, but (as far as I am aware) the tooling to build them as debs rather than RPMs doesn't exist yet.

> Please don't take this as a criticism of your ideas, I see what you're trying to solve. I just think the way you're trying to solve it is unachievable or would consume so much community resources, that it would end up unmaintained and buggy most of the time.

For prebuilt userland installs on Linux, I think "miniconda" is the current best available approach. It has its challenges (especially around its handling of security concerns), but it's designed to offer a full cross-platform package management system that makes it well suited to the task of managing prebuilt language runtimes in user space.

Cheers,
Nick.

> --
> Regards,
> Bohuslav "Slavek" Kabrda.
>
> [1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
> [2] https://www.softwarecollections.org/en/scls/
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 28 Jun 2014 01:27, "Jonas Wielicki" wrote:
>
> On 27.06.2014 00:59, Ben Hoyt wrote:
> > Specifics of proposal
> > =
> > [snip] Each ``DirEntry`` object has the following
> > attributes and methods:
> > [snip]
> > Notes on caching
> >
> > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> > is obviously always cached, and the ``is_X`` and ``lstat`` methods
> > cache their values (immediately on Windows via ``FindNextFile``, and
> > on first use on Linux / OS X via a ``stat`` call) and never refetch
> > from the system.
>
> I find this behaviour a bit misleading: using methods and have them
> return cached results. How much (implementation and/or performance
> and/or memory) overhead would incur by using property-like access here?
> I think this would underline the static nature of the data.
>
> This would break the semantics with respect to pathlib, but they’re only
> marginally equal anyways -- and as far as I understand it, pathlib won’t
> cache, so I think this has a fair point here.

Indeed - using properties rather than methods may help emphasise the deliberate *difference* from pathlib in this case (i.e. the value when the result was retrieved from the OS, rather than the value right now). The main benefit is that switching from using the DirEntry object to a pathlib Path will require touching all the places where the performance characteristics switch from "memory access" to "system call". This benefit is also the main downside, so I'd actually be OK with either decision on this one.

Other comments:

* +1 on the general idea
* +1 on scandir() over iterdir, since it *isn't* just an iterator version of listdir
* -1 on including Windows-specific globbing support in the API
* -0 on including cross-platform globbing support in the initial iteration of the API (that could be done later as a separate RFE instead)
* +1 on a new section in the PEP covering rejected design options (calling it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
* regarding "why not a 2-tuple", we know from experience that operating systems evolve and we end up wanting to add additional info to this kind of API. A dedicated DirEntry type lets us adjust the information returned over time, without breaking backwards compatibility and without resorting to ugly hacks like those in some of the time and stat APIs (or even our own codec info APIs)
* it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he was complaining about the fact we didn't expose the dirent data from os.listdir)

Regards,
Nick.

> regards,
> jwi
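A note for readers today: PEP 471 shipped in Python 3.5 with `is_dir()`, `is_file()` and `is_symlink()` kept as methods, and with a `stat(follow_symlinks=...)` method in place of the draft `lstat()`. The minimal sketch below, assuming Python 3.6+ for the context-manager form of `os.scandir()`, contrasts the old `listdir()` + `isdir()` pattern (one extra stat call per entry) with a `scandir()` loop that can usually answer `is_dir()` from the directory scan itself. Both helper names are illustrative, not stdlib functions.

```python
import os
import tempfile


def subdirs_listdir(path):
    """listdir() + isdir(): one extra stat() system call per entry."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def subdirs_scandir(path):
    """scandir(): DirEntry.is_dir() can usually answer from the scan itself
    (d_type on most POSIX filesystems, FindNextFile data on Windows)."""
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it if entry.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        open(os.path.join(root, "file.txt"), "w").close()
        print(subdirs_listdir(root), subdirs_scandir(root))  # ['sub'] ['sub']
```

Both functions follow symlinks when deciding what counts as a directory, so they should agree; the difference is only in how many system calls it takes to find out.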
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 28 June 2014 19:17, Nick Coghlan wrote:
> Agreed, but walking even a moderately large tree over the network can
> really hammer home the point that this offers a significant
> performance enhancement as the latency of access increases. I've found
> that kind of comparison can be eye-opening for folks that are used to
> only operating on local disks (even spinning disks, let alone SSDs)
> and/or relatively small trees (distro build trees aren't *that* big,
> but they're big enough for this kind of difference in access overhead
> to start getting annoying).

Oops, forgot to add - I agree this isn't a blocking issue for the PEP; it's definitely only in "nice to have" territory.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 28 June 2014 16:17, Gregory P. Smith wrote:
> On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan wrote:
>> * it would be nice to see some relative performance numbers for NFS and
>> CIFS network shares - the additional network round trips can make excessive
>> stat calls absolutely brutal from a speed perspective when using a network
>> drive (that's why the stat caching added to the import system in 3.3
>> dramatically sped up the case of having network drives on sys.path, and why
>> I thought AJ had a point when he was complaining about the fact we didn't
>> expose the dirent data from os.listdir)
>
> fwiw, I wouldn't wait for benchmark numbers.
>
> A needless stat call when you've got the information from an earlier API
> call is already brutal. It is easy to compute from existing ballparks remote
> file server / cloud access: ~100ms, local spinning disk seek+read: ~10ms.
> fetch of stat info cached in memory on file server on the local network:
> ~500us. You can go down further to local system call overhead which can
> vary wildly but should likely be assumed to be at least 10us.
>
> You don't need a benchmark to tell you that adding needless >= 500us-100ms
> blocking operations to your program is bad. :)

Agreed, but walking even a moderately large tree over the network can really hammer home the point that this offers a significant performance enhancement as the latency of access increases. I've found that kind of comparison can be eye-opening for folks that are used to only operating on local disks (even spinning disks, let alone SSDs) and/or relatively small trees (distro build trees aren't *that* big, but they're big enough for this kind of difference in access overhead to start getting annoying).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
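Gregory's ballpark latencies turn the argument into plain arithmetic. A small sketch, assuming an illustrative tree of 10,000 entries and exactly one needless stat() per entry (the entry count is made up; the latencies are the ones quoted above):

```python
# Ballpark per-call latencies from the message above, in seconds.
LATENCY_S = {
    "remote file server / cloud": 100e-3,
    "local spinning disk seek+read": 10e-3,
    "stat info cached on a LAN file server": 500e-6,
    "local system call (lower bound)": 10e-6,
}


def redundant_stat_cost(n_entries, latency_s):
    """Seconds spent if every entry costs one needless stat() call."""
    return n_entries * latency_s


if __name__ == "__main__":
    N_ENTRIES = 10_000  # illustrative tree size, not a measurement
    for label, latency in LATENCY_S.items():
        print(f"{label:<40} {redundant_stat_cost(N_ENTRIES, latency):9.1f} s")
```

On those numbers the same 10,000 redundant calls cost about 0.1 s of local system call overhead, about 5 s against a LAN file server's in-memory cache, about 100 s on a local spinning disk, and about 1,000 s (nearly 17 minutes) against a ~100 ms remote server.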
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 05:48, Ben Hoyt wrote:
>>> But the underlying system calls -- ``FindFirstFile`` /
>>> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
>>
>> What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide
>> readdir?
>
> I guess it'd be better to say "Windows" and "Unix-based OSs"
> throughout the PEP? Because all of these (including Mac OS X) are
> Unix-based.

*nix and POSIX-based are the two conventions I use.

>> Crazy idea: would it be possible to "convert" a DirEntry object to a
>> pathlib.Path object without losing the cache? I guess that
>> pathlib.Path expects a full stat_result object.
>
> The main problem is that pathlib.Path objects explicitly don't cache
> stat info (and Guido doesn't want them to, for good reason I think).
> There's a thread on python-dev about this earlier. I'll add it to a
> "Rejected ideas" section.

The key problem with caches on pathlib.Path objects is that you could end up with two separate path objects that referred to the same filesystem location but returned different answers about the filesystem state because their caches might be stale.

DirEntry is different, as the content is generally *assumed* to be stale (referring to when the directory was scanned, rather than the current filesystem state). DirEntry.lstat() on POSIX systems will be an exception to that general rule (referring to the time of first lookup, rather than when the directory was scanned, so the answer from lstat() may be inconsistent with other data stored directly on the DirEntry object), but one we can probably live with.

More generally, as part of the pathlib PEP review, we figured out that a *per-object* cache of filesystem state would be an inherently bad idea, but a string-based *process-global* cache might make sense for modules like walkdir (not part of the stdlib - it's an iterator-pipeline-based approach to file tree scanning I wrote a while back, that currently suffers badly from the performance impact of repeated stat calls at different stages of the pipeline). We realised this was getting into a space where application and library specific concerns are likely to start affecting the caching design, though, so the current status of standard library level stat caching is "it's not clear if there's an available approach that would be sufficiently general purpose to be appropriate for inclusion in the standard library".

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
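To make the per-object versus process-global distinction concrete, here is a hypothetical string-keyed lstat() cache. It is a sketch under stated assumptions, not walkdir's API or anything in the standard library: keys are path strings exactly as given (no normalisation, so `a` and `./a` are separate entries), results stay cached until explicitly invalidated, and failed calls are not cached.

```python
import os
import tempfile
import threading


class PathStatCache:
    """Hypothetical process-global lstat() cache keyed by path string.

    Unlike a per-object cache, every caller asking about the same path
    string sees the same (possibly stale) answer until it is invalidated.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def lstat(self, path):
        key = os.fspath(path)  # no normalisation: "a" and "./a" differ
        with self._lock:
            result = self._cache.get(key)
        if result is None:
            result = os.lstat(key)  # may raise OSError; failures aren't cached
            with self._lock:
                self._cache[key] = result
        return result

    def invalidate(self, path=None):
        """Drop one entry, or the whole cache when path is None."""
        with self._lock:
            if path is None:
                self._cache.clear()
            else:
                self._cache.pop(os.fspath(path), None)


STAT_CACHE = PathStatCache()  # one shared instance for the whole process

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "f.txt")
        open(path, "w").close()
        print(STAT_CACHE.lstat(path).st_size)  # 0
        with open(path, "w") as fh:
            fh.write("hello")
        print(STAT_CACHE.lstat(path).st_size)  # still 0: served from the cache
        STAT_CACHE.invalidate(path)
        print(STAT_CACHE.lstat(path).st_size)  # 5
```

Because the cache is shared, two pipeline stages asking about the same path pay for one system call, and they can never disagree with each other; the price is that invalidation becomes the application's job, which is exactly the application-specific concern that kept this out of the stdlib.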
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 05:55, Ben Hoyt wrote:
> Re is_dir etc being properties rather than methods:
>
>>> I find this behaviour a bit misleading: using methods and have them
>>> return cached results. How much (implementation and/or performance
>>> and/or memory) overhead would incur by using property-like access here?
>>> I think this would underline the static nature of the data.
>>>
>>> This would break the semantics with respect to pathlib, but they're only
>>> marginally equal anyways -- and as far as I understand it, pathlib won't
>>> cache, so I think this has a fair point here.
>>
>> Indeed - using properties rather than methods may help emphasise the
>> deliberate *difference* from pathlib in this case (i.e. value when the
>> result was retrieved from the OS, rather than the value right now). The main
>> benefit is that switching from using the DirEntry object to a pathlib Path
>> will require touching all the places where the performance characteristics
>> switch from "memory access" to "system call". This benefit is also the main
>> downside, so I'd actually be OK with either decision on this one.
>
> The problem with this is that properties "look free", they look just
> like attribute access, so you wouldn't normally handle exceptions when
> accessing them. But .lstat() and .is_dir() etc may do an OS call, so
> if you're needing to be careful with error handling, you may want to
> handle errors on them. Hence I think it's best practice to make them
> functions().
>
> Some of us discussed this on python-dev or python-ideas a while back,
> and I think there was general agreement with what I've stated above
> and therefore they should be methods. But I'll dig up the links and
> add to a Rejected ideas section.

Yes, only the stuff that *never* needs a system call (regardless of OS) would be a candidate for handling as a property rather than a method call. Consistency of access would likely trump that idea anyway, but it would still be worth ensuring that the PEP is clear on which values are guaranteed to reflect the state at the time of the directory scanning and which may imply an additional stat call.

>> * it would be nice to see some relative performance numbers for NFS and CIFS
>> network shares - the additional network round trips can make excessive stat
>> calls absolutely brutal from a speed perspective when using a network drive
>> (that's why the stat caching added to the import system in 3.3 dramatically
>> sped up the case of having network drives on sys.path, and why I thought AJ
>> had a point when he was complaining about the fact we didn't expose the
>> dirent data from os.listdir)
>
> Don't know if you saw, but there are actually some benchmarks,
> including one over NFS, on the scandir GitHub page:
>
> https://github.com/benhoyt/scandir#benchmarks

No, I hadn't seen those - may be worth referencing explicitly from the PEP (and if there's already a reference... oops!)

> os.walk() was 23 times faster with scandir() than the current
> listdir() + stat() implementation on the Windows NFS file system I
> tried. Pretty good speedup!

Ah, nice!

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
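Ben's objection is about where the try/except goes: anything that may hit the OS can raise OSError. A minimal sketch against the released API (Python 3.6+), where `DirEntry.stat(follow_symlinks=False)` plays the role of the draft `lstat()`; on POSIX it makes a system call on first use, so a file deleted after the scan surfaces as an error at that point. The helper names are illustrative.

```python
import os
import tempfile


def safe_lstat(entry):
    """entry.stat(follow_symlinks=False), or None if the OS call fails."""
    try:
        return entry.stat(follow_symlinks=False)
    except OSError:  # e.g. deleted or permissions changed since the scan
        return None


def entry_sizes(path):
    """Map entry name -> size in bytes, skipping entries that can't be stat'ed."""
    sizes = {}
    with os.scandir(path) as it:
        for entry in it:
            st = safe_lstat(entry)
            if st is not None:
                sizes[entry.name] = st.st_size
    return sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "data.bin"), "wb") as fh:
            fh.write(b"abc")
        print(entry_sizes(root))  # {'data.bin': 3}
```

With a method call, the possibility of failure is at least visible at the call site; a property hiding the same system call would invite callers to skip the try/except.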
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 20:52, Steven D'Aprano wrote:
> Speaking of caching, is there a way to freshen the cached values?

Switch to a full Path object instead of relying on the cached DirEntry data.

This is what makes me wary of including lstat, even though Windows offers it without the extra stat call. Caching behaviour is *really* hard to make intuitive, especially when it *sometimes* returns data that looks fresh (as it does on first call on POSIX systems).

Regards,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
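Nick's answer to Steven, sketched against the released API (Python 3.6+): the `DirEntry` keeps whatever it cached, while a `pathlib.Path` built from `entry.path` asks the filesystem every time. The helper name is illustrative.

```python
import os
import pathlib
import tempfile


def cached_and_fresh_size(entry):
    """Return (size per the DirEntry's cache, size per the filesystem now)."""
    cached = entry.stat(follow_symlinks=False).st_size  # cached after first use
    fresh = pathlib.Path(entry.path).lstat().st_size    # pathlib asks the OS every time
    return cached, fresh


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "log.txt")
        open(path, "w").close()                # starts empty
        with os.scandir(root) as it:
            entry = next(it)
        entry.stat(follow_symlinks=False)      # first call fills the cache on POSIX
        with open(path, "w") as fh:
            fh.write("hello")
        print(cached_and_fresh_size(entry))    # (0, 5)
```

On Windows the cached value comes from the directory scan itself; on POSIX it comes from the first `stat()` call, which is the "sometimes looks fresh" behaviour described above.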
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 21:45, Paul Moore wrote:
> On 29 June 2014 12:08, Nick Coghlan wrote:
>> This is what makes me wary of including lstat, even though Windows
>> offers it without the extra stat call. Caching behaviour is *really*
>> hard to make intuitive, especially when it *sometimes* returns data
>> that looks fresh (as it on first call on POSIX systems).
>
> If it matters that much we *could* simply call it cached_lstat(). It's
> ugly, but I really don't like the idea of throwing the information
> away - after all, the fact that we currently throw data away is why
> there's even a need for scandir. Let's not make the same mistake
> again...

Future-proofing is the reason DirEntry is a full-fledged class in the first place, though.

Effectively communicating the behavioural difference between DirEntry and pathlib.Path is the main thing that makes me nervous about adhering too closely to the Path API.

To restate the problem and the alternative proposal, these are the DirEntry methods under discussion:

    is_dir(): like os.path.isdir(), but requires no system calls on at least POSIX and Windows
    is_file(): like os.path.isfile(), but requires no system calls on at least POSIX and Windows
    is_symlink(): like os.path.islink(), but requires no system calls on at least POSIX and Windows
    lstat(): like os.lstat(), but requires no system calls on Windows

For the almost-certain-to-be-cached items, the suggestion is to make them properties (or just ordinary attributes):

    is_dir
    is_file
    is_symlink

What to do with lstat() is currently less clear, since POSIX directory scanning doesn't provide that level of detail by default.

The PEP also doesn't currently state whether the is_dir(), is_file() and is_symlink() results would be updated if a call to lstat() produced different answers than the original directory scanning process, which further suggests to me that allowing the stat call to be delayed on POSIX systems is a potentially problematic and inherently confusing design. We would have two options:

- update them, meaning calling lstat() may change those results from being a snapshot of the setting at the time the directory was scanned
- leave them alone, meaning the DirEntry object and the DirEntry.lstat() result may give different answers

Those both sound ugly to me.

So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes.

That would make the DirEntry attributes:

    is_dir: boolean, always populated
    is_file: boolean, always populated
    is_symlink: boolean, always populated
    lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False

(I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name.)

What this would allow:

- by default, scanning is efficient everywhere, but lstat_result may be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat_result attribute is None

Most importantly, *regardless of platform*, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object. There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object.

Regards,
Nick.

P.S. We'd be generating quite a few of these, so we can use __slots__ to keep the memory overhead to a minimum (that's just a general comment - it's really irrelevant to the methods-or-attributes question).

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
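Nick's attribute-only proposal can be prototyped on top of today's `os.scandir()` (Python 3.6+). In the sketch below, `SnapshotEntry` and `scandir_snapshot()` are hypothetical names, and the proposal itself was not adopted: the shipped `DirEntry` keeps `is_dir()` and friends as methods and has no `lstat_result`. The sketch follows Ben's later tweak of leaving `lstat_result` as None on every platform unless requested, uses `__slots__` as the P.S. suggests, and treats the lstat() result as authoritative when present, so a symlink to a directory reports `is_symlink` rather than `is_dir`.

```python
import os
import stat
import tempfile


class SnapshotEntry:
    """Hypothetical attribute-only directory entry (not the shipped API)."""

    __slots__ = ("name", "path", "is_dir", "is_file", "is_symlink", "lstat_result")

    def __init__(self, entry, lstat_result=None):
        self.name = entry.name
        self.path = entry.path
        self.lstat_result = lstat_result
        if lstat_result is not None:
            # The lstat() result is authoritative for the whole object.
            mode = lstat_result.st_mode
            self.is_dir = stat.S_ISDIR(mode)
            self.is_file = stat.S_ISREG(mode)
            self.is_symlink = stat.S_ISLNK(mode)
        else:
            # Otherwise use what the directory scan reported (lstat semantics).
            self.is_dir = entry.is_dir(follow_symlinks=False)
            self.is_file = entry.is_file(follow_symlinks=False)
            self.is_symlink = entry.is_symlink()


def scandir_snapshot(path=".", ensure_lstat=False):
    """Yield SnapshotEntry objects; lstat_result is None unless ensure_lstat."""
    with os.scandir(path) as it:
        for entry in it:
            # Free on Windows; one lstat() system call per entry on POSIX.
            st = entry.stat(follow_symlinks=False) if ensure_lstat else None
            yield SnapshotEntry(entry, st)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for e in scandir_snapshot(root, ensure_lstat=True):
            print(e.name, e.is_dir, e.lstat_result is not None)  # sub True True
```

Every value on a `SnapshotEntry` is fixed when it is constructed, which is the property the proposal is after: nothing on the object can silently change or trigger a system call later.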
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 30 Jun 2014 19:13, "Glenn Linderman" wrote:
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it.

I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

Cheers,
Nick.
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 Jul 2014 07:31, "Victor Stinner" wrote:
>
> 2014-07-01 15:00 GMT+02:00 Ben Hoyt :
> > 2) Nick Coghlan's proposal on the previous thread
> > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> > suggesting an ensure_lstat keyword param to scandir if you need the
> > lstat_result value
>
> I don't like this idea because it makes error handling more complex.
> The syntax to catch exceptions on an iterator is verbose (while: try:
> next() except ...).

Actually, we may need to copy the os.walk API and accept an "onerror" callback as a scandir argument. Regardless of whether or not we have "ensure_lstat", the iteration step could fail, so I don't believe we can just transfer the existing approach of catching exceptions from the listdir call.

> Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
> to surround it with try/except.
>
> > .lstat_result being None sometimes (on POSIX),
>
> Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk-style onerror callback, I'm still in favour of a "get_lstat" flag (tweaked as Ben suggests to always be None unless requested, so Windows code is less likely to be inadvertently non-portable).

> > would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail?
>
> Yes, it can happen. The filesystem is system-wide and shared by all
> users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration.

Cheers,
Nick.
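An os.walk()-style onerror hook can be sketched as a hypothetical wrapper around the released `os.scandir()` (Python 3.6+). The name `scandir_with_onerror` is made up, and `os.scandir()` itself takes no such parameter. The wrapper also shows the `while True: try: next(...)` shape Victor calls verbose, hidden in one place so callers don't have to write it. The sketch assumes the scan cannot usefully continue after an iteration error and stops there.

```python
import os


def scandir_with_onerror(path=".", onerror=None):
    """Hypothetical wrapper: yield DirEntry objects, routing OSError to onerror.

    Mirrors os.walk()'s convention: the callback receives the OSError
    instance; without a callback, errors are silently ignored.
    """
    try:
        it = os.scandir(path)
    except OSError as exc:  # e.g. the directory doesn't exist
        if onerror is not None:
            onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)  # readdir()/FindNextFile can fail mid-scan
            except StopIteration:
                break
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                break  # assume the scan cannot continue after an error
            yield entry


if __name__ == "__main__":
    for entry in scandir_with_onerror("no/such/dir", onerror=print):
        print(entry.name)
```

Per-entry stat failures are a separate concern: they happen outside the iterator, when the caller asks an entry for its stat data, and can still be handled with an ordinary try/except at that call.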
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 08:42, Ben Hoyt wrote:
>> We need per-iteration error handling for the readdir call anyway, so I think
>> an onerror callback is a better option than dropping the ability to easily
>> obtain full stat information as part of the iteration.
>
> I don't mind the idea of an "onerror" callback, but it's adding
> complexity. Putting aside the question of caching/timing for a second
> and assuming .lstat() as per the current PEP 471, do we really need
> per-iteration error handling for readdir()? When would that actually
> fail in practice?

An NFS mount dropping the connection or a USB key being removed are the first that come to mind, but I expect there are others. I find it's generally better to just assume that any system call may fail for obscure reasons and put the infrastructure in place to deal with it, rather than getting ugly, hard-to-track-down bugs later.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Network Security Backport Status
On 1 Jul 2014 11:28, "Alex Gaynor" wrote:
>
> I've come up with a new approach, which I believe is most likely to be
> successful, but I'll need help to implement it.
>
> The idea is to find the most recent commit which is a parent of both the
> ``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
> related file on the ``default`` branch, and attempt to replay it on the ``2.7``
> branch. Require manual review on each commit to make sure it compiles, and to
> ensure it doesn't make any backwards incompatible changes.
>
> I think this provides the most iterative and guided approach to getting this
> done.

Sounds promising, although it may still have some challenges if the SSL code depends on earlier changes to other code.

> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow question:

Given two named branches in http://hg.python.org (default and 2.7) and 4 files (Python module, C module, tests, docs):
- find the common ancestor
- find all the commits affecting those files on default & graft them to 2.7 (with a chance to test and edit each one first)

It's just a better environment for asking & answering that kind of question :)

Cheers,
Nick.

> Cheers,
> Alex
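For the Mercurial question, a starting point might look like the sketch below. It only builds command lines and runs nothing. It assumes the CPython file layout for the four ssl-related files, Mercurial revset support for `sort()`, `only()`, `merge()` and `file()` with `path:` patterns, and one `hg graft --edit` per revision so the tests can be run between grafts. Treat it as something to check against `hg help revsets` and `hg help graft`, not a tested recipe.

```python
import shlex  # shlex.join needs Python 3.8+

# Assumed CPython layout for the four ssl-related files
# (Python module, C module, tests, docs).
SSL_FILES = (
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
)


def ssl_commits_revset(files=SSL_FILES, source="default", target="2.7"):
    """Revset for non-merge commits reachable from `source` but not from
    `target` (i.e. made since the branches diverged) that touch `files`,
    oldest first."""
    touched = " or ".join(f"file('path:{name}')" for name in files)
    return f"sort(only('{source}', '{target}') and not merge() and ({touched}), rev)"


def list_command(revset):
    """hg log invocation printing one short hash per matching commit."""
    return ["hg", "log", "-r", revset, "--template", r"{node|short}\n"]


def graft_commands(revs, target="2.7"):
    """Switch to `target`, then graft each revision individually.

    --edit opens each commit message in an editor; running the grafts one
    at a time leaves room to build and test in between.
    """
    return [["hg", "update", target]] + [
        ["hg", "graft", "--edit", "-r", rev] for rev in revs
    ]


if __name__ == "__main__":
    print(shlex.join(list_command(ssl_commits_revset())))
```

The `only()` form stands in for "find the common ancestor, then everything after it": it selects ancestors of `default` that are not ancestors of `2.7`, which is the set of commits made since the two branches diverged.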
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 14:20, Paul Moore wrote: > On 1 July 2014 14:00, Ben Hoyt wrote: >> 2) Nick Coghlan's proposal on the previous thread >> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) >> suggesting an ensure_lstat keyword param to scandir if you need the >> lstat_result value >> >> I would make one small tweak to Nick Coghlan's proposal to make >> writing cross-platform code easier. Instead of .lstat_result being >> None sometimes (on POSIX), have it None always unless you specify >> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make >> this more obvious.) Per (b) above, this means Windows developers >> wouldn't accidentally write code which failed on POSIX systems -- it'd >> fail fast on Windows too if you accessed .lstat_result without >> specifying get_lstat=True. > > This is getting very complicated (at least to me, as a Windows user, > where the basic idea seems straightforward). > > It seems to me that the right model is the standard "thin wrapper > round the OS feature" that acts as a building block - it's typical of > the rest of the os module. I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). > Typically, such thin wrappers expose POSIX semantics by default, and > Windows behaviour follows as closely as possible (see for example > stat, where st_ino makes no sense on Windows, but is present). In this > case, we're exposing Windows semantics, and POSIX is the one needing > to fit the model, but the principle is the same. > > On that basis, optional attributes (as used in stat results) seem > entirely sensible. 
> > The documentation for DirEntry could easily be written to parallel > that of a stat result: > > """ > The return value is an object whose attributes correspond to the data > the OS returns about a directory entry: > > * name - the object's name > * full_name - the object's full name (including path) > * is_dir - whether the object is a directory > * is_file - whether the object is a plain file > * is_symlink - whether the object is a symbolic link > > On Windows, the following attributes are also available > > * st_size - the size, in bytes, of the object (only meaningful for files) > * st_atime - time of last access > * st_mtime - time of last write > * st_ctime - time of creation > * st_file_attributes - Windows file attribute bits (see the > FILE_ATTRIBUTE_* constants in the stat module) > """ > > That's no harder to understand (or to work with) than the equivalent > stat result. The only difference is that the unavailable attributes > can be queried on POSIX, there's just a separate system call involved > (with implications in terms of performance, error handling and > potential race conditions). > > The version of scandir with the ensure_lstat argument is easy to write > based on one with optional attributes (I'm playing fast and loose with > adding attributes to DirEntry values here, just for the sake of an > example - the details are left as an exercise) > > def scandir_ensure(path='.', ensure_lstat=False): > for entry in os.scandir(path): > if ensure_lstat and not hasattr(entry, 'st_size'): > stat_data = os.lstat(entry.full_name) > entry.st_size = stat_data.st_size > entry.st_atime = stat_data.st_atime > entry.st_mtime = stat_data.st_mtime > entry.st_ctime = stat_data.st_ctime > # Ignore file_attributes, as we'll never get here on Windows > yield entry > > Variations on how you handle errors in the lstat call, etc, can be > added to taste. > > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature.
Enhancements can be added later, when > real-world usage has proved their value. +1 from me - especially if this recipe goes in at least the PEP, and potentially even the docs. I'm also OK with postponing onerror support for the time being - that should be straightforward to add later if we decide we need it. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
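For comparison, the same "ensure lstat" idea might look like this against the os.scandir() that eventually shipped (Python 3.5+). It is a sketch under stated assumptions. The released DirEntry type does not accept new attributes, so instead of patching entries this yields (entry, stat_result) pairs. The `with` form needs Python 3.6+. The function name is hypothetical.

```python
import os


def scandir_with_lstat(path="."):
    """Yield (entry, lstat_result) pairs for each entry in `path`.

    entry.stat(follow_symlinks=False) has lstat() semantics and is cached
    on the entry after the first call.
    """
    with os.scandir(path) as it:
        for entry in it:
            yield entry, entry.stat(follow_symlinks=False)
```

According to the os documentation, stat(follow_symlinks=False) needs no extra system call on Windows. On POSIX it costs one call per entry the first time it is used. Error handling around that call can be added to taste, as Paul says.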
Re: [Python-Dev] buildbot.python.org down again?
On 7 Jul 2014 10:47, "Guido van Rossum" wrote: > > It would still be nice to know who "the appropriate persons" are. Too much of our infrastructure seems to be maintained by house elves or the ITA. I volunteered to be the board's liaison to the infrastructure team, and getting more visibility around what the infrastructure *is* and how it's monitored and supported is going to be part of that. That will serve a couple of key purposes: - making the points of escalation clearer if anything breaks or needs improvement (although "infrastruct...@python.org" is a good default choice) - making the current "todo" list of the infrastructure team more visible (both to calibrate resolution time expectations and to provide potential contributors an idea of what's involved) Noah has already set up http://status.python.org/ to track service status, I can see about getting buildbot.python.org added to the list. Cheers, Nick. > > > On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy wrote: >> >> On 7/6/2014 7:54 PM, Ned Deily wrote: >>> >>> As of the moment, buildbot.python.org seems to be down again. >> >> >> Several hours later, back up. >> >> >> > Where is the best place to report problems like this? >> >> We should have, if not already, an automatic system to detect down servers and report (email) to appropriate persons. 
>> >> -- >> Terry Jan Reedy > > > > > -- > --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] == on object tests identity in 3.x - summary
On 7 Jul 2014 19:22, "Andreas Maier" wrote: > > Thanks to all who responded. > > In the absence of class-specific equality test methods, the default implementations revert to using the identity (=address) of the object as a basis for the test, in both Python 2 and Python 3. > > In the absence of specific ordering test methods, the default implementations revert to using the identity (=address) of the object as a basis for the test, in Python 2. In Python 3, an exception is raised in that case. In Python 2, it orders by type, and only then by id (which happens to be the address in CPython). > > The bottom line of the discussion seems to be that this behavior is intentional, and a lot of code depends on it. > > We still need to figure out how to document this. Options could be: > > 1. We define that the default for the value of an object is its identity. That allows us to describe the behavior of the equality test without special casing such objects, but it does not work for ordering. Also, I have difficulties stating what constitutes that default case, because it can really only be explained by referring to the presence or absence of the class-specific equality test and ordering test methods. > > 2. We don't say anything about the default value of an object, and describe the behavior of the equality test and ordering test, which both need to cover the case that the object does not have the respective test methods. The behaviour of Python 3's type system is fully covered by equality defaulting to comparing by identity, and ordering comparisons having to be defined explicitly. The docs at https://docs.python.org/3/reference/expressions.html#not-in could likely be clarified, but they do cover this (they just cover a lot about the builtins at the same time). > It seems to me that only option 2 really works. Indeed, and that's the version already documented. Regards, Nick. > > > Comments and further options welcome.
> > Andy
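A class that defines no comparison methods shows the Python 3 behaviour Nick summarises: equality defaults to identity, and ordering must be defined explicitly. `Plain` is a hypothetical name.

```python
class Plain:
    """A class that defines no comparison methods of its own."""


a, b = Plain(), Plain()

# Equality falls back to identity: an object is equal only to itself.
eq_self = (a == a)
eq_other = (a == b)
ne_other = (a != b)

# Ordering has no default in Python 3, so this raises TypeError.
try:
    a < b
    ordering_error = None
except TypeError as exc:
    ordering_error = exc
```

Python 2 would instead have ordered `a` and `b` by type and then by id, as noted in the thread.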
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 Jul 2014 17:14, "Ethan Furman" wrote: > > On 07/09/2014 02:42 PM, Ben Hoyt wrote: >>> >>> >>> Okay, so using that [no platform specific] logic we should head over to the os module and remove: >>> >>> >>> ctermid, getenv, getegid... >>> >>> Okay, I'm tired of typing, but that list is not even half-way through the os >>> page, and those are all methods or attributes that are not available on >>> either Windows or Unix or some flavors of Unix. >> >> >> True, is this really the precedent we want to *aim for*. listdir() is >> cross-platform, > > > and listdir has serious performance issues, which is why you developed scandir. > >>> Oh, and all those [snipped] upper-case attributes? Yup, documented. And when we >>> don't document it ourselves we often refer readers to their system >>> documentation because Python does not, in fact, return exactly the same >>> results on all platforms -- particularly when calling into the OS. >> >> >> But again, why a worse, less cross-platform API when a simple, >> cross-platform one is a method call away? > > > For the same reason we don't use code that makes threaded behavior better, but kills the single thread application. > > If the programmer would rather have consistency on all platforms rather than performance on the one being used, `info='lstat'` is the option to use. > > I like the 'onerror' API better primarily because it gives a single point to deal with the errors. This has at least a couple of advantages: > > - less duplication of code: in the tree_size example, the error > handling is duplicated twice > > - readability: with the error handling in a separate routine, one > does not have to jump around the try/except blocks looking for > what happens if there are no errors The "onerror" approach can also deal with readdir failing, which the PEP currently glosses over. I'm somewhat inclined towards the current approach in the PEP, but I'd like to see an explanation of two aspects: 1.
How a scandir variant with an 'onerror' option could be implemented given the version in the PEP 2. How the existing scandir module handles the 'onerror' parameter to its directory walking function Regards, Nick. > > -- > ~Ethan~
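One answer to Nick's first question is that an onerror-style variant can be layered on top of a plain scandir(). The sketch below uses the os.scandir() API that was eventually released (Python 3.6+ for the context-manager form). The `onerror` callback is an assumption modelled on os.walk()'s parameter of the same name, and the function name is hypothetical. Like os.walk(), it ignores errors when no callback is given. It stops listing a directory after the first error.

```python
import os


def scandir_onerror(path=".", onerror=None):
    """Yield os.DirEntry objects for `path`, passing any OSError raised while
    opening or reading the directory to `onerror` instead of raising it."""
    try:
        it = os.scandir(path)              # opening the directory may fail
    except OSError as exc:
        if onerror is not None:
            onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)           # reading the next entry may fail
            except StopIteration:
                return
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                return
            yield entry
```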
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 10 Jul 2014 03:39, "Victor Stinner" wrote: > > 2014-07-10 9:04 GMT+02:00 Paul Moore : > > As someone (Tim?) pointed out later in the thread, > > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor > > do the dirent entries on Unix). So whether or not it's "natural", the > > "free" functionality provided by the OS is that of lstat, not that of > > stat. Presumably because it's possible to build symlink-following code > > on top of non-following code, but not the other way around. > > DirEntry methods will remain free (no syscall) for directories and > regular files. One extra syscall will be needed only for symlinks, > which are rarer than other file types (for example, you wrote " > Windows typically makes little use of symlinks"). The info we want for scandir is that of the *link itself*. That makes it easy to implement things like the "followlinks" flag of os.walk. The *far end* of the link isn't relevant at this level. The docs just need to be clear that DirEntry objects always match lstat(), never stat(). Cheers, Nick. > > See my pseudo-code: > https://mail.python.org/pipermail/python-dev/2014-July/135439.html > > On Windows, _lstat and _stat attributes will be filled directly in the > constructor for regular files and directories. > > Victor
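Because DirEntry data matches lstat(), a followlinks-style flag is easy to layer on top. Here is a minimal sketch using the released os.scandir() API (Python 3.6+). `child_dirs` is a hypothetical helper, not part of the os module.

```python
import os


def child_dirs(path, followlinks=False):
    """Sorted names of subdirectories of `path`.  Symlinks to directories
    are included only when followlinks is True, as with os.walk()."""
    result = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                result.append(entry.name)       # a real directory
            elif followlinks and entry.is_symlink() and entry.is_dir():
                result.append(entry.name)       # a link whose target is a directory
    return sorted(result)
```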
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 11 Jul 2014 12:46, "Ben Hoyt" wrote: > > [replying to python-dev this time] > > >> The "onerror" approach can also deal with readdir failing, which the > >> PEP currently glosses over. > > > > > > Do we want this, though? I can see an error handler for individual entries, > > but if one of the *dir commands fails that would seem to be fairly > > catastrophic. > > Very much agreed that this isn't necessary for just readdir/FindNext > errors. We've never had this level of detail before -- if listdir() > fails half way through (very unlikely) it just bombs with OSError and > you get no entries at all. > > If you really really want this (again very unlikely), you can always > call next() directly and catch OSError around that call. Agreed - I think the PEP should point this out explicitly, and show that the approach it takes offers a lot of flexibility in error handling from "just let it fail", to a single try/except around the whole loop, to try/except just around the operations that might call lstat(), to try/except around the individual iteration steps. os.walk remains the higher-level API that most code should be using, and that has to retain the current listdir-based behaviour (any error = ignore all entries in that directory) for backwards compatibility reasons. Cheers, Nick. > > -Ben
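This hypothetical directory sizer illustrates the "try/except just around the operations that might call lstat()" option. It uses the released os.scandir() API (Python 3.6+). A failure to open or read the directory itself still propagates as OSError, while per-entry failures are skipped.

```python
import os


def total_size(path):
    """Sum of the sizes of regular files directly inside `path`."""
    total = 0
    with os.scandir(path) as it:            # listing errors propagate
        for entry in it:
            try:
                if entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError:
                # e.g. the file was removed between listing and stat()
                continue
    return total
```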
Re: [Python-Dev] PEP 3121, 384 Refactoring Issues
On 10 Jul 2014 19:59, "Alexander Belopolsky" wrote: > > > On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote: >> >> I'm just curious as to why there are 54 open issues after both of these PEPs have been accepted and 384 is listed as finished. Did we hit some unforeseen technical problem which stalled development? > > > I tried to bring some sanity to that effort by opening a "meta issue": > > http://bugs.python.org/issue15787 > > My enthusiasm, however, vanished after I reviewed the refactoring for the datetime module: > > http://bugs.python.org/issue15390 > > My main objections are to following PEP 384 (Stable ABI) within stdlib modules. I see little benefit for the stdlib (which is shipped fresh with every new version of Python) from following those guidelines. The main downside of "do as we say, not as we do" in this case is that we miss out on the feedback loop of what the stable ABI is like to *use*. For example, the docs problem, where it's hard to tell whether an API is part of the stable ABI or not, or the performance problem Stefan mentions. Using the stable ABI for standard library extensions also serves to decouple them further from the internal details of the CPython runtime, making it more likely they will be able to run correctly on alternative interpreters (since emulating or otherwise supporting the limited API is easier than supporting the whole thing). Cheers, Nick.
Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
On 13 July 2014 11:34, Chris Angelico wrote: > On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano wrote: >>> We will see >>> later that that happens. Further, when comparing float NaNs of the same >>> identity, the list implementation forgot to special-case NaNs. Which >>> would be a bug, IMHO. >> >> "Forgot"? I don't think the behaviour of list comparisons is an >> accident. > > Well, "forgot" is on the basis that the identity check is intended to > be a mere optimization. If that were the case ("don't actually call > __eq__ when you reckon it'll return True"), then yes, failing to > special-case NaN would be a bug. But since it's intended behaviour, as > explained further down, it's not a bug and not the result of > forgetfulness. Right, it's not a mere optimisation - it's the only way to get containers to behave sensibly. Otherwise we'd end up with nonsense like: >>> x = float("nan") >>> x in [x] False That currently returns True because of the identity check - it would return False if we delegated the check to float.__eq__ because the defined IEEE 754 behaviour for NaNs breaks the mathematical definition of an equivalence relation as a reflexive, symmetric and transitive operation. (It breaks it for *good reasons*, but we still need to figure out a way of dealing with the impedance mismatch between the definition of floats and the definition of container invariants like "assert x in [x]") The current approach means that the lack of reflexivity of NaNs stays confined to floats and similar types - it doesn't leak out and infect the behaviour of the container types. What we've never figured out is a good place to *document* it. I thought there was an open bug for that, but I can't find it right now. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
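The interaction Nick describes is easy to observe in CPython. The identity check happens before __eq__ is consulted. As a result, a NaN object is found in a container holding that same object, but a different NaN object is not.

```python
x = float("nan")
y = float("nan")          # a separate NaN object

print(x == x)             # False: IEEE 754 NaN never compares equal
print(x in [x])           # True: identity is checked before __eq__
print([x] == [x])         # True: element comparison also checks identity first
print(y in [x])           # False: y is not x, so __eq__ decides
```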
Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
On 13 July 2014 13:16, Chris Angelico wrote: > On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan wrote: >> What we've never figured out is a good place to *document* it. I >> thought there was an open bug for that, but I can't find it right now. > > Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found > a parallel explanation of sequence equality. We might need to expand the tables of sequence operations to cover equality and inequality checks - those are currently missing. Cheers, Nick. > > ChrisA -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
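For the missing sequence-equality entry, a simplified pure-Python model of list `==` may help. Lengths are compared first, then elements pairwise. Each pair counts as equal if it is the same object or compares `==`. This is a documentation model based on that assumption, not CPython's actual C implementation.

```python
def seq_equal(a, b):
    """Simplified model of list equality: identity-or-== per element."""
    if len(a) != len(b):
        return False
    return all(x is y or x == y for x, y in zip(a, b))
```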
Re: [Python-Dev] Another case for frozendict
On 13 July 2014 13:43, wrote: > In its previous form, the PEP seemed more focused on some false > optimization capabilities of a read-only type, rather than as here, the > far more interesting hashability properties. It might warrant a fresh > PEP to more thoroughly investigate this angle. Right, the use case would be "frozendict as a simple alternative to a full class definition", but even less structured than namedtuple in that the keys may vary as well. That difference means that frozendict applies more cleanly to semi-structured data manipulated as dictionaries (think stuff deserialised from JSON) than namedtuple does. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
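There is no frozendict in the standard library. The following is a minimal, hypothetical sketch of the hashable, read-only mapping under discussion, built on collections.abc.Mapping. It assumes all values are hashable, so data deserialised from JSON would need its lists converted to tuples first.

```python
from collections.abc import Mapping


class frozendict(Mapping):
    """Hypothetical read-only, hashable mapping (not a stdlib type)."""

    __slots__ = ("_data", "_hash")

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent hash; computed once, then cached.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return "frozendict(%r)" % (self._data,)
```

Mapping supplies `__eq__`, `keys()`, `items()` and the other read-only methods. Because no `__setitem__` is defined, assignment raises TypeError.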
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
On 13 Jul 2014 20:54, "Tim Delaney" wrote: > > On 14 July 2014 10:33, Ben Hoyt wrote: >> >> >> >> If we go with Victor's link-following .is_dir() and .is_file(), then >> we probably need to add his suggestion of a follow_symlinks=False >> parameter (defaults to True). Either that or you have to say >> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit >> less nice. > > > Absolutely agreed that follow_symlinks is the way to go, disagree on the default value. > >> >> Given the above arguments for symlink-following is_dir()/is_file() >> methods (have I missed any, Victor?), what do others think? > > > I would say whichever way you go, someone will assume the opposite. IMO not following symlinks by default is safer. If you follow symlinks by default then everyone has the following issues: > > 1. Crossing filesystems (including onto network filesystems); > > 2. Recursive directory structures (symlink to a parent directory); > > 3. Symlinks to non-existent files/directories; > > 4. Symlink to an absolutely huge directory somewhere else (very annoying if you just wanted to do a directory sizer ...). > > If follow_symlinks=False by default, only those who opt-in have to deal with the above. Or the ever popular symlink to "." (or a directory higher in the tree). I think os.walk() is a good source of inspiration here: call the flag "followlinks" and default it to False. Cheers, Nick. > > Tim Delaney
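Tim's directory-sizer scenario shows why the safe default matters. This is a minimal sketch (the function name is hypothetical) built on os.walk(), whose followlinks parameter already defaults to False. A symlink pointing back at "." is therefore listed but never descended into.

```python
import os


def tree_size(top, followlinks=False):
    """Total size in bytes of regular files under `top`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top, followlinks=followlinks):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):        # skip links to files
                total += os.path.getsize(full)
    return total
```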
Re: [Python-Dev] PEP 3121, 384 Refactoring Issues
On 14 Jul 2014 11:41, "Brett Cannon" wrote: > > > I agree for PEP 3121 which is the initialization/finalization work. The stable ABI is not necessary. So maybe we should re-examine the patches and accept the bits that clean up init/finalization and leave out any ABI-related changes. Martin's right about improving the subinterpreter support - every type declaration we move from a static struct to the dynamic type creation API is one that isn't shared between subinterpreters any more. That argument is potentially valid even for *builtin* modules and types, not just those in extension modules. Cheers, Nick.
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
On 14 Jul 2014 22:50, "Ben Hoyt" wrote: > > In light of that, I propose I update the PEP to basically follow > Victor's model of is_X() and stat() following symlinks by default, and > allowing you to specify follow_symlinks=False if you want something > other than that. > > Victor had one other question: > > > What happens to name and full_name with followlinks=True? > > Do they contain the name in the directory (name of the symlink) > > or name of the linked file? > > I would say they should contain the name and full path of the entry -- > the symlink, NOT the linked file. They kind of have to, right, > otherwise they'd have to be method calls that potentially call the > system. It would be worth explicitly pointing out "os.readlink(entry.full_name)" in the docs as the way to get the target of a symlink entry. Alternatively, it may be worth including a readlink() method directly on the entry objects. (That can easily be added later though, so no need for it in the initial proposal). > > In any case, here's the modified proposal: > > scandir(path='.') -> generator of DirEntry objects, which have: > > * name: name as per listdir() > * full_name: full path name (not necessarily absolute), equivalent of > os.path.join(path, entry.name) > * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name), > but free in most cases; cached per entry > * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name), > but free in most cases; cached per entry > * is_symlink(): like os.path.islink(), but free in most cases; cached per entry > * stat(follow_symlinks=True): like os.stat(entry.full_name, > follow_symlinks=follow_symlinks); cached per entry > > The above may not be quite perfect, but it's good, and I think there's > been enough bike-shedding on the API. :-) +1, sounds good to me (and I like having the caching guarantees listed - helps make it clear how DirEntry differs from pathlib.Path) Cheers, Nick. > > So please speak now or forever hold your peace. 
:-) I intend to update > the PEP to reflect this and make a few other clarifications in the > next few days. > > -Ben
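In the API that eventually shipped, full_name became `path`, and the methods kept their follow_symlinks parameters. The sketch below uses them, including Nick's suggestion of os.readlink() for a link's target. `describe` is a hypothetical helper, and the `with` form needs Python 3.6+.

```python
import os


def describe(path="."):
    """Map each entry name to a short description of what it is."""
    info = {}
    with os.scandir(path) as it:
        for entry in it:
            # Check for links first: is_dir()/is_file() follow them by default.
            if entry.is_symlink():
                info[entry.name] = "link -> " + os.readlink(entry.path)
            elif entry.is_dir():
                info[entry.name] = "dir"
            elif entry.is_file():
                info[entry.name] = "file (%d bytes)" % entry.stat().st_size
            else:
                info[entry.name] = "other"
    return info
```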
Re: [Python-Dev] cStringIO vs io.BytesIO
On 16 Jul 2014 20:00, wrote: > On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote: > > I believe this problem affects tornado ( https://github.com/tornadoweb/tornado/ > > Do you know if there is a workaround? Maybe there is some stdlib part that I'm > > missing, or a module on PyPI? It is not that hard to write an own wrapper that > > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an > > existing solution or plans to fix it in Python itself - this BytesIO use case > > looks quite important. > > Regarding a fix, the problem seems mostly that the StringI/StringO > specializations were removed, and the new implementation is basically > just a StringO. Right, I don't think there's a major philosophy change here, just a missing optimisation that could be restored in 3.5. Cheers, Nick.
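The "own wrapper that won't do copies" that Mikhail mentions can be quite small. Here is a hypothetical sketch: a read-only, file-like reader over an existing bytes object. It slices a memoryview, so constructing it copies nothing, and only the bytes each read() returns are copied. It implements just read() and tell(), not the full io interface.

```python
class BytesReader:
    """Hypothetical zero-copy reader over an existing bytes-like object."""

    def __init__(self, data):
        self._view = memoryview(data)   # no copy of the underlying buffer
        self._pos = 0

    def read(self, size=-1):
        if size is None or size < 0:
            end = len(self._view)
        else:
            end = min(self._pos + size, len(self._view))
        chunk = self._view[self._pos:end].tobytes()   # copies only this chunk
        self._pos = end
        return chunk

    def tell(self):
        return self._pos
```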
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
On 22 Jul 2014 02:46, "Steve Dower" wrote: > > Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) ) +1 for "_DirEntry" as the name in the implementation, and documenting its behaviour under "scandir" rather than as a standalone object. Only -0 for full documentation as a standalone class, though. Cheers, Nick. > > Cheers, > Steve
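Steve's "string subclass with one-shot properties" idea could be sketched as follows. The class name and attributes are hypothetical. The object works anywhere a path string does, and os.lstat() runs at most once, on first access.

```python
import os


class LazyEntry(str):
    """Hypothetical directory-entry type: the string value is the entry's
    path; lstat data is fetched on first access and then cached."""

    def __new__(cls, path, name):
        self = super().__new__(cls, path)
        self.name = name
        self._lstat = None
        return self

    @property
    def lstat(self):
        # One-shot: the system call happens only the first time.
        if self._lstat is None:
            self._lstat = os.lstat(self)
        return self._lstat
```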
Re: [Python-Dev] PEP 471 "scandir" accepted
On 23 Jul 2014 02:18, "Victor Stinner" wrote: > > 2014-07-22 17:52 GMT+02:00 Ben Hoyt : > > However, given that we have to support this for listdir() anyway, I > > think it's worth reconsidering whether scandir()'s directory argument > > can be an integer FD. Given that listdir() already supports it, it > > will almost certainly be asked for later anyway by someone who's > > porting some listdir code that uses an FD. Thoughts, Victor? > > Please focus on what was accepted in the PEP. We should first test > os.scandir(). In a few months, with better feedback, we can consider > extending os.scandir() to support a file descriptor. There are > different issues which should be discussed and decided to implement it > (ex: handle the lifetime of the directory file descriptor). As Victor suggests, getting the core version working and incorporated first is a good way to go. Future enhancements (like accepting a file descriptor) and refactorings (like eliminating the code duplication with listdir) don't need to (and hence shouldn't) go into the initial patch. Cheers, Nick. > > Victor
Re: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my!
On 23 Jul 2014 07:28, "Antoine Pitrou" wrote: > > On 22/07/2014 17:03, Alex Gaynor wrote: > >> >> The question is: >> >> a) Should we backport weak referencing _socket.sockets (changing the structure >> of the module seems overly invasive, albeit completely backwards >> compatible)? >> b) Does anyone know why weak references are used in the first place? The commit >> message just alludes to fixing a leak with no reference to an issue. > > > Because: > - the SSLSocket has a strong reference to the ssl object (self._sslobj) > - self._sslobj having a strong reference to the SSLSocket would mean both would only get destroyed on a GC collection > > I assume that's what "leak" means here :-) > > As for 2.x, I don't see why you couldn't just continue using a strong reference. As Antoine says, if the cycle already exists in Python 2 (and it sounds like it does), we can just skip backporting the weak reference change. I'll also give the Fedora Python list a heads up about your repo to see if anyone there can help you with the backport. Cheers, Nick. > > Regards > > Antoine.
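The cycle Antoine describes, and how a weak back-reference avoids it, can be modelled in pure Python. The class names are hypothetical stand-ins for the socket and the C-level ssl object. With only one strong edge between them, CPython's reference counting frees the socket as soon as the last strong reference goes, without waiting for the cyclic GC.

```python
import weakref


class SSLObjectSketch:
    """Stand-in for the ssl object: holds only a weak reference back to
    its owning socket, so no reference cycle is created."""

    def __init__(self, owner):
        self._owner = weakref.ref(owner)

    @property
    def owner(self):
        return self._owner()        # None once the owner has been freed


class SSLSocketSketch:
    """Stand-in for SSLSocket: holds the only strong edge (self._sslobj)."""

    def __init__(self):
        self._sslobj = SSLObjectSketch(self)
```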
Re: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my!
On 24 Jul 2014 05:37, "Alex Gaynor" wrote: > > Possible solutions are: > > * Pass the SSLObject *in addition* to the _socket.socket object to the C code. > This generates some additional divergence from the Python3 code, but is > probably basically straightforward. > * Try to refactor the socket code in the same way as Python3 did, so we can > pass *only* the SSLObject here. This is some nasty scope creep for PEP466, > but would make the overall _ssl.c diff smaller. > * Some super sweet and simple thing I haven't thought of yet. > > Thoughts? Wearing my "risk management" hat, option 1 sounds significantly more appealing than option 2 :) Cheers, Nick.
Re: [Python-Dev] Does Zip Importer have to be Special?
On 25 Jul 2014 03:51, "Brett Cannon" wrote: > The problem with all of this is you are essentially asking for a hook to let you have code have access to the interpreter state before it is fully initialized. Zipimport and the various bits of code that get loaded during startup are special since they are coded to avoid touching anything that isn't ready to be used. So if we expose something that allows access prior to full initialization it would have to be documented as having no guarantees of interpreter state, etc. so we are not held to some API that makes future improvements difficult. Note that this is *exactly* the problem PEP 432 is designed to handle: separating the configuration of the core interpreter from the configuration of the operating system interfaces, so the latter can run relatively normally (at least compared to today). As you say, though, it's a niche problem compared to something like packaging, which is why it got bumped down my personal priority list. I haven't even got back to the first preparatory step I identified, which is to separate out our main functions to a separate "Programs" directory so it's easier to distinguish "embeds Python" sections of the code from the more typical "is part of Python" and "extends Python" code. > IOW allowing for easy patching of Python is probably the best option I can think of. Yeah, that sounds reasonable - IIRC, Christian ended up going with a similar "make it patch friendly" approach for the hashing changes, rather than going overboard with configuration options. Cheers, Nick.
Re: [Python-Dev] Does Zip Importer have to be Special?
On 25 July 2014 19:33, Phil Thompson wrote: > On 24/07/2014 9:42 pm, Nick Coghlan wrote: >> As you say, though, it's a niche problem compared to something like >> packaging, which is why it got bumped down my personal priority list. I >> haven't even got back to the first preparatory step I identified, which is >> to separate out our main functions to a separate "Programs" directory so >> it's easier to distinguish "embeds Python" sections of the code from the >> more typical "is part of Python" and "extends Python" code. > > > Is there any way for somebody you don't trust :) to be able to help move it > forward? This thread prompted me to finally commit one of the smaller pieces of preparatory refactoring, moving the 3 applications we have that embed the CPython runtime out to a separate directory: http://bugs.python.org/issue18093 (that seems like a trivial change, but I found it made a surprisingly big difference when trying to keep the various moving parts of the initialisation sequence straight in my head) The other preparatory refactoring would be to split the monster pythonrun.c file in two, by creating a separate "lifecycle.c" file. In my original PEP 432 branch I split it into three (pythonrun.c, bootstrap.c, shutdown.c) but that's actually quite an intrusive change - you end up having to expose a lot of otherwise static variables to the linker so the startup and shutdown code can both see them. Splitting in two should achieve most of the same benefits (i.e. separating the lifecycle management of the interpreter itself from the normal runtime operation code) without having to expose so much additional information to the linker (and hence change the names to include the _Py prefix). The origin of those refactorings is the fact that attempting to merge the default branch into my PEP 432 development branch (https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstrap) was generally a pain due to the merge conflicts around the structural changes.
Doing the structural refactorings *first* makes it more feasible to work on the patch and do regular merges in from default. Since these are areas that aren't likely to change in a maintenance release, the risk of merge conflicts when merging forward from 3.4 to default is low even with code moved around on default. By contrast, I regularly hit significant problems when trying to merge from default to the feature branch. The existing feature branch is dated enough now (more than 18 months since the last commit!) that I wouldn't try to use it directly. Instead, I'd recommend starting a new clone based on the GitHub or BitBucket mirror (according to version control system and hosting service preference), and then using the current PEP draft and my old feature branch as a point of reference for starting another implementation attempt. (You may also be able to find some interested collaborators on http://bugs.python.org/issue13533, as I suspect PEP 432 is a prerequisite to resolving their issues as well.) Cheers, Nick. P.S. I'm also starting to think that PEP 432 may pave the way for a locale-independent startup sequence, which would let us offer a "-X utf8" option to tell the interpreter to ignore the OS locale settings entirely when deciding which encodings to use for various things. That would be a possible future enhancement rather than something to pursue in the initial implementation, though. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Contribute to Python.org
On 30 July 2014 01:40, Victor Stinner wrote: > Hi, > > You should read the Python Developer Guide: > > https://docs.python.org/devguide/ > > You can also join the core mentorship mailing list: > > http://pythonmentors.com/ For python.org *itself* (as in, the Django application now powering the site), the contribution process is not yet as clear, but the code and issue tracker are at https://github.com/python/pythondotorg and https://mail.python.org/mailman/listinfo/pydotorg-www is the relevant mailing list. Cheers, Nick.
Re: [Python-Dev] Exposing the Android platform existence to Python modules
On 4 Aug 2014 03:18, "Phil Thompson" wrote: > > On 03/08/2014 4:58 pm, Guido van Rossum wrote: >> >> But *are* we going to support Android officially? What's the point? Do you >> have a plan for getting Python apps to first-class status in the App Store >> (um, Google Play)? > > > I do... > > http://pyqt.sourceforge.net/Docs/pyqtdeploy/introduction.html Nice! I've only been skimming this thread, but +1 for Android mostly reading as Linux, but with an extra method in the platform module that gives more details. For those interested in mobile app development, Russell Keith-Magee also announced the release of "toga" [1] here at PyCon AU. That's a Python-specific GUI library that maps directly to native widgets (rather than using theming as Kivy does). I mention it because one of the things Russell is specifically looking for is more participation from folks who know the Android side of things :) [1] http://pybee.org/toga/ Cheers, Nick.
Re: [Python-Dev] Surely "nullable" is a reasonable name?
On 4 Aug 2014 18:16, "Oleg Broytman" wrote: > > Hi! > > On Mon, Aug 04, 2014 at 05:12:47PM +1000, Larry Hastings <la...@hastings.org> wrote: > > "nullable=True", which means "also accept None > > for this parameter". This was originally intended for use with > > strings (compare the "s" and "z" format units for PyArg_ParseTuple), > > however it looks like we'll have a use for "nullable ints" in the > > ongoing Argument Clinic conversion work. > > > > Several people have said they found the name "nullable" surprising, > > suggesting I use another name like "allow_none" or "noneable". I, > > in turn, find their surprise surprising; "nullable" is a term long > > associated with exactly this concept. It's used in C# and SQL, and > > the term even has its own Wikipedia page: > > > > http://en.wikipedia.org/wiki/Nullable_type > > In my very humble opinion, "nullable" is ok, but "allow_none" is > better. Yup, this is where I stand as well. The main concern I have with nullable is that we *are* writing C code when dealing with Argument Clinic, and "nullable" may make me think of a C NULL rather than Python's None. Cheers, Nick.
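To make the "s" versus "z" distinction concrete, here is a pure-Python analogy of what a nullable (or "allow_none") parameter accepts. It is only a sketch under stated assumptions: `parse_s` and `parse_z` are hypothetical names, and this is not how Argument Clinic or PyArg_ParseTuple are actually implemented.

```python
from typing import Optional


def parse_s(value: object) -> str:
    """Hypothetical analogue of the "s" format unit: only str is accepted."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value


def parse_z(value: object) -> Optional[str]:
    """Hypothetical analogue of "z" (nullable / allow_none): str or None."""
    if value is None:
        return None  # None is passed through instead of being rejected
    return parse_s(value)
```

The only difference between the two is the early `None` check, which is exactly the behaviour the `nullable=True` / `allow_none=True` flag would switch on.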
Re: [Python-Dev] os.walk() is going to be *fast* with scandir
On 10 August 2014 13:20, Antoine Pitrou wrote: > On 09/08/2014 12:43, Ben Hoyt wrote: > >> Just thought I'd share some of my excitement about how fast the all-C >> version [1] of os.scandir() is turning out to be. >> >> Below are the results of my scandir / walk benchmark run with three >> different versions. I'm using an SSD, which seems to make it >> especially faster than listdir / walk. Note that benchmark results can >> vary a lot, depending on operating system, file system, hard drive >> type, and the OS's caching state. >> >> Anyway, os.walk() can be FIFTY times as fast using os.scandir(). > > > Very nice results, thank you :-) Indeed! This may actually motivate me to start working on a redesign of walkdir at some point, with scandir and DirEntry objects as the basis. My original approach was just too slow to be useful in practice (at least when working with trees on the scale of a full Fedora or RHEL build hosted on an NFS share). Cheers, Nick.
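The speedup comes from DirEntry objects: `os.scandir()` returns entries whose `is_dir()` can usually be answered from information the OS already supplied while listing the directory, so most entries need no extra `stat()` call. The following is a simplified sketch of a scandir-based walk, not the real `os.walk()` implementation. Assumptions: Python 3.6+ (for using `os.scandir()` as a context manager), no error handling, and symlinks to directories are listed as files rather than followed.

```python
import os
from typing import Iterator, List, Tuple


def simple_walk(top: str) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Top-down directory walk built on os.scandir() (simplified sketch)."""
    dirs: List[str] = []
    files: List[str] = []
    with os.scandir(top) as entries:
        for entry in entries:
            # follow_symlinks=False: never descend into symlinked directories
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    yield top, dirs, files
    for name in dirs:
        yield from simple_walk(os.path.join(top, name))
```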
Re: [Python-Dev] sum(...) limitation
On 12 Aug 2014 03:03, "Chris Barker - NOAA Federal" wrote: > > My confusion is still this: > > Repeated summation of strings has been optimized in cpython even > though it's not the recommended way to solve that problem. The quadratic behaviour of repeated str summation is a subtle, silent error. It *is* controversial that CPython silently optimises some cases of it away, since it can cause problems when porting affected code to other interpreters that don't use refcounting and thus have a harder time implementing such a trick. It's considered worth the cost, since it dramatically improves the performance of common naive code in a way that doesn't alter the semantics. > So why not special case optimize sum() for strings? We already > special-case strings to raise an exception. > > It seems pretty pedantic to say: we could make this work well, but we'd > rather chide you for not knowing the "proper" way to do it. Yes, that's exactly what this is - a nudge towards the right way to concatenate strings without incurring quadratic behaviour. We *want* people to learn that distinction, not sweep it under the rug. That's the other reason the implicit optimisation is controversial - it hides an important difference in algorithmic complexity from users. > Practicality beats purity? Teaching users the difference between linear time operations and quadratic ones isn't about purity, it's about passing along a fundamental principle of algorithm scalability. We do it specifically for strings because they *do* have an optimised algorithm available that we can point users towards, and concatenating multiple strings is common. Other containers don't tend to be concatenated like that in the first place, so there's no such check pushing other iterables towards itertools.chain. Regards, Nick.
> > -Chris > > > > > > Although that's not the whole story: in > > practice even numerical sums get split into multiple functions because > > floating point addition isn't associative, and so needs careful > > treatment to preserve accuracy. At that point I'm strongly +1 on > > abandoning attempts to "rationalize" summation. > > > > I'm not sure how I'd feel about raising an exception if you try to sum > > any iterable containing misbehaved types like float. But not only > > would that be a Python 4 effort due to backward incompatibility, but > > it sorta contradicts the main argument of proponents ("any type > > implementing __add__ should be sum()-able").
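A short sketch of the behaviours discussed in this exchange, using only standard library calls. CPython's `sum()` rejects a str start value and its error message points at `str.join`; `itertools.chain` is the suggested tool for other iterables; and `math.fsum` is the accuracy-preserving alternative for floats, whose addition isn't associative. The helper name `sum_strings_error` is purely illustrative.

```python
import itertools
import math


def sum_strings_error(items):
    """Return the TypeError message CPython's sum() gives for a str start value."""
    try:
        sum(items, "")
    except TypeError as exc:
        return str(exc)
    return None


words = ["spam", "eggs", "cheese"]
joined = "".join(words)  # linear-time concatenation
flattened = list(itertools.chain([1, 2], [3], [4, 5]))  # lazy concatenation of other iterables

naive_total = sum([0.1] * 10)  # plain left-to-right float additions: not exactly 1.0
accurate_total = math.fsum([0.1] * 10)  # tracks exact partial sums: 1.0
```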
Re: [Python-Dev] Multiline with statement line continuation
On 12 Aug 2014 09:09, "Allen Li" wrote: > > This is a problem I sometimes run into when working with a lot of files > simultaneously, where I need three or more `with` statements: > > with open('foo') as foo: > with open('bar') as bar: > with open('baz') as baz: > pass > > Thankfully, support for multiple items was added in 3.1: > > with open('foo') as foo, open('bar') as bar, open('baz') as baz: > pass > > However, this raises the need for a multiline form, especially when > working with three or more items: > > with open('foo') as foo, \ > open('bar') as bar, \ > open('baz') as baz, \ > open('spam') as spam, \ > open('eggs') as eggs: > pass I generally see this kind of construct as a sign that refactoring is needed. For example, contextlib.ExitStack offers a number of ways to manage multiple context managers dynamically rather than statically. Regards, Nick.
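As a minimal sketch of the dynamic approach, `contextlib.ExitStack` can manage a variable number of open files without one `with` clause per file. The helper `read_all` is a hypothetical name chosen for illustration.

```python
import contextlib


def read_all(paths):
    """Open every path in `paths` and return the contents of each file.

    ExitStack closes every file opened so far on exit, including when a
    later open() call raises partway through the list.
    """
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(path)) for path in paths]
        return [f.read() for f in files]
```

Because the files are registered on the stack one at a time, the number of files no longer has to be known when the code is written, which is what makes this "dynamic rather than static".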
Re: [Python-Dev] sum(...) limitation
On 12 Aug 2014 11:21, "Chris Barker - NOAA Federal" wrote: > > Sorry for the bike shedding here, but: > >> The quadratic behaviour of repeated str summation is a subtle, silent error. > > OK, fair enough. I suppose it would be hard and ugly to catch those instances and raise an exception pointing users to "".join. >> >> It *is* controversial that CPython silently optimises some cases of it away, since it can cause problems when porting affected code to other interpreters that don't use refcounting and thus have a harder time implementing such a trick. > > Is there anything in the language spec that says string concatenation is O(n^2)? Or for that matter any of the performance characteristics of built-in types? Those strike me as implementation details that SHOULD be particular to the implementation. If you implement strings so they have multiple data segments internally (as is the case for StringIO these days), yes, you can avoid quadratic time concatenation behaviour. Doing so makes it harder to meet other complexity expectations (like O(1) access to arbitrary code points), and isn't going to happen in CPython regardless due to C API backwards compatibility constraints. For the explicit loop with repeated concatenation, we can't say "this is slow, don't do it". People do it anyway, so we've opted for the "fine, make it as fast as we can" option as being preferable to an obscure and relatively hard to debug performance problem. For sum(), we have the option of being more direct and just telling people Python's answer to the string concatenation problem (i.e. str.join). That is decidedly *not* the series of operations described in sum's documentation as "Sums start and the items of an iterable from left to right and returns the total." Regards, Nick.
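For code that builds a string incrementally, two common linear-time patterns avoid relying on the CPython-specific `+=` optimisation: collect pieces in a list and call `str.join` once, or write them to an `io.StringIO` buffer. The sketch below shows both; the function names are illustrative only.

```python
import io


def build_with_join(parts):
    """Accumulate pieces in a list, then concatenate once with str.join."""
    chunks = []
    for part in parts:
        chunks.append(part)  # O(1) amortised per append
    return "".join(chunks)  # single linear-time concatenation


def build_with_stringio(parts):
    """Accumulate pieces in an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()
```

The list-and-join form is the one `sum()`'s error message recommends; `StringIO` is handy when pieces are produced by code that already expects a file-like object.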
Re: [Python-Dev] Multiline with statement line continuation
On 12 August 2014 22:15, Steven D'Aprano wrote: > Compare the natural way of writing this: > > with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as > cheese: > # do stuff with spam, eggs, cheese > > versus the dynamic way: > > with ExitStack() as stack: > spam, eggs = [stack.enter_context(open(fname, mode)) for fname, mode in > zip(("spam", "eggs"), ("r", "w"))] > cheese = stack.enter_context(frobulate("cheese")) > # do stuff with spam, eggs, cheese You wouldn't necessarily switch at three. At only three, you have lots of options, including multiple nested with statements: with open("spam") as spam: with open("eggs", "w") as eggs: with frobulate("cheese") as cheese: # do stuff with spam, eggs, cheese The "multiple context managers in one with statement" form is there *solely* to save indentation levels, and overuse can often be a sign that you may have a custom context manager trying to get out: @contextlib.contextmanager def dish(spam_file, egg_file, topping): with open(spam_file) as spam, open(egg_file, 'w') as eggs, frobulate(topping) as cheese: yield spam, eggs, cheese with dish("spam", "eggs", "cheese") as (spam, eggs, cheese): # do stuff with spam, eggs & cheese ExitStack is mostly useful as a tool for writing flexible custom context managers, and for dealing with context managers in cases where lexical scoping doesn't necessarily work, rather than being something you'd regularly use for inline code. "Why do I have so many contexts open at once in this function?" is a question developers should ask themselves in the same way it's worth asking "why do I have so many local variables in this function?" Regards, Nick.
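Combining the two ideas above, here is a sketch of a custom context manager that uses ExitStack internally, so callers get a single `with` statement regardless of how many files are involved. `open_many` is a hypothetical helper; `frobulate` from the thread is not reproduced, since it is a placeholder rather than a real API.

```python
import contextlib


@contextlib.contextmanager
def open_many(*specs):
    """Open several files as one context; each spec is a (path, mode) pair.

    Yields a tuple of file objects. Every file opened so far is closed on
    exit, including when one of the later open() calls fails.
    """
    with contextlib.ExitStack() as stack:
        yield tuple(stack.enter_context(open(path, mode)) for path, mode in specs)
```

A caller would then write `with open_many(("spam", "r"), ("eggs", "w")) as (spam, eggs):`, keeping the inline code flat while ExitStack handles the bookkeeping inside the helper.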
Re: [Python-Dev] Reviving restricted mode?
On 14 August 2014 07:25, Victor Stinner wrote: > Hi, > > I heard that the PyPy sandbox cannot be used out of the box. You have to write a > policy to allow syscalls. The complexity is moved to this policy which is > very hard to write, especially if you only use whitelists. > > Correct me if I'm wrong. To be honest, I never took a look at this sandbox. By default, the PyPy sandbox requires all system access to be proxied through the host application (which is running in a separate process). Similarly, using "sandbox" on Fedora (et al.) will get you a default deny OS level sandbox, where you have to provide selective access to things outside the box. The effective decision taken when rexec and Bastion were removed from the standard library was "sandboxing is hard enough for operating systems to get right, we're not going to try to tackle the even harder problem of an in-process sandbox". "Deny all" sandboxes are relatively easy, but also relatively useless. It's "allow these activities, but no others" that's difficult, since any kind of access can often be leveraged into greater access than was intended. Cheers, Nick.
Re: [Python-Dev] Documenting enum types
On 14 August 2014 19:25, Victor Stinner wrote: > Hi, > > IMO we should not document enum types because Python implementations other > than CPython may want to implement them differently (ex: not all Python > implementations have an enum module currently). By experience, exposing too > many things in the public API becomes a problem later when you want to > modify the code. Implementations claiming conformance with Python 3.4 will have to have an enum module - there just aren't any of those other than CPython at this point (I expect PyPy3 will catch up before too long, since the changes between 3.2 and 3.4 shouldn't be too dramatic from an implementation perspective). In this particular case, though, I think the relevant question is "Why are they enums?" and the answer is "for the better representations". I'm not clear on the use case for exposing and documenting the enum types themselves (although I don't have any real objection either). Regards, Nick.
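The "better representations" point can be shown with a small `enum.IntEnum`. The class below is a hypothetical stand-in (the socket module's address-family constants became IntEnum members in Python 3.4 in a similar way): members still compare equal to plain integers, but their repr names the constant instead of showing a bare number.

```python
import enum


class AddressFamily(enum.IntEnum):
    # Hypothetical stand-in; values chosen for illustration only
    AF_UNIX = 1
    AF_INET = 2


family = AddressFamily(2)  # look up a member by its integer value
```

Here `family == 2` is still true, so existing integer-based code keeps working, while `repr(family)` gives `<AddressFamily.AF_INET: 2>`, which is far more useful in tracebacks and debugging output than `2`.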
[Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
I just posted an updated version of PEP 467 after recently finishing the updates to the Python 3.4+ binary sequence docs to decouple them from the str docs. Key points in the proposal: * deprecate passing integers to bytes() and bytearray() * add bytes.zeros() and bytearray.zeros() as a replacement * add bytes.byte() and bytearray.byte() as counterparts to ord() for binary data * add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes() As far as I am aware, that last item poses the only open question, with the alternative being to add an "iterbytes" builtin with a definition along the lines of the following: def iterbytes(data): try: getiter = type(data).__iterbytes__ except AttributeError: iter = map(bytes.byte, data) else: iter = getiter(data) return iter Regards, Nick. PEP URL: http://www.python.org/dev/peps/pep-0467/ Full PEP text: = PEP: 467 Title: Minor API improvements for bytes and bytearray Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2014-03-30 Python-Version: 3.5 Post-History: 2014-03-30 2014-08-15 Abstract During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series. This PEP proposes a number of small adjustments to the APIs of the ``bytes`` and ``bytearray`` types to make it easier to operate entirely in the binary domain. Background == To simplify the task of writing the Python 3 documentation, the ``bytes`` and ``bytearray`` types were documented primarily in terms of the way they differed from the Unicode based Python 3 ``str`` type. Even when I `heavily revised the sequence documentation <http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that simplifying shortcut. 
However, it turns out that this approach to the documentation of these types had a problem: it doesn't adequately introduce users to their hybrid nature, where they can be manipulated *either* as a "sequence of integers" type, *or* as ``str``-like types that assume ASCII compatible data. That oversight has now been corrected, with the binary sequence types now being documented entirely independently of the ``str`` documentation in `Python 3.4+ <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__ The confusion isn't just a documentation issue, however, as there are also some lingering design quirks from an earlier pre-release design where there was *no* separate ``bytearray`` type, and instead the core ``bytes`` type was mutable (with no immutable counterpart). Finally, additional experience with using the existing Python 3 binary sequence types in real world applications has suggested it would be beneficial to make it easier to convert integers to length 1 bytes objects. Proposals = As a "consistency improvement" proposal, this PEP is actually about a few smaller micro-proposals, each aimed at improving the usability of the binary data model in Python 3. Proposals are motivated by one of two main factors: * removing remnants of the original design of ``bytes`` as a mutable type * allowing users to easily convert integer values to a length 1 ``bytes`` object Alternate Constructors -- The ``bytes`` and ``bytearray`` constructors currently accept an integer argument, but interpret it to mean a zero-filled object of the given length. This is a legacy of the original design of ``bytes`` as a mutable type, rather than a particularly intuitive behaviour for users. It has become especially confusing now that some other ``bytes`` interfaces treat integers and the corresponding length 1 bytes instances as equivalent input. 
Compare:: >>> b"\x03" in bytes([1, 2, 3]) True >>> 3 in bytes([1, 2, 3]) True >>> bytes(b"\x03") b'\x03' >>> bytes(3) b'\x00\x00\x00' This PEP proposes that the current handling of integers in the bytes and bytearray constructors be deprecated in Python 3.5 and targeted for removal in Python 3.7, being replaced by two more explicit alternate constructors provided as class methods. The initial python-ideas thread [ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating this constructor behaviour. Firstly, a ``byte`` constructor is proposed that converts integers in the range 0 to 255 (inclusive) to a ``bytes`` object:: >>> bytes.byte(3) b'\x03' >>> bytearray.byte(3) bytearray(b'\x03') >>
Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
On 16 August 2014 03:48, Guido van Rossum wrote: > This feels chatty. I'd like the PEP to call out the specific proposals and > put the more verbose motivation later. I realised that some of that history was actually completely irrelevant now, so I culled a fair bit of it entirely. > It took me a long time to realize > that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3). I've split out the four subproposals into their own sections, so hopefully this is clearer now. > Also > your mention of bytes.byte() as the counterpart to ord() confused me -- I > think it's more similar to chr(). This was just a case of me using the wrong word - I meant "inverse" rather than "counterpart". > I don't like iterbytes as a builtin, let's > keep it as a method on affected types. Done. I also added an explanation of the benefits it offers over the more generic "map(bytes.byte, data)", as well as more precise semantics for how it will work with memoryview objects. New draft is live at http://www.python.org/dev/peps/pep-0467/, as well as being included inline below. Regards, Nick. === PEP: 467 Title: Minor API improvements for bytes and bytearray Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2014-03-30 Python-Version: 3.5 Post-History: 2014-03-30 2014-08-15 2014-08-16 Abstract During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series. 
This PEP proposes four small adjustments to the APIs of the ``bytes``, ``bytearray`` and ``memoryview`` types to make it easier to operate entirely in the binary domain: * Deprecate passing single integer values to ``bytes`` and ``bytearray`` * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors * Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and ``memoryview.iterbytes`` alternative iterators Proposals = Deprecation of current "zero-initialised sequence" behaviour Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size:: >>> bytes(3) b'\x00\x00\x00' >>> bytearray(3) bytearray(b'\x00\x00\x00') This PEP proposes to deprecate that behaviour in Python 3.5, and remove it entirely in Python 3.6. No other changes are proposed to the existing constructors. Addition of explicit "zero-initialised sequence" constructors - To replace the deprecated behaviour, this PEP proposes the addition of an explicit ``zeros`` alternative constructor as a class method on both ``bytes`` and ``bytearray``:: >>> bytes.zeros(3) b'\x00\x00\x00' >>> bytearray.zeros(3) bytearray(b'\x00\x00\x00') It will behave just as the current constructors behave when passed a single integer. 
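For readers who want to experiment with the proposed semantics, the three additions can be approximated with existing APIs. These are hypothetical module-level stand-ins, not the proposed class methods, and they assume `data` iterates as integers (true for bytes, bytearray and byte-format memoryview objects); this sketch always yields `bytes` objects.

```python
def zeros(length: int) -> bytes:
    """Stand-in for the proposed bytes.zeros(): explicit zero-filled bytes."""
    return bytes(length)  # today's single-integer constructor behaviour


def byte(value: int) -> bytes:
    """Stand-in for the proposed bytes.byte(): int in range(256) -> length 1 bytes."""
    return bytes([value])  # raises ValueError / TypeError for invalid input


def iterbytes(data):
    """Stand-in for the proposed iterbytes(): yield length 1 bytes objects."""
    for value in data:  # iterating binary data produces ints
        yield bytes([value])
```

Because `byte()` delegates to `bytes([value])`, out-of-range and non-integer inputs fail with the same ValueError and TypeError messages shown in the PEP's examples.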
The specific choice of ``zeros`` as the alternative constructor name is taken from the corresponding initialisation function in NumPy (although, as these are 1-dimensional sequence types rather than N-dimensional matrices, the constructors take a length as input rather than a shape tuple). Addition of explicit "single byte" constructors --- As binary counterparts to the text ``chr`` function, this PEP proposes the addition of an explicit ``byte`` alternative constructor as a class method on both ``bytes`` and ``bytearray``:: >>> bytes.byte(3) b'\x03' >>> bytearray.byte(3) bytearray(b'\x03') These methods will only accept integers in the range 0 to 255 (inclusive):: >>> bytes.byte(512) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: bytes must be in range(0, 256) >>> bytes.byte(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'float' object cannot be interpreted as an integer The documentation of the ``ord`` builtin will be updated to explicitly note that ``bytes.byte`` is the inverse operation for binary data, while ``chr`` is the inverse operation for text data. Behaviourally, ``bytes.byte(x)`` will be equivalent to the current ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is expected to be easier to discover and
Re: [Python-Dev] Multiline with statement line continuation
On 17 August 2014 07:42, Chris Angelico wrote: > On Sat, Aug 16, 2014 at 10:47 PM, Marko Rauhamaa wrote: >> >> You might be able to have it both ways. You could have: >> >> with (open(name) for name in os.listdir("config")) as files: > > But that's not a tuple, it's a generator. Should generators be context > managers? Is anyone seriously suggesting this? I don't think so. Is > this a solution looking for problems? Yes. We have a whole programming language to play with; when "X is hard to read" becomes a problem, it may be time to reach for a better tool. If the context manager line is getting unwieldy, it's often a sign it's time to factor it out to a dedicated helper, or break it up into multiple with statements :) Cheers, Nick.
[Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
I've seen a few people on python-ideas express the assumption that there will be another Py3k style compatibility break for Python 4.0. I've also had people express the concern that "you broke compatibility in a major way once, how do we know you won't do it again?". Both of those contrast strongly with Guido's stated position that he never wants to go through a transition like the 2->3 one again. Barry wrote PEP 404 to make it completely explicit that python-dev had no plans to create a Python 2.8 release. Would it be worth writing a similarly explicit "not an option" PEP explaining that the regular deprecation and removal process (roughly documented in PEP 387) is the *only* deprecation and removal process? It could also point to the fact that we now have PEP 411 (provisional APIs) to help reduce our chances of being locked indefinitely into design decisions we aren't happy with. If folks (most significantly, Guido) are amenable to the idea, it shouldn't take long to put such a PEP together, and I think it could help reduce some of the confusions around the expectations for Python 4.0 and the evolution of 3.x in general. Regards, Nick.
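At the code level, the first step of the regular deprecation process is typically to keep the old API working while emitting a `DeprecationWarning` that points users at the replacement, with removal only in a later release. The sketch below shows that pattern; `old_api` and `new_api` are hypothetical names, and PEP 387 itself is the authority on the actual policy and timelines.

```python
import warnings


def new_api(value):
    """The replacement API (hypothetical)."""
    return value * 2


def old_api(value):
    """Hypothetical function on the normal deprecation path."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line, not this one
    )
    return new_api(value)  # behaviour is unchanged while the warning is in place
```

Keeping the old name as a thin, warning-emitting wrapper is what lets code written for one 3.x release keep running on the next one, unlike the all-at-once break of the 2 to 3 transition.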
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 17 August 2014 12:43, Guido van Rossum wrote: > On Sat, Aug 16, 2014 at 6:28 PM, Nick Coghlan wrote: >> I've also had people express the concern that "you broke compatibility >> in a major way once, how do we know you won't do it again?". > > > Well, they won't, really. You can't predict the future. But really, that's a > pretty poor way to say "please don't do it again." > > I'm not sure why, but I hate when someone starts a suggestion or a question > with "why doesn't Python ..." and I have to fight the urge to reply in a > flippant way without answering the real question. (And just now I did it > again.) > > I suppose this phrasing may actually be meant as a form of politeness, but > to me it often sounds passive-aggressive, pretend-polite. (Could it be a > matter of cultural difference? The internet is full of broken English, my > own often included.) I don't mind it if the typical answers are accepted as valid: * "because it has these downsides, and those are considered to outweigh the benefits" * "because it's difficult, and it never bothered anyone enough for them to put in the work to do something about it" Those aren't always obvious, especially to folks that don't have a lot of experience with long lived software projects (I had only just started high school when Python was first released!), so I don't mind explaining them when I have time. >> Both of those contrast strongly with Guido's stated position that he >> never wants to go through a transition like the 2->3 one again. > > Right. What's more, when I say that, I don't mean that you should wait until > I retire -- I think it's genuinely a bad idea. Absolutely agreed - I think the Unicode change was worthwhile (even with the impact proving to be higher than expected), but there isn't any such fundamental change to the data model lurking for Python 3. > I also don't expect that it'll be necessary -- in fact, I am counting on > tools (e.g. static analysis!) 
to improve to the point where there won't be a > reason for such a transition. The fact that things like Hylang and MacroPy can already run on the CPython VM also shows that other features (like import hooks and the AST compiler) have evolved to the point where the Python data model and runtime semantics can be more effectively decoupled from syntactic details. > (Don't understand this to mean that we should never deprecate things. > Deprecations will happen, they are necessary for the evolution of any > programming language. But they won't ever hurt in the way that Python 3 > hurt.) Right. I think Python 2 has been stable for so long that I sometimes wonder if folks forget (or never knew?) we used to deprecate things within the Python 2 series as well, such that code that ran on Python 2.x wasn't necessarily guaranteed to run on Python 2.(x+2). "Never deprecate anything" is a recipe for unbounded growth in complexity. Benjamin has made a decent start on documenting that normal deprecation process in PEP 387, so I'd also suggest refining that a bit and getting it to "Accepted" as part of any explicit "Python 4.x won't be as disruptive as 3.x" clarification. >> no plans to create a Python 2.8 release. Would it be worth writing a >> similarly explicit "not an option" PEP explaining that the regular >> deprecation and removal process (roughly documented in PEP 387) is the >> *only* deprecation and removal process? It could also point to the >> fact that we now have PEP 411 (provisional APIs) to help reduce our >> chances of being locked indefinitely into design decisions we aren't >> happy with. >> >> If folks (most significantly, Guido) are amenable to the idea, it >> shouldn't take long to put such a PEP together, and I think it could >> help reduce some of the confusions around the expectations for Python >> 4.0 and the evolution of 3.x in general. > > But what should it say?
The specific things I was thinking we could point out were:

- PEP 387, documenting the normal deprecation process that existed even in Python 2
- highlighting the increased preference for "documented deprecation only" in cases where maintaining something isn't actively causing problems; there are just better alternatives now available
- PEP 411, the (still relatively new) provisional API concept
- PEP 405, adding pyvenv as a standard part of Python
- PEP 453, better integrating PyPI into the recommended way of working with the language

Those all help change the way the language evolves, as they reduce the pressure to rush things into the standard library before they'
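A minimal sketch of how the normal deprecation process referred to above (PEP 387) typically shows up in code: the old spelling keeps working but emits a DeprecationWarning pointing users at its replacement. Only the standard warnings module is used; old_api() and new_api() are hypothetical names, not part of any real library.

```python
import warnings


def new_api(value):
    """Hypothetical replacement API."""
    return value * 2


def old_api(value):
    """Hypothetical legacy API kept working during its deprecation period."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # report the warning at the caller's line, not inside old_api()
    )
    return new_api(value)


# DeprecationWarning is ignored by default in most contexts, so opt in and record it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api(21)
```

Code that only uses the old name keeps running unchanged; the warning gives maintainers at least one release of notice before the old name is removed.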
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 17 August 2014 15:08, Guido van Rossum wrote: > I think this would be a great topic for a blog post. Once you've written it > I can even bless it by Tweeting about it. :-) Sounds like a plan - I'll try to put together something coherent this week :) > PS. Why isn't PEP 387 accepted yet? Not sure - it mostly looks correct to me. I suspect it just fell off the radar since it's a "describe what we're already doing anyway" kind of document. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 17 August 2014 15:34, Nick Coghlan wrote: > On 17 August 2014 15:08, Guido van Rossum wrote: >> I think this would be a great topic for a blog post. Once you've written it >> I can even bless it by Tweeting about it. :-) > > Sounds like a plan - I'll try to put together something coherent this week :) OK, make that "this afternoon": http://www.curiousefficiency.org/posts/2014/08/python-4000.html :) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
On 17 August 2014 18:13, Raymond Hettinger wrote: > > On Aug 14, 2014, at 10:50 PM, Nick Coghlan wrote: > > Key points in the proposal: > > * deprecate passing integers to bytes() and bytearray() > > > I'm opposed to removing this part of the API. It has proven useful > and the alternative isn't very nice. Declaring the size of fixed length > arrays is not a new concept and is widely adopted in other languages. > One principal use case for the bytearray is creating and manipulating > binary data. Initializing to zero is common operation and should remain > part of the core API (consider why we now have list.copy() even though > copying with a slice remains possible and efficient). That's why the PEP proposes adding a "zeros" method, based on the name of the corresponding NumPy construct. The status quo has some very ugly failure modes when an integer is passed unexpectedly, and tries to create a large buffer, rather than throwing a type error. > I and my clients have taken advantage of this feature and it reads nicely. If I see "bytearray(10)" there is nothing there that suggests "this creates an array of length 10 and initialises it to zero" to me. I'd be more inclined to guess it would be equivalent to "bytearray([10])". "bytearray.zeros(10)", on the other hand, is relatively clear, independently of user expectations. > The proposed deprecation would break our code and not actually make > anything better. > > Another thought is that the core devs should be very reluctant to deprecate > anything we don't have to while the 2 to 3 transition is still in progress. > Every new deprecation of APIs that existed in Python 2.7 just adds another > obstacle to converting code. Individually, the differences are trivial. > Collectively, they present a good reason to never migrate code to Python 3. 
This is actually one of the inconsistencies between the Python 2 and 3 binary APIs:

Python 2.7.5 (default, Jun 25 2014, 10:19:55)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> bytes(10)
'10'
>>> bytearray(10)
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Users wanting well-behaved binary sequences in Python 2.7 would be well advised to use the "future" module to get a full backport of the actual Python 3 bytes type, rather than the approximation that is the 8-bit str in Python 2. And once they do that, they'll be able to track the evolution of the Python 3 binary sequence behaviour without any further trouble. That said, I don't really mind how long the deprecation cycle is. I'd be fine with fully supporting both in 3.5 (2015), deprecating the main constructor in favour of the explicit zeros() method in 3.6 (2017) and dropping the legacy behaviour in 3.7 (2018). Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
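The failure mode discussed here (an integer passed by mistake silently becoming a zero-filled buffer) can be shown in a few lines of Python 3. The zeros() function below is a hypothetical stand-in for the proposed bytes.zeros()/bytearray.zeros(), which is only a proposal in this thread; the portable b"\x00" * n idiom works the same way on 2.7 and 3.x.

```python
import operator


def zeros(n):
    """Hypothetical stand-in for the proposed bytes.zeros(): n zero bytes.

    operator.index() accepts only integer-like objects, so a stray str or
    float raises TypeError instead of being misread.
    """
    n = operator.index(n)
    if n < 0:
        raise ValueError("count must be non-negative")
    return b"\x00" * n


# The failure mode: passing a length where data was meant still "works" in
# Python 3, silently producing a zero-filled buffer instead of a TypeError.
payload = b"abc"
oops = bytes(len(payload))  # meant bytes(payload)

# Sequence repetition means "n zero bytes" on both Python 2.7 and 3.x.
portable = b"\x00" * 3
```

An explicitly named constructor makes the intent visible at the call site, which is the readability argument made above for "bytearray.zeros(10)" over "bytearray(10)".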
Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray
On 18 Aug 2014 08:04, "Markus Unterwaditzer" wrote: > > On Sun, Aug 17, 2014 at 05:41:10PM -0400, Barry Warsaw wrote: > > I think the biggest API "problem" is that default iteration returns integers > > instead of bytes. That's a real pain. > > I agree, this behavior required some helper functions while porting Werkzeug to > Python 3 AFAIK. > > > > > I'm not sure .iterbytes() is the best name for spelling iteration over bytes > > instead of integers though. Given that we can't change __iter__(), I > > personally would perhaps prefer a simple .bytes property over which if you > > iterated you would receive bytes, e.g. > > I'd rather be for a .bytes() method, to match the .values(), and .keys() > methods on dictionaries. Calling it bytes is too confusing:

    for x in bytes(data):
        ...

    for x in bytes(data).bytes():
        ...

When referring to bytes, which bytes do you mean, the builtin or the method? iterbytes() isn't especially attractive as a method name, but it's far more explicit about its purpose. Cheers, Nick. > > -- Markus
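The underlying pain point is that in Python 3 indexing and iterating over bytes produce integers, while slicing still produces bytes. A minimal sketch of the kind of porting helper mentioned above, assuming slicing is an acceptable workaround; iter_bytes() is a hypothetical name, not the proposed iterbytes() method.

```python
def iter_bytes(data):
    """Hypothetical porting helper: yield each element of a bytes-like
    sequence as a length-1 bytes object rather than as an int."""
    for i in range(len(data)):
        yield bytes(data[i:i + 1])  # slicing keeps the bytes type; indexing does not


data = b"hi"
first_by_index = data[0]    # an int in Python 3 (a 1-char str in Python 2)
first_by_slice = data[0:1]  # a length-1 bytes object in both
```

The bytes() call around the slice also normalises bytearray input, so the helper always yields immutable bytes.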
Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
On 18 Aug 2014 03:07, "Raymond Hettinger" wrote: > > > On Aug 17, 2014, at 1:41 AM, Nick Coghlan wrote: > >> If I see "bytearray(10)" there is nothing there that suggests "this >> creates an array of length 10 and initialises it to zero" to me. I'd >> be more inclined to guess it would be equivalent to "bytearray([10])". >> >> "bytearray.zeros(10)", on the other hand, is relatively clear, >> independently of user expectations. > > > Zeros would have been great but that should have been done originally. > The time to get API design right is at inception. > Now, you're just breaking code and invalidating any published examples. I'm fine with postponing the deprecation elements indefinitely (or just deprecating bytes(int) and leaving bytearray(int) alone). > >>> >>> Another thought is that the core devs should be very reluctant to deprecate >>> anything we don't have to while the 2 to 3 transition is still in progress. >>> Every new deprecation of APIs that existed in Python 2.7 just adds another >>> obstacle to converting code. Individually, the differences are trivial. >>> Collectively, they present a good reason to never migrate code to Python 3. >> >> >> This is actually one of the inconsistencies between the Python 2 and 3 >> binary APIs: > > > However, bytearray(n) is the same in both Python 2 and Python 3. > Changing it in Python 3 increases the gulf between the two. > > The further we let Python 3 diverge from Python 2, the less likely that > people will convert their code and the harder you make it to write code > that runs under both. > > FWIW, I've been teaching Python full time for three years. I cover the > use of bytearray(n) in my classes and not a single person out of 3000+ > engineers have had a problem with it. I seriously question the PEP's > assertion that there is a real problem to be solved (i.e. that people > are baffled by bytearray(bufsiz)) and that the problem is sufficiently > painful to warrant the headaches that go along with API changes. 
Yes, I'd expect engineers and networking folks to be fine with it. It isn't how this mode of the constructor *works* that worries me, it's how it *fails* (i.e. silently producing unexpected data rather than a type error). Purely deprecating the bytes case and leaving bytearray alone would likely address my concerns. > > The other proposal to add bytearray.byte(3) should probably be named > bytearray.from_byte(3) for clarity. That said, I question whether there is > actually a use case for this. I have never seen code that has a > need to create a byte array of length one from a single integer. > For the most part, the API will be easiest to learn if it matches what > we do for lists and for array.array. This part of the proposal came from a few things:

* many of the bytes and bytearray methods only accept bytes-like objects, but iteration and indexing produce integers
* to mitigate the impact of the above, some (but not all) bytes and bytearray methods now accept integers in addition to bytes-like objects
* ord() in Python 3 is only documented as accepting length 1 strings, but also accepts length 1 bytes-like objects

Adding bytes.byte() makes it practical to document the binary half of ord's behaviour, and eliminates any temptation to expand the "also accepts integers" behaviour out to more types. bytes.byte() thus becomes the binary equivalent of chr(), just as Python 2 had both chr() and unichr(). I don't recall ever needing chr() in a real program either, but I still consider it an important part of clearly articulating the data model. > Sorry Nick, but I think you're making the API worse instead of better. > This API isn't perfect but it isn't flat-out broken either. There is some > unfortunate asymmetry between bytes() and bytearray() in Python 2, > but that ship has sailed. The current API for Python 3 is pretty good > (though there is still a tension between wanting to be like lists and like > strings both at the same time). Yes.
It didn't help that the docs previously expected readers to infer the behaviour of the binary sequence methods from the string documentation - while the new docs could still use some refinement, I've at least addressed that part of the problem. Cheers, Nick.
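A minimal sketch of the chr()/ord() symmetry described above. byte() is a hypothetical spelling of the proposed bytes.byte(), which does not exist; the ord() and integer-argument behaviours in the usage checks are existing Python 3 behaviour (count() and find() have accepted an int since 3.3).

```python
def byte(i):
    """Hypothetical spelling of the proposed bytes.byte(): the binary
    counterpart of chr(), returning a length-1 bytes object."""
    return bytes((i,))  # bytes() raises ValueError outside range(256)


# chr()/ord() for text; byte()/ord() for binary data
text_char = chr(97)
binary_char = byte(97)
```

Without a dedicated constructor the same result needs bytes([i]) or bytes((i,)), which is the small temporary-container allocation Guido mentions later in the thread.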
Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray
On 18 Aug 2014 08:55, "Barry Warsaw" wrote: > > On Aug 18, 2014, at 08:48 AM, Nick Coghlan wrote: > > >Calling it bytes is too confusing: > > > >for x in bytes(data): > > ... > > > >for x in bytes(data).bytes() > > > >When referring to bytes, which bytes do you mean, the builtin or the method? > > > >iterbytes() isn't especially attractive as a method name, but it's far more > >explicit about its purpose. > > I don't know. How often do you really instantiate the bytes object there in > the for loop? I'm talking more generally - do you *really* want to be explaining that "bytes" behaves like a tuple of integers, while "bytes.bytes" behaves like a tuple of bytes? Namespaces are great and all, but using the same name for two different concepts is still inherently confusing. Cheers, Nick. > > -Barry
Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
On 18 Aug 2014 09:41, "Raymond Hettinger" wrote: > > > I encourage restraint against adding an unneeded class method that has no parallel > elsewhere. Right now, the learning curve is mitigated because bytes is very str-like > and because bytearray is list-like (i.e. the method names have been used elsewhere > and likely already learned before encountering bytes() or bytearray()). Putting in new, > rarely used funky method adds to the learning burden. > > If you do press forward with adding it (and I don't see why), then as an alternate > constructor, the name should be from_int() or some such to avoid ambiguity > and to make clear that it is a class method. If I remember the sequence of events correctly, I thought of map(bytes.byte, data) first, and then Guido suggested a dedicated iterbytes() method later. The step I hadn't taken (until now) was realising that the new memoryview(data).iterbytes() capability actually combines with the existing (bytes([b]) for b in data) to make the original bytes.byte idea unnecessary. Cheers, Nick.
Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray
On 18 Aug 2014 09:57, "Barry Warsaw" wrote: > > On Aug 18, 2014, at 09:12 AM, Nick Coghlan wrote: > > >I'm talking more generally - do you *really* want to be explaining that > >"bytes" behaves like a tuple of integers, while "bytes.bytes" behaves like > >a tuple of bytes? > > I would explain it differently though, using concrete examples. > > data = bytes(...) > for i in data: # iterate over data as integers > for i in data.bytes: # iterate over data as bytes > > But whatever. I just wish there was something better than iterbytes. There's actually another aspect to your idea, independent of the naming: exposing a view rather than just an iterator. I'm going to have to look at the implications for memoryview, but it may be a good way to go (and would align with the iterator -> view changes in dict). Cheers, Nick. > > -Barry
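The view-versus-iterator distinction can be sketched in pure Python. BytesView below is a hypothetical class, not a proposed or existing API: like a dict view, it keeps a reference to the underlying object, so later changes are visible through it, and (as Guido notes later in the thread) it must write through when the target is mutable, such as a bytearray.

```python
from collections.abc import Sequence


class BytesView(Sequence):
    """Hypothetical 'bytes view': indexing and iteration produce length-1
    bytes objects, and, like dict views, it reflects later changes to the
    underlying object instead of taking a snapshot."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return bytes(self._data[index])  # slices are returned as copies
        n = len(self._data)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("BytesView index out of range")
        return bytes(self._data[index:index + 1])

    def __setitem__(self, index, value):
        # Only works when the underlying object is mutable (e.g. bytearray).
        if len(value) != 1:
            raise ValueError("expected a length-1 bytes object")
        self._data[index] = value[0]


buf = bytearray(b"abc")
view = BytesView(buf)
before = list(view)   # [b'a', b'b', b'c']
buf.append(ord("d"))  # the view sees this change
view[0] = b"A"        # writes through to buf
```

Iteration, membership tests, index() and count() all come from the Sequence mixin methods, so only __len__ and __getitem__ need real implementations.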
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 18 August 2014 11:14, Donald Stufft wrote: > On Sun, Aug 17, 2014, at 09:02 PM, Guido van Rossum wrote: >> I'm unsure about what's the single biggest pain moving to Python 3. In the >> past I would have said that it's for sure the bytes/str split (which is both >> the biggest pain and the biggest payoff). >> >> But if I look carefully into the soul of teams that are still on 2.7 (I know >> a few... :-), I think the real reason is that Python 3 changes so many >> different things, you have to actually understand your code to port it >> (unlike with minor version transitions, where the changes usually spike in >> one specific area, and you can leave the rest to normal attrition and >> periodic maintenance). >> > > In my experience bytes/str is the single biggest change that causes the > most problems. Most of the other changes can be mechanically transformed > and/or papered over using helpers like six. The bytes/str change is the > main one that requires understanding code and where it requires a > serious untangling of things in code bases where str/bytes are freely > used interchangeably. Often times this requires making a decision about > what *should* be bytes or str as well which requires having some deep > knowledge about the APIs in question too. It's certainly the one that has caused the most churn in CPython and the standard library - the ripples still haven't entirely settled on that front :) I think Guido's right that there's also a "death of a thousand cuts" aspect for large existing code bases, though, especially those that are lacking comprehensive test suites. By definition, existing large Python 2 applications are OK with the restrictions imposed by Python 2, and we're deliberately not forcing the issue by halting Python 2 maintenance.
That's where Steve Dower's idea of being able to progressively declare a code base "Python 3 compatible" on a file by file basis and have some means of programmatically enforcing that is interesting - it opens the door to "opportunistic and incremental" porting, where modules are progressively updated to run on both, until an application reaches a point where it can switch to Python 3 and leave Python 2 behind. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
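One possible shape for the "programmatically enforced, file by file" idea, sketched under an explicit assumption: that a module counts as ported once it parses as Python 3 and enables a chosen set of __future__ features. REQUIRED_FUTURES and missing_futures() are hypothetical; the thread does not specify any enforcement mechanism.

```python
import ast

# Hypothetical policy: the __future__ features a module must enable
# before it counts as "ported".
REQUIRED_FUTURES = frozenset({"absolute_import", "division", "print_function"})


def missing_futures(source):
    """Return the required __future__ features that a module's source has
    not enabled. ast.parse() raises SyntaxError if the source is not even
    valid Python 3 syntax (e.g. a Python 2 print statement)."""
    tree = ast.parse(source)
    enabled = set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            enabled.update(alias.name for alias in node.names)
    return REQUIRED_FUTURES - enabled


ported = "from __future__ import absolute_import, division, print_function\nprint('ok')\n"
legacy = "x = 1\n"
```

Running such a check in CI only over modules already marked as ported would let a code base migrate incrementally without regressing the files already done.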
Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray
On 18 August 2014 10:45, Guido van Rossum wrote: > On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw wrote: >> >> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote: >> >> >There's actually another aspect to your idea, independent of the naming: >> >exposing a view rather than just an iterator. I'm going to have to look at >> >the implications for memoryview, but it may be a good way to go (and would >> >align with the iterator -> view changes in dict). >> >> Yep! Maybe that will inspire a better spelling. :) > > > +1. It's just as much about b[i] as it is about "for c in b", so a view > sounds right. (The view would have to be mutable for bytearrays and for > writable memoryviews.) > > On the rest, it's sounding more and more as if we will just need to live > with both bytes(1000) and bytearray(1000). A warning sounds worse than a > deprecation to me. I'm fine with keeping bytearray(1000), since that works the same way in both Python 2 & 3, and doesn't seem likely to be invoked inadvertently. I'd still like to deprecate "bytes(1000)", since that does different things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in both.

$ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
'10'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
$ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Hitting the deprecation warning in single-source code would seem to be a strong hint that you have a bug in one version or the other rather than being intended behaviour. > bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes and > bytearray pretty highly. With "bytearray(1000)" sticking around indefinitely, I'm less concerned about adding a "zeros" constructor. > I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a size > one list is (or at least feels) more expensive to allocate than a size one > bytes object. So, okay.
So, here's an interesting thing I hadn't previously registered: we actually already have a fairly capable "bytesview" option, and have done since Stefan implemented "memoryview.cast" in 3.3. The trick lies in the 'c' format character for the struct module, which is parsed as a length 1 bytes object rather than as an integer:

>>> data = bytearray(b"Hello world")
>>> bytesview = memoryview(data).cast('c')
>>> list(bytesview)
[b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'Hello world'
>>> bytesview[0:5] = memoryview(b"olleH").cast('c')
>>> list(bytesview)
[b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'olleH world'

For the read-only case, it covers everything (iteration, indexing, slicing); for the writable view case, it doesn't cover changing the shape of the target array, and it doesn't cover assigning arbitrary buffer objects (you need to wrap them in a similar cast for memoryview to allow the assignment). It's hardly the most *intuitive* spelling though - I was one of the reviewers for Stefan's memoryview rewrite back in 3.3, and I only made the connection today when looking to see how a view object like the one we were discussing elsewhere in the thread might be implemented as a facade over arbitrary memory buffers, rather than being specific to bytes and bytearray. If we went down the "bytesview" path, then a single new facade would cover not only the 3 builtins (bytes, bytearray, memoryview) but also any *other* buffer exporting type. If we so chose (at some point in the future, not as part of this PEP), such a type could allow additional bytes operations (like "count", "startswith" or "index") to be applied to arbitrary regions of memory without making a copy. We can't add those other operations to memoryview, since they don't make sense for an n-dimensional array. Regards, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
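The memoryview.cast('c') trick shown above is not limited to the three builtins: any C-contiguous buffer exporter can be viewed the same way. A minimal sketch using array.array as the "other" exporter; bytes_view() is just a hypothetical convenience wrapper. Bytes-only operations such as startswith() are not available on memoryview, so the last check has to copy with tobytes(), which is exactly the copy a dedicated facade could avoid.

```python
from array import array


def bytes_view(buffer):
    """View any C-contiguous buffer exporter as a sequence of length-1
    bytes objects via memoryview.cast('c')."""
    return memoryview(buffer).cast("c")


arr = array("B", b"spam")
view = bytes_view(arr)
items = list(view)  # length-1 bytes objects, not ints
view[0] = b"S"      # writes through to the array
```

While the view exists the array cannot be resized, which matches the "doesn't cover changing the shape" limitation described above.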
Re: [Python-Dev] Bytes path support
On 20 Aug 2014 04:18, "Marko Rauhamaa" wrote: > > Tres Seaver: > > > On 08/19/2014 01:43 PM, Ben Hoyt wrote: > >> Fair enough. I don't quite understand, though -- why is the "official > >> policy" to kill something that's "essential" on *nix? > > > > ISTM that the policy is based on a fantasy that "it looks like text to > > me in my use cases, so therefore it must be text for everyone." > > What I like about Python is that it allows me to write native linux code > without having to make portability compromises that plague, say, Java. I > have select.epoll(). I have os.fork(). I have socket.TCP_CORK. The > "textualization" of Python3 seems part of a conscious effort to make > Python more Java-esque. It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well *if* everyone reliably encodes things as UTF-8 or always uses encoding detection, but its failure mode is unfortunately silent data corruption. That said, there's a lot of Python software that is POSIX specific, where bytes paths would be the least of the barriers to porting to Windows or Jython. I'm personally +1 on consistently allowing binary paths in lower level APIs, but disallowing them in higher level explicitly cross platform abstractions like pathlib. Regards, Nick. > > > Marko
Re: [Python-Dev] Bytes path support
On 21 Aug 2014 08:19, "Greg Ewing" wrote: > > Antoine Pitrou wrote: >> >> I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. > > > The problem with scandir() in particular is that there is > currently *no* low-level API exposed that gives the same > functionality. > > If scandir() is not to support bytes paths, I'd suggest > exposing the opendir() and readdir() system calls with > bytes path support. scandir is low level (the entire os module is low level). In fact, aside from pathlib, I'd consider pretty much every API we have that deals with paths to be low level - that's a large part of the reason we needed pathlib! Cheers, Nick. > > -- > Greg
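The os-module convention being argued for here is "bytes in, bytes out": pass a bytes path and you get undecoded bytes names back. A minimal sketch, assuming Python 3.6 or later, where os.scandir() documents that a bytes path makes each DirEntry's name and path attributes bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    # A str path yields str entry names...
    str_names = [entry.name for entry in os.scandir(tmp)]
    # ...while a bytes path yields bytes names, with no decoding step at all.
    bytes_names = [entry.name for entry in os.scandir(os.fsencode(tmp))]
```

os.listdir() follows the same rule, so a POSIX-specific program can keep file names as raw bytes end to end.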
Re: [Python-Dev] Bytes path support
On 21 Aug 2014 09:06, "Chris Barker" wrote: > > As I understand it, the whole problem with some posix systems is that there is NO filesystem encoding -- i.e. you can't know for sure what encoding a filename is in. So you need to be able to pass the bytes through as they are. > > (At least as I read Armin Ronacher's blog) Armin lets his astonishment at the idea we'd expect Linux vendors to fix their broken OS get the better of him at times - he thinks the responsibility lies entirely with us to work around its quirks and limitations :) The "surrogateescape" error handler is our main answer to the unreliability of the POSIX encoding model - fsdecode will squirrel away undecodable bytes as lone surrogate code points (U+DC80 to U+DCFF), and then fsencode will restore them again later. That works for the simple round tripping case, but we currently lack good default tools for "cleaning" strings that may contain surrogates (or even scanning a string to see if surrogates are present). One idea I had along those lines is a surrogatereplace error handler (http://bugs.python.org/issue22016) that emitted an ASCII question mark for each smuggled byte, rather than propagating the encoding problem. Cheers, Nick. > > -Chris > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > chris.bar...@noaa.gov
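A minimal sketch of the surrogateescape round trip and of the "scan a string for smuggled bytes" check that the message says is missing. It uses an explicit UTF-8 codec rather than os.fsdecode()/os.fsencode() because the filesystem encoding and error handler vary by platform; smuggled_bytes() is a hypothetical helper name.

```python
def smuggled_bytes(text):
    """Scan a str for bytes smuggled in by the surrogateescape error handler
    and return them: each undecodable byte 0x80-0xFF is stored as the lone
    surrogate U+DC80-U+DCFF."""
    return bytes(ord(ch) - 0xDC00 for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF)


raw = b"caf\xe9.txt"                           # Latin-1 bytes, invalid as UTF-8
name = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'
restored = name.encode("utf-8", "surrogateescape")
```

The round trip is lossless, but the decoded str cannot be encoded with the default strict error handler, which is why a detection helper is useful before handing such names to code that expects clean text.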
Re: [Python-Dev] Bytes path support
On 21 August 2014 09:33, Ethan Furman wrote: > On 08/20/2014 03:31 PM, Nick Coghlan wrote: >> On 21 Aug 2014 08:19, "Greg Ewing" wrote: >>> >>> >>> Antoine Pitrou wrote: >>>> >>>> >>>> I think if you want low-level features (such as unconverted bytes paths >>>> under POSIX), it is reasonable to point you to low-level APIs. >>> >>> >>> >>> The problem with scandir() in particular is that there is >>> currently *no* low-level API exposed that gives the same >>> functionality. >>> >>> If scandir() is not to support bytes paths, I'd suggest >>> exposing the opendir() and readdir() system calls with >>> bytes path support. >> >> >> scandir is low level (the entire os module is low level). In fact, aside >> from pathlib, I'd consider pretty much every >> API we have that deals with paths to be low level - that's a large part of >> the reason we needed pathlib! > > > If scandir is low-level, and the low-level API's are the ones that should > support bytes paths, then scandir should support bytes paths. > > Is that what you meant to say? Yes. The discussions around PEP 471 *deferred* discussions of bytes and file descriptor support to their own RFEs (not needing a PEP); they didn't decide definitively not to support them. So Serhiy's thread is entirely pertinent to that question. Note that adding bytes support still *should not* hold up the initial PEP 471 implementation - it should be done as a follow-on RFE. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Bytes path support
On 21 August 2014 12:16, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > One idea I had along those lines is a surrogatereplace error handler ( > > http://bugs.python.org/issue22016) that emitted an ASCII question mark for > > each smuggled byte, rather than propagating the encoding problem. > > Please, don't. > > "Smuggled bytes" are not independent events. They tend to be > correlated *within* file names, and this handler would generate names > whose human semantics get lost (and there *are* human semantics, > otherwise the name would be str(some_counter)). They tend to be > correlated across file names, and this handler will generate multiple > files with the same munged name (and again, the differentiating human > semantics get lost). > > If you don't know the semantics of the intended file names, you can't > generate good replacement names. This has to be an application-level > function, and often requires user intervention to get good names. > > If you want to provide helper functions that applications can use to > clean names explicitly, that might be OK. Yeah, I was thinking in the context of reproducing sys.stdout's behaviour in Python 2, but that reproduces the bytes faithfully, so 'surrogateescape' already offers exactly the behaviour we want (sys.stdout will have surrogateescape enabled by default in 3.5). I'll keep pondering the question of possible helper functions in the "string" module. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
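Following Stephen's point that cleaning names has to be an explicit, application-level decision, a minimal sketch of such a helper: recover the original bytes of a surrogateescape-decoded name, then decode them with an encoding the application (or its user) has chosen. redecode() is hypothetical and is not a proposed "string" module function.

```python
def redecode(name, actual_encoding, assumed_encoding="utf-8"):
    """Hypothetical application-level cleaner: recover the original bytes of
    a name decoded with surrogateescape, then decode them again with the
    encoding the application (or its user) believes the name really used."""
    raw = name.encode(assumed_encoding, "surrogateescape")
    return raw.decode(actual_encoding)  # strict decoding: bad input raises UnicodeDecodeError


garbled = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
fixed = redecode(garbled, "latin-1")
```

Unlike a blanket "?" replacement, this keeps the human-readable part of the name intact and leaves the choice of encoding to the caller.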
Re: [Python-Dev] Bytes path support
On 21 August 2014 14:52, Cameron Simpson wrote: > > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he wants. > (Indeed, what I want, and I'm a long time UNIX fanboy.) The part that is broken is the idea that locale encodings are a viable solution to conveying the appropriate encoding to use to talk to the operating system. We've tried trusting them with Python 3, and they're reliably wrong in certain situations. systemd is apparently better than upstart at setting them correctly (e.g. for cron jobs), but even it can't defend against an erroneous (or deliberate!) "LANG=C", or ssh environment forwarding pushing a client's locale to the server. It's worth looking through some of Armin Ronacher's complaints about Python 3 being broken on Linux, and seeing how many of them boil down to "trusting the locale is wrong, Python 3 should just assume UTF-8 on every POSIX system, the same way it does on Mac OS X". (I suspect ShiftJIS, ISO-2022, et al users might object to that approach, but it's at least a more viable choice now than it was back in 2008) I still think we made the right call at least *trying* the idea of trusting the locale encoding (since that's the officially supported way of getting this information from the OS), and in many, many situations it works fine. But I suspect we may eventually need to resolve the technical issues currently preventing us from deciding to ignore the environmental locale during interpreter startup and try something different (such as always assuming UTF-8, or trying to force C.UTF-8 if we detect the C locale, or looking for the systemd config files and using those to set the OS encoding, rather than the environmental locale). Regards, Nick. 
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
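To see which encodings an interpreter actually derived from its environment, you can compare the values it reports under different settings, for example LANG=C versus a UTF-8 locale, or over an ssh session with forwarded locale variables. This is a minimal sketch: the helper name encoding_report is made up, sys.getfilesystemencodeerrors() needs Python 3.6 or later, and the values printed depend on both the Python version and the environment.

```python
import locale
import sys

def encoding_report():
    """Collect the encodings this interpreter derived from its environment."""
    return {
        # Used by os.fsencode()/os.fsdecode() and most OS-level interfaces.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # The locale-derived default used by open() for text files.
        "locale_preferred": locale.getpreferredencoding(False),
    }

for name, value in encoding_report().items():
    print(f"{name}: {value}")
```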
Re: [Python-Dev] Bytes path support
On 21 August 2014 23:58, Marko Rauhamaa wrote: > > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. If Python tried to protect the programmer from > that possibility, the result might be even more intractable: how to act > on a file with an non-UTF-8 filename if you are unable to express it as > a text string? That's what the "surrogateescape" codec is for - we use it by default on most OS interfaces, and it's implicit in the use of "os.fsencode" and "os.fsdecode". Starting with Python 3.5, it's also enabled on sys.stdout by default, so that "print(os.listdir(dirname))" will pass the original raw bytes through to the terminal the same way Python 2 does. The docs could use additional details as to which interfaces do and don't have surrogateescape enabled by default, but for the time being, the description of the codec error handler just links out to the original definition in PEP 383. It may also be useful to have some tools for detecting and cleaning strings containing surrogate escaped data, but there hasn't been a concrete proposal along those lines as yet. Personally, I'm currently waiting to see if the Fedora or OpenStack folks indicate a need for such tools before proposing any additions. Regards, Nick. > > > Marko -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
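A sketch of the os.fsdecode()/os.fsencode() round trip, assuming a POSIX system. There the filesystem error handler is surrogateescape, so a name that is not valid UTF-8 still survives being turned into text and back. On Windows the filesystem error handler differs, so the non-UTF-8 case is guarded. The helper name fs_roundtrip and the sample names are made up for illustration.

```python
import os

def fs_roundtrip(raw_name: bytes) -> bytes:
    """Decode an OS-supplied bytes name to str, then encode it back."""
    text_name = os.fsdecode(raw_name)  # filesystem encoding + its error handler
    return os.fsencode(text_name)

# An ordinary ASCII name round-trips on every platform.
assert fs_roundtrip(b"plain.txt") == b"plain.txt"

if os.name == "posix":
    # 0xFF is never valid UTF-8, yet the name survives the round trip.
    raw = b"report-\xff.txt"
    print(ascii(os.fsdecode(raw)))  # 'report-\udcff.txt' with a UTF-8 or ASCII filesystem encoding
    assert fs_roundtrip(raw) == raw
```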
Re: [Python-Dev] Bytes path support
On 22 August 2014 00:12, Nick Coghlan wrote: > On 21 August 2014 23:58, Marko Rauhamaa wrote: >> >> My point is that the poor programmer cannot ignore the possibility of >> "funny" character sets. If Python tried to protect the programmer from >> that possibility, the result might be even more intractable: how to act >> on a file with an non-UTF-8 filename if you are unable to express it as >> a text string? > > That's what the "surrogateescape" codec is for Oops, that should say "codec error handler" (I got it right later in the post). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 22 August 2014 00:41, Armin Rigo wrote: > Hi, > > On 18 August 2014 22:30, Oleg Broytman wrote: >>Aha, I see now -- the signing certificate is CAcert, which I've >> installed manually. > > I don't suppose anyone is particularly annoyed by this fact? I know > for sure two classes of people that will never click "Ignore". The > first one is people that, for lack of a less negative term, I'll call > "security freaks". The second is "serious business people" to which > the shiny new look of python.org appeals; they are likely to heed the > warning "Legitimate banks, stores, etc. will never ask you to do this" > and would regard an official hint to ignore it as highly > unprofessional. I've now raised this issue with the infrastructure team. The current hosting arrangements for bugs.python.org were put in place when the PSF didn't have any on-call system administrators of its own, but now that we do, it may be time to migrate that service to a location where we can switch to a more appropriate SSL certificate. Anyone interested in following the discussion further may wish to join infrastruct...@python.org Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 22 Aug 2014 04:45, "Benjamin Peterson" wrote: > > Perhaps some board members could comment, but I hope the PSF could just > pay a few hundred a year for a proper certificate. That's exactly what we're doing - MAL reminded me we reached the same conclusion last time this came up, we'll just track it better this time to make sure it doesn't slip through the cracks again. (And yes, switching to forced HTTPS once this is addressed would also be a good idea - we'll add it to the list) Regards, Nick.
Re: [Python-Dev] Bytes path support
On 22 Aug 2014 09:24, "Isaac Morland" wrote: > I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever written, in every language, which wants filenames to be character strings. That's one of the best summaries of the situation I've ever seen :) Most languages (including Python 2) throw up their hands and say this is the developer's problem to deal with. Python 3 says it's *our* problem to deal with on behalf of our developers. The "surrogateescape" error handler allows recalcitrant bytes to be dealt with relatively gracefully in most situations. We don't quite cover *everything* yet (hence the complaints from some of the folks that are experts at dealing with Python 2 Unicode handling on POSIX systems), but the remaining problems are a lot more tractable than the "teach every native English speaker everywhere how to handle Unicode properly" problem. Regards, Nick.
Re: [Python-Dev] Bytes path support
hon 3 is that applications should require additional complexity solely to deal with *incorrectly* configured systems and improperly encoded data and metadata (and, ideally, the detection of the need for such handling should be "Python 3 threw an exception" rather than "something further down the line detected corrupted data"). This is software rather than magic, though - these improvements only happen through people actually knuckling down and solving the related problems. When folks complain about Python 3's operating system interface handling causing problems in some situations? They're almost always referring to areas where we're still relying on the locale system on POSIX or the code page system on Windows. Both of those approaches are irredeemably broken - the answer is to stop relying on them, but appropriately updating the affected subsystems generally isn't a trivial task. A lot of the affected code runs before the interpreter is fully initialised, which makes it really hard to test, and a lot of it is incredibly convoluted due to various configuration options and platform specific details, which makes it incredibly hard to modify without breaking anything. One of those areas is the fact that we still use the old 8-bit APIs to interact with the Windows console. Those are just as broken in a multilingual world as the other Windows 8-bit APIs, so Drekin came up with a project to expose the Windows console as a UTF-16-LE stream that uses the 16-bit APIs instead: https://pypi.python.org/pypi/win_unicode_console I personally hope we'll be able to get the issues Drekin references there resolved for Python 3.5 - if other folks hope for the same thing, then one of the best ways to help that happen is to try out the win_unicode_console module and provide feedback on what does and doesn't work. Another was getting exceptions attempting to write OS data to sys.stdout when the locale settings had been scrubbed from the environment. 
For Python 3.5, we now tolerate that situation better by setting "errors=surrogateescape" on sys.stdout when the environment claims "ascii" as a suitable encoding for talking to the operating system (this is our way of saying "we don't actually believe you, but also don't have the data we need to overrule you completely"). While I was going to wait for more feedback from Fedora folks before pushing the idea again, this thread also makes me think it would be worth our while to add more tools for dealing with surrogate escapes and latin-1 binary data smuggling just to help make those techniques more discoverable and accessible: http://bugs.python.org/issue18814#msg225791 These various discussions are also giving me plenty of motivation to get back to working on PEP 432 (the rewrite of the interpreter startup sequence) for Python 3.5. A lot of these things are just plain hard to change because of the complexity of the current startup code. Redesigning that to use a cleaner, multiphase startup sequence that gets the core interpreter running *before* configuring the operating system integration should give us several more options when it comes to dealing with some of these challenges. Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
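A sketch of the behaviour described above, using an in-memory stream as a stand-in for a terminal whose environment claims "ascii". With errors="surrogateescape", a surrogate-escaped name decoded from the OS is written back out as its original byte. The default strict handler would raise UnicodeEncodeError instead. On Python 3.7 and later, an existing text stream can also be switched with sys.stdout.reconfigure(errors="surrogateescape").

```python
import io

# Stand-in for a terminal whose environment claims "ascii" (e.g. LANG=C).
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", errors="surrogateescape")

# A name as it would arrive from an OS interface using surrogateescape.
name = b"caf\xe9".decode("ascii", "surrogateescape")  # 'caf\udce9'

stream.write(name)
stream.flush()
print(buffer.getvalue())  # b'caf\xe9': the original byte passes straight through
```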
[Python-Dev] Bytes path related questions for Guido
At Guido's request, splitting out two specific questions from Serhiy's thread where I believe we could do with an explicit "yes or no" from him.

1. Should we accept patches adding support for the direct use of bytes paths in lower level filesystem manipulation APIs? (i.e. everything that isn't pathlib)

This was Serhiy's original question (due to some open issues [1,2]). I think the answer is yes, as we already do in some cases, and the "pathlib doesn't support binary paths" design decision is a high level platform independent API vs low level potentially platform dependent API one rather than being about disallowing the use of bytes paths in general.

[1] http://bugs.python.org/issue19997
[2] http://bugs.python.org/issue20797

2. Should we add some additional helpers to the string module for dealing with surrogate escaped bytes and other techniques for smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escaped_surrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another specified code point
* string.redecode(s, encoding): encodes a string back to bytes and then decodes it again using the specified encoding (the old encoding defaults to 'latin-1' to match the assumptions in WSGI)

"s != string.clean(s)" would then serve as a check for "does this string contain any surrogate escaped bytes?"

[3] http://bugs.python.org/issue18814#msg225791

Regards,
Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
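A hypothetical sketch of two of the proposed helpers, written as plain module-level code because the string module has no such functions. ESCAPED_SURROGATES and redecode only mirror the descriptions in the proposal. They are not real standard library API. The constant holds the 128 code points surrogateescape can produce. redecode shows the WSGI case, where a UTF-8 path arrives as a latin-1 decoded str.

```python
# Hypothetical helpers mirroring the proposal; not part of the string module.

# surrogateescape maps each byte 0x80-0xFF to U+DC80-U+DCFF: 128 code points.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))

def redecode(s: str, encoding: str, old_encoding: str = "latin-1") -> str:
    """Re-interpret text that was originally decoded with the wrong codec."""
    return s.encode(old_encoding).decode(encoding)

# WSGI passes latin-1 decoded strings, so a UTF-8 '/café' arrives as '/cafÃ©'.
environ_path = "/caf\u00c3\u00a9"
print(ascii(redecode(environ_path, "utf-8")))  # '/caf\xe9', i.e. '/café'
```

Latin-1 works as the default "old" encoding because it maps every byte value to exactly one code point, so the encode step cannot fail for text that came from a latin-1 decode.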
Re: [Python-Dev] Bytes path related questions for Guido
On 24 August 2014 14:44, Nick Coghlan wrote: > 2. Should we add some additional helpers to the string module for > dealing with surrogate escaped bytes and other techniques for > smuggling arbitrary binary data as text? > > My proposal [3] is to add: > > * string.escaped_surrogates (constant with the 128 escaped code points) > * string.clean(s): replaces surrogates with '\ufffd' or another > specified code point > * string.redecode(s, encoding): encodes a string back to bytes and > then decodes it again using the specified encoding (the old encoding > defaults to 'latin-1' to match the assumptions in WSGI) Serhiy & Ezio convinced me to scale this one back to a proposal for "codecs.clean_surrogate_escapes(s)", which replaces surrogates that may be produced by surrogateescape (that's what string.clean() above was supposed to be, but my description was not correct, and the name was too vague for that error to be obvious to the reader) "s != codecs.clean_surrogate_escapes(s)" would then become the check for "does this string contain any surrogate escaped bytes?" Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
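A hypothetical sketch of the scaled-back helper. The codecs module has no clean_surrogate_escapes() function, so this plain function only illustrates the described behaviour. It replaces only the code points surrogateescape can produce (U+DC80 to U+DCFF) and leaves any other lone surrogate untouched, which is the distinction between "surrogates" and "surrogate escapes" drawn in the message.

```python
# Hypothetical sketch: codecs has no clean_surrogate_escapes() function.

def clean_surrogate_escapes(s: str, replacement: str = "\ufffd") -> str:
    """Replace only the code points surrogateescape produces (U+DC80-U+DCFF)."""
    return "".join(
        replacement if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in s
    )

name = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'
print(ascii(clean_surrogate_escapes(name)))  # 'caf\ufffd'

# The proposed check for "does this string contain surrogate escaped bytes?"
print(name != clean_surrogate_escapes(name))  # True
```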
Re: [Python-Dev] Bytes path related questions for Guido
On 25 August 2014 00:23, Antoine Pitrou wrote: > On 24/08/2014 09:04, Nick Coghlan wrote: >> Serhiy & Ezio convinced me to scale this one back to a proposal for >> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that >> may be produced by surrogateescape (that's what string.clean() above >> was supposed to be, but my description was not correct, and the name >> was too vague for that error to be obvious to the reader) > > > "clean" conveys the wrong meaning. It should use a scary word such as > "trap". "Cleaning" surrogates is unlikely to be the right procedure when > dealing with surrogates produced by undecodable byte sequences. "purge_surrogate_escapes" was the other term that occurred to me. Either way, my use case is to filter them out when I *don't* want to pass them along to other software, but would prefer the Unicode replacement character to the ASCII question mark created by using the "replace" error handler when encoding. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
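The difference between the two outcomes mentioned above, sketched with only standard codecs and error handlers. Applying "replace" while *encoding* substitutes an ASCII question mark. Re-encoding with surrogateescape and then applying "replace" while *decoding* yields U+FFFD REPLACEMENT CHARACTER. The sample name is made up.

```python
name = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'

# "replace" on the encoding side substitutes an ASCII question mark...
as_bytes = name.encode("utf-8", "replace")
print(as_bytes)  # b'caf?'

# ...while restoring the bytes and applying "replace" on the decoding side
# yields U+FFFD REPLACEMENT CHARACTER instead.
as_text = name.encode("utf-8", "surrogateescape").decode("utf-8", "replace")
print(ascii(as_text))  # 'caf\ufffd'
```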
Re: [Python-Dev] Bytes path related questions for Guido
On 25 Aug 2014 03:55, "Guido van Rossum" wrote: > > Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised). Thanks! > The status of pathlib is a little unclear to me -- is there a plan to eventually support bytes or not? It's text only and Antoine plans to keep it that way - the concatenation operations, etc, are really only safe if you decode first. > > For #2 I think you should probably just work with the others you have mentioned. Yes, that sounds like a good idea. There's been some good progress on the issue tracker, so I think we can thrash out some workable (and comprehensible!) utilities that will be useful in their own right while also serving as aids to understanding for the underlying mechanisms. Cheers, Nick. > > > On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan wrote: >> >> At Guido's request, splitting out two specific questions from Serhiy's >> thread where I believe we could do with an explicit "yes or no" from >> him. >> >> 1. Should we accept patches adding support for the direct use of bytes >> paths in lower level filesystem manipulation APIs? (i.e. everything >> that isn't pathlib) >> >> This was Serhiy's original question (due to some open issues [1,2]). I >> think the answer is yes, as we already do in some cases, and the >> "pathlib doesn't support binary paths" design decision is a high level >> platform independent API vs low level potentially platform dependent >> API one rather than being about disallowing the use of bytes paths in >> general. >> >> [1] http://bugs.python.org/issue19997 >> [2] http://bugs.python.org/issue20797 >> >> 2. Should we add some additional helpers to the string module for >> dealing with surrogate escaped bytes and other techniques for >> smuggling arbitrary binary data as text? 
>> >> My proposal [3] is to add: >> >> * string.escaped_surrogates (constant with the 128 escaped code points) >> * string.clean(s): replaces surrogates with '\ufffd' or another >> specified code point >> * string.redecode(s, encoding): encodes a string back to bytes and >> then decodes it again using the specified encoding (the old encoding >> defaults to 'latin-1' to match the assumptions in WSGI) >> >> "s != string.clean(s)" would then serve as a check for "does this >> string contain any surrogate escaped bytes?" >> >> [3] http://bugs.python.org/issue18814#msg225791 >> >> Regards, >> Nick. >> >> -- >> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia > > > > > -- > --Guido van Rossum (python.org/~guido)
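A sketch of what "text only, decode first" means when pathlib meets a bytes path from a lower-level API. PurePosixPath rejects bytes with TypeError. Decoding with surrogateescape keeps undecodable bytes recoverable, so the text-level join is safe and the result can be encoded again. The sample path is made up. Explicit UTF-8 decoding is used here for portability. On POSIX, os.fsdecode()/os.fsencode() are the usual equivalents.

```python
import pathlib

raw = b"/srv/data/caf\xe9"  # a bytes path, as a low-level API might return it

try:
    pathlib.PurePosixPath(raw)
except TypeError as exc:
    print("pathlib rejects bytes:", exc)

# Decode first; surrogateescape keeps the undecodable byte recoverable...
path = pathlib.PurePosixPath(raw.decode("utf-8", "surrogateescape"))
child = path / "notes.txt"  # ...so text-level joins are safe

# ...then encode again before handing the result to a bytes-based API.
print(str(child).encode("utf-8", "surrogateescape"))  # b'/srv/data/caf\xe9/notes.txt'
```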
[Python-Dev] Fwd: Accepting PEP 440: Version Identification and Dependency Specification
Antoine pointed out that it would still be a good idea to forward packaging PEP acceptance announcements to python-dev, even when the actual acceptance happens on distutils-sig. That makes sense to me, so here's last week's notice of the acceptance of PEP 440, the implementation independent versioning standard derived from pkg_resources, PEP 386, and ideas from both Linux distributions and other open source language communities. Regards, Nick. -- Forwarded message ------ From: Nick Coghlan Date: 22 August 2014 22:34 Subject: Accepting PEP 440: Version Identification and Dependency Specification To: DistUtils mailing list I just pushed Donald's final round of edits in response to the feedback on the last PEP 440 thread, and as such I'm happy to announce that I am accepting PEP 440 as the recommended approach to identifying versions and specifying dependencies when distributing Python software. The PEP is available in the usual place at http://www.python.org/dev/peps/pep-0440/ It's been a long road to get to an implementation independent versioning standard that has a feasible migration path from the current pkg_resources defined de facto standard, and I'd like to thank a few folks: * Donald Stufft for his extensive work on PEP 440 itself, especially the proof of concept integration into pip * Vinay Sajip for his efforts in validating earlier versions of the PEP * Tarek Ziadé for starting us down the road to an implementation independent versioning standard with the initial creation of PEP 386 back in June 2009, more than five years ago! Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Bytes path support
On 27 Aug 2014 02:52, "Terry Reedy" wrote: > > On 8/26/2014 9:11 AM, R. David Murray wrote: >> >> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: >>> >>> As some examples of where bilingual computing breaks down: >>> >>> * My NFS client and server may have different locale settings >>> * My FTP client and server may have different locale settings >>> * My SSH client and server may have different locale settings >>> * I save a file locally and send it to someone with a different locale setting >>> * I attempt to access a Windows share from a Linux client (or vice-versa) >>> * I clone my POSIX hosted git or Mercurial repository on a Windows client >>> * I have to connect my Linux client to a Windows Active Directory >>> domain (or vice-versa) >>> * I have to interoperate between native code and JVM code >>> >>> The entire computing industry is currently struggling with this >>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale >>> encoding/code pages) -> multilingual (Unicode) transition. It's been >>> going on for decades, and it's still going to be quite some time >>> before we're done. >>> >>> The POSIX world is slowly clawing its way towards a multilingual model >>> that actually works: UTF-8 >>> Windows (including the CLR) and the JVM adopted a different >>> multilingual model, but still one that actually works: UTF-16-LE > > > Nick, I think the first half of your post is one of the clearest expositions yet of 'why Python 3' (in particular, the str to unicode change). It is worthy of wider distribution and without much change, it would be a great blog post. Indeed, I had the same idea - I had been assuming users already understood this context, which is almost certainly an invalid assumption. The blog post version is already mostly written, but I ran out of weekend. 
Will hopefully finish it up and post it some time in the next few days :) >> This kind of puts the "length" of the python2->python3 transition >> period in perspective, doesn't it? I realised in writing the post that ASCII is over 50 years old at this point, while Unicode as an official standard is more than 20. By the time this is done, we'll likely be talking 30+ years for Unicode to displace the confusing mess that is code pages and locale encodings :) Cheers, Nick. > > > -- > Terry Jan Reedy
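A small illustration of how "bilingual" computing breaks down when two machines use different locale encodings. The same raw file name bytes, written by a machine using a Latin-1 locale, read back as different text (or not at all) under other encodings. The file name and the choice of encodings are made up for illustration.

```python
# One file name as raw bytes, written by a machine using a Latin-1 locale.
raw = "é.txt".encode("latin-1")  # b'\xe9.txt'

# The same bytes interpreted under other locale encodings:
for encoding in ("latin-1", "cp1251", "utf-8"):
    try:
        print(encoding, "->", ascii(raw.decode(encoding)))
    except UnicodeDecodeError as exc:
        print(encoding, "-> cannot decode:", exc.reason)
```

Under latin-1 the name reads as 'é.txt', under cp1251 (a Cyrillic code page) the same byte reads as 'й', and as UTF-8 it is not valid at all.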
Re: [Python-Dev] Windows Unicode console support [Was: Bytes path support]
On 27 August 2014 01:23, Paul Moore wrote: > On 24 August 2014 04:27, Nick Coghlan wrote: >> One of those areas is the fact that we still use the old 8-bit APIs to >> interact with the Windows console. Those are just as broken in a >> multilingual world as the other Windows 8-bit APIs, so Drekin came up >> with a project to expose the Windows console as a UTF-16-LE stream >> that uses the 16-bit APIs instead: >> https://pypi.python.org/pypi/win_unicode_console >> >> I personally hope we'll be able to get the issues Drekin references >> there resolved for Python 3.5 - if other folks hope for the same >> thing, then one of the best ways to help that happen is to try out the >> win_unicode_console module and provide feedback on what does and >> doesn't work. > > This looks very cool, and I plan on giving it a try. But I don't see > any issues mentioned there (unless you mean the fact that it's not > possible to hook into Python's interactive interpreter directly, but I > don't see how that could be fixed in an external module). There's no > open issues on the project's github tracker. There are two links to CPython issues from the project description: http://bugs.python.org/issue1602 http://bugs.python.org/issue17620 Part of the feedback on those was that as much as possible should be made available as a third party module before returning to the question of how to update CPython. If we can get additional confirmation that the module addresses the CLI integration issues, then we can take a closer look at switching CPython itself over. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Bytes path support
On 27 August 2014 08:52, Nick Coghlan wrote: > On 27 Aug 2014 02:52, "Terry Reedy" wrote: >> Nick, I think the first half of your post is one of the clearest >> expositions yet of 'why Python 3' (in particular, the str to unicode >> change). It is worthy of wider distribution and without much change, it >> would be a great blog post. > > Indeed, I had the same idea - I had been assuming users already understood > this context, which is almost certainly an invalid assumption. > > The blog post version is already mostly written, but I ran out of weekend. > Will hopefully finish it up and post it some time in the next few days :) Aaand, it's up: http://www.curiousefficiency.org/posts/2014/08/multilingual-programming.html Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Bytes path support
On 28 Aug 2014 04:20, "Glenn Linderman" wrote: > > On 8/27/2014 5:16 AM, Nick Coghlan wrote: >> >> On 27 August 2014 08:52, Nick Coghlan wrote: >>> >>> On 27 Aug 2014 02:52, "Terry Reedy" wrote: >>>> >>>> Nick, I think the first half of your post is one of the clearest >>>> expositions yet of 'why Python 3' (in particular, the str to unicode >>>> change). It is worthy of wider distribution and without much change, it >>>> would be a great blog post. >>> >>> Indeed, I had the same idea - I had been assuming users already understood >>> this context, which is almost certainly an invalid assumption. >>> >>> The blog post version is already mostly written, but I ran out of weekend. >>> Will hopefully finish it up and post it some time in the next few days :) >> >> Aaand, it's up: >> http://www.curiousefficiency.org/posts/2014/08/multilingual-programming.html >> >> Cheers, >> Nick. >> > > Indeed, I also enjoyed and found enlightening your response to this issue, including the broader historical context. I remember when Unicode was first published back in 1991, and it sounded interesting, but far removed from the reality of implementations of the day. I was intrigued by UTF-8 at the time, and even wrote an encoder and decoder for it for a software package that eventually never reached any real customers. > > Your blog post says: >> >> Choosing UTF-8 aims to treat formatting text for communication with the user as "just a display issue". It's a low impact design that will "just work" for a lot of software, but it comes at a price: >> >> because encoding consistency checks are mostly avoided, data in different encodings may be freely concatenated and passed on to other applications. Such data is typically not usable by the receiving application. > > > I don't believe this is a necessary result of using UTF-8. 
It is a possible result, and I guess some implementations are using it this way, but a proper language could still provide and/or require proper usage of UTF-8 data through its type system just as Python3 is doing with PEP 393. Yes, Go works that way, for example. I doubt it actually checks for valid UTF-8 at OS boundaries though - that would be a potentially expensive check, and as a network service centric language, Go can afford to place more constraints on the operating environment than we can. >In fact, if it were not for the requirement to support passing character strings in other formats (UTF-16, UTF-32) to historical APIs (in CPython add-on packages) and the resulting practical performance considerations of converting to/from UTF-8 repeatedly when calling those APIs, Python3 could have evolved to using UTF-8 as its underlying data format, and obtained equal encoding consistency as it has today. We already have string processing algorithms that work for fixed width encodings (and are known not to work for variable width encodings, hence the bugs in Unicode handling on the old narrow builds). It isn't that variable width encodings aren't a viable choice for programming language text modelling, it's that the assumption of a fixed width model is more deeply entrenched in CPython (and especially the C API) than the exact number of bits used per code point. > Of course, nothing can be "required" if the user chooses to continue operating in the encoded domain, and manipulate data using the necessary byte-oriented features of of whatever language is in use. > > One of the choices of Python3, was to retain character indexing as an underlying arithmetic implementation citing algorithmic speed, but that is a seldom needed operation, and of limited general applicability when considering grapheme clusters. The choice that was made was to say no to the question "Do we rewrite a Unicode type that we already know works from scratch?". 
The decisions about how to handle *text* were made way back before the PEP process even existed, and later captured as PEP 100. What changed in Python 3 was dropping the hybrid 8-bit str type with its locale dependent behaviour, and parcelling its responsibilities out to either the existing unicode type (renamed as str, as it was the default choice), or the new locale independent bytes type. > An iterator based approach can solve both problems, but would have been best introduced as part of Python3.0, although it may have made 2to3 harder, and may have made it less practical to implement six and other "run on both Py2 and Py3" type solutions harder, without introducing those same iterative solutions into Python 2.6 or 2.7. The option of fundamentally changing the text handling design was never on the ta
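A short sketch of the distinctions this exchange turns on. Python's str length and indexing count code points (a fixed-width model). A user-perceived character (grapheme cluster) may span several code points. UTF-8 is variable width at the byte level. The sample word is made up.

```python
import unicodedata

decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # uses precomposed U+00E9

print(len(decomposed), len(composed))  # 5 4: str length counts code points
print(decomposed[-1] == "\u0301")      # True: indexing can split a grapheme cluster
print(len(composed.encode("utf-8")))   # 5: UTF-8 needs 2 bytes for U+00E9
```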
[Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)
On 26 Aug 2014 21:34, "MRAB" wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: >> >> Nick Coghlan writes: >> >> > "purge_surrogate_escapes" was the other term that occurred to me. >> >> "purge" suggests removal, not replacement. That may be useful too. >> >> neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD') >> > How about: > > replace_surrogate_escapes(s, replacement='\uFFFD') > > If you want them removed, just pass an empty string as the replacement. The current proposal on the issue tracker is to instead take advantage of the existing error handlers:

    def convert_surrogateescape(data, errors='replace'):
        return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

That code is short, but semantically dense - it took a few iterations to come up with that version. (Added bonus: once you're alerted to the possibility, it's trivial to write your own version for existing Python 3 versions. The standard name just makes it easier to look up when you come across it in a piece of code, and provides the option of optimising it later if it ever seems worth the extra work.) I also filed a separate RFE to make backslashreplace usable on input, since that allows the option of separating the replacement operation from the encoding operation. Cheers, Nick.
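A usage sketch for convert_surrogateescape as quoted above, showing how the choice of decode-side error handler controls what happens to the escaped bytes. The sample name is made up. Decode-side "backslashreplace" requires Python 3.5 or later.

```python
def convert_surrogateescape(data, errors='replace'):
    """Re-encode surrogate-escaped text, then decode it with another handler.

    Assumes the escapes came from decoding UTF-8 with surrogateescape.
    """
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

name = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'

print(ascii(convert_surrogateescape(name)))                      # 'caf\ufffd'
print(ascii(convert_surrogateescape(name, "ignore")))            # 'caf'
print(ascii(convert_surrogateescape(name, "backslashreplace")))  # 'caf\\xe9'
```

Two caveats follow from the UTF-8 assumption. Escapes produced by a different codec may be "repaired" rather than replaced: decoding b'caf\xc3\xa9' as ASCII with surrogateescape and then converting yields 'café'. A lone surrogate outside U+DC80 to U+DCFF makes the encode step raise UnicodeEncodeError.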
Re: [Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)
On 29 August 2014 10:32, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > The current proposal on the issue tracker is to instead take advantage of > > the existing error handlers: > > > > def convert_surrogateescape(data, errors='replace'): > > return data.encode('utf-8', 'surrogateescape').decode('utf-8', > errors) > > > > That code is short, but semantically dense > > And it doesn't implement your original suggestion of replacement with > '?' (and another possibility for history buffs is 0x1A, ASCII SUB). At > least, AFAICT from the docs there's no way to specify the replacement > character; decoding always uses U+FFFD. (If I knew how to do that, I > would have suggested this.) If that actually matters in a given context, I can do an ordinary string replacement later. I couldn't think of a case where it actually mattered though - if "must be ASCII" was a requirement, then backslashreplace was a suitable alternative that lost less information (hence the RFE to make that also usable on input). > > (Added bonus: once you're alerted to the possibility, it's trivial > > to write your own version for existing Python 3 versions. > > I'm not sure that's true. At least, to me that code was obvious -- I > got the exact definition (except for the function name) on the first > try -- but I ruled it out because it didn't implement your suggestion > of replacement with '?', even as an option. Yeah, part of the tracker discussion involved me realising that part wasn't a necessary requirement - the key is being able to get rid of the surrogates, or replace them with something readily identifiable, and less about being able to control exactly what they get replaced by. 
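The "ordinary string replacement later" approach could look something like the sketch below. `replace_surrogate_escapes` is a hypothetical name for illustration, not a proposed API. The `'?'` default follows Stephen's original suggestion, and `'\x1a'` (ASCII SUB) works the same way.

```python
def convert_surrogateescape(data, errors='replace'):
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)


def replace_surrogate_escapes(data, replacement='?'):
    """Hypothetical helper: substitute a caller-chosen marker for escaped bytes."""
    cleaned = convert_surrogateescape(data, 'replace')
    # Caveat: this also rewrites any U+FFFD that was already present in `data`.
    return cleaned.replace('\ufffd', replacement)


print(replace_surrogate_escapes('caf\udce9'))               # caf?
print(ascii(replace_surrogate_escapes('caf\udce9', '\x1a')))  # 'caf\x1a'
```

The two-step version cannot tell a pre-existing U+FFFD from one produced by the decode, which is one reason the backslashreplace route discussed above loses less information.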
> OTOH, I think a lot of the resistance to codec-based solutions is the > misconception that en/decoding streams is expensive, or the > misconception that Python's internal representation of text as an > array of code points (rather than an array of "characters" or > "grapheme clusters") is somehow insufficient for text processing. We don't actually have any technical deep dives into how Python 3's text handling works readily available online, so there's a lot of speculation and misinformation floating around. My recent article gives the high level context, but it really needs to be paired up with a piece (or pieces) that go deep into the details of codec optimisation, the UTF-8 caching, how it integrates with the UTF-16-LE Windows APIs, how the internal storage structure is determined at allocation time, how it maintains compatibility with the legacy C extension APIs, etc. The only current widely distributed articles on those topics are written from a perspective that assumes we don't know anything about Unicode, and are just making things unnecessarily complicated (rather than solving hard cross platform compatibility and text processing performance problems). That perspective is incorrect, but "trust me, they're wrong" doesn't work very well with people that are already angry. Text manipulation is one of the most sophisticated subsystems in the interpreter, though, so it's hard to know where to start on such a series (and easy to get intimidated by the sheer magnitude of the work involved in doing it right). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
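One visible consequence of "the internal storage structure is determined at allocation time" in CPython 3.3+ (the flexible string representation from PEP 393): each string picks 1, 2 or 4 bytes per code point based on its widest character. The sketch below measures that from Python. It relies on `sys.getsizeof`, so the numbers describe CPython specifically, not the language.

```python
import sys


def bytes_per_code_point(ch):
    """Estimate CPython's per-code-point storage for strings made of `ch`.

    Comparing two lengths cancels out the fixed object header, leaving
    only the per-character cost of the storage kind chosen at allocation.
    """
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


for sample in ('a', '\u00e9', '\u0100', '\U0001F600'):
    print(f'U+{ord(sample):04X}: {bytes_per_code_point(sample)} byte(s) per code point')
```

On CPython this reports 1, 1, 2 and 4 bytes respectively: ASCII and Latin-1 text stays compact, and only strings that actually contain astral characters pay for 4-byte storage.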
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 30 Aug 2014 06:08, "Ethan Furman" wrote: > > On 08/29/2014 01:00 PM, M.-A. Lemburg wrote: >> >> On 29.08.2014 21:47, Alex Gaynor wrote: >>> >>> >>> I've just submitted PEP 476, on enabling certificate validation by default for >>> HTTPS clients in Python. Please have a look and let me know what you think. >> >> >> Thanks for the PEP. I think this is generally a good idea, >> but some important parts are missing from the PEP: >> >> * transition plan: >> >> I think starting with warnings in Python 3.5 and going >> for exceptions in 3.6 would make a good transition >> >> Going straight for exceptions in 3.5 is not in line with >> our normal procedures for backwards incompatible changes. >> >> * configuration: >> >> It would be good to be able to switch this on or off >> without having to change the code, e.g. via a command >> line switch and environment variable; perhaps even >> controlling whether or not to raise an exception or >> warning. >> >> * choice of trusted certificate: >> >> Instead of hard wiring using the system CA roots into >> Python it would be good to just make this default and >> permit the user to point Python to a different set of >> CA roots. >> >> This would enable using self signed certs more easily. >> Since these are often used for tests, demos and education, >> I think it's important to allow having more control of >> the trusted certs. > > > +1 for PEP with above changes. Ditto from me. In relation to changing the Python CLI API to offer some of the wget/curl style command line options, I like the idea of providing recipes in the docs for implementing them at the application layer, but postponing making the *default* behaviour configurable that way. Longer term, I'd like to actually have a per-runtime configuration file for some of these things that also integrated with the pyvenv support, but that requires untangling the current startup code first (and there are only so many hours in the day). Regards, Nick. 
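A documentation recipe for wget/curl-style options "at the application layer" might look roughly like this. The flag names `--cacert` and `--insecure` are loosely modelled on curl and are assumptions for illustration. Only standard `argparse` and `ssl` calls are used.

```python
import argparse
import ssl


def build_parser():
    parser = argparse.ArgumentParser(description='Example HTTPS client options')
    parser.add_argument('--cacert', metavar='FILE',
                        help='extra CA bundle to trust in addition to the defaults')
    parser.add_argument('--insecure', action='store_true',
                        help='do not verify the server certificate (unsafe)')
    return parser


def context_from_args(args):
    """Build an SSLContext reflecting the command line choices."""
    context = ssl.create_default_context()
    if args.cacert:
        # Adds to the platform defaults rather than replacing them.
        context.load_verify_locations(cafile=args.cacert)
    if args.insecure:
        # check_hostname must be disabled before verification can be turned off.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

An application would then pass the result along explicitly, e.g. `urllib.request.urlopen(url, context=context_from_args(build_parser().parse_args()))`, so the opt-out lives in that application's own CLI rather than in the interpreter's defaults.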
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 31 August 2014 12:21, R. David Murray wrote: > On Sun, 31 Aug 2014 03:25:25 +0200, Antoine Pitrou > wrote: >> On Sun, 31 Aug 2014 09:26:30 +1000 >> Nick Coghlan wrote: >> > In relation to changing the Python CLI API to offer some of the wget/curl >> > style command line options, I like the idea of providing recipes in the >> > docs for implementing them at the application layer, but postponing making >> > the *default* behaviour configurable that way. >> >> I'm against any additional environment variables and command-line >> options. It will only complicate and obscure the security parameters of >> certificate validation. As Antoine says here, I'm also opposed to adding more Python specific configuration options. However, I think there may be something worthwhile we can do that's closer to the way browsers work, and has the significant benefit of being implementable as a PyPI module first (more on that in a separate reply). >> The existing knobs have already been mentioned in this thread, I won't >> mention them here again. > > Do those knobs allow one to instruct urllib to accept an invalid > certificate without changing the program code? Only if you add the specific certificate concerned to the certificate store that Python is using (which PEP 476 currently suggests will be the platform wide certificate store). Whether or not that is an adequate solution is the point currently in dispute. My view is that the core problem/concern we need to address here is how we manage the migration away from a network communication model that trusts the network by default. That transition will happen regardless of whether or not we adapt Python as a platform - the challenge for us is how we can address it in a way that minimises the impact on existing users, while still ensuring future users are protected by default. This would be relatively easy if we only had to worry about the public internet (since we're followers rather than leaders in that environment), but we don't. 
Python made the leap into enterprise environments long ago, so we not only need to cope with corporate intranets, we need to cope with corporate intranets that aren't necessarily being well managed. That's what makes this a harder problem for us than it is for a new language like Go that was created by a public internet utility, specifically for use over the public internet - they didn't *have* an installed base to manage, they could just build a language specifically tailored for the task of running network services on Linux, without needing to account for any other use cases. The reason our existing installed base creates a problem is because corporate network security has historically focused on "perimeter defence": carving out a trusted island behind the corporate firewall where users and other computer systems could be "safely" assumed not to be malicious. As an industry, we have learned through harsh experience that *this model doesn't work*. You can't trust the network, period. A corporate intranet is *less* dangerous than the public internet, but you still can't trust it. This "don't trust the network" ethos is also reinforced by the broad shift to "utility computing" where more and more companies are running distributed networks, where some of their systems are actually running on vendor provided servers. The "network perimeter" is evaporating, as corporate "intranets" start to look a lot more like recreations of the internet in miniature, with the only difference being the existence of more formal contractual relationships than typically exist between internet peers. Unfortunately, far too many organisations (especially those outside the tech industry) still trust in perimeter defence for their internal network security, and hence tolerate the use of unsecured connections, or skipping certificate validation internally.
This is actually a really terrible idea, but it's still incredibly common due to the general failure of the technology industry to take usability issues seriously when we design security systems - doing the wrong "unsafe" thing is genuinely easier than doing things right. We have enough evidence now to be able to say (as Alex does in PEP 476) that it has been comprehensively demonstrated that "opt-in security" really just means "security failures are common and silent by default". We've seen it with C buffer overflow vulnerabilities, we've seen it with plain text communication links, we've seen it with SSL certificate validation - the vast majority of users and developers will just run with the default behaviour of the platform or application they're using, even if those defaults have serious problems. As the saying goes, "you
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 31 August 2014 12:21, R. David Murray wrote: > Do those knobs allow one to instruct urllib to accept an invalid > certificate without changing the program code? My first reply ended up being a context dump of the challenges created by legacy corporate intranets that may not be immediately obvious to folks that spend most of their time working on or with the public internet. I decided to split these more technical details out to a new reply for the benefit of folks that already know all that history :) To answer David's specific question, the existing knobs at the OpenSSL level (SSL_CERT_DIR and SSL_CERT_FILE) let people add an internal CA, opt out of the default CA system, and trust *specific* self-signed certs. What they don't allow is a global "trust any cert" setting - exceptions need to be added at the individual cert level or at the CA level, or the application needs to offer an option to not do cert validation at all. That "trust anything" option at the platform level is the setting that is a really bad idea - if an organisation thinks it needs that (because they have a lot of self-signed certs, but aren't verifying their HTTPS connections to those servers), then what they really need is an internal CA, where their systems just need to be set up to trust the internal CA in addition to the platform CA certs. With Alex's proposal, organisations that are already running an internal CA should be just fine - Python 3.5 will see the CA cert in the platform cert store and accept certs signed by it as valid. (Note: the Python 3.4 warning should take this into account, which could be a problem since we don't currently do validity checks against the platform store by default.
The PEP needs to cover the mechanics of that in more detail, as I think it means we'll need to make *some* changes to the default configuration even in Python 3.4 to get accurate validity data back from OpenSSL.) However, we also need to accept that there's a reason browser vendors still offer "click through insecurity" for sites with self-signed certificates, and tools like wget/curl offer the option to say "don't check the certificate": these are necessary compromises to make SSL based network connections actually work on many current corporate intranets. It is corporate environments that also make it desirable to be able to address this potential problem at a *user* level, since many Python users in large organisations are actually running Python entirely out of their home directories, rather than as a system installation (they may not even have admin access to their own systems). My suggestion at this point is that we take a leaf from both browser vendors and the design of SSH: make it easy to *add* a specific self-signed cert to the set a *particular user* trusts by default (preferably *only* for a particular host, to limit the power of such certs). "python -m ssl" doesn't currently do anything interesting, so it could be used to provide an API for managing that user level certificate store. A Python-specific user level cert store is something that could be developed as a PyPI library for Python 2.7.9+ and 3.4+ (Is cert management considered in scope for cryptography.io? If so, that could be a good home). So while I agree with the intent of PEP 476, and like the suggested end state, I'm back to thinking that the transition plan for existing corporate users needs more work before it can be accepted. This is especially true since it becomes another barrier to migrating from Python 2.7 to Python 3.5+ (a warning in Python 3.4 doesn't help with that aspect, although a new -3 warning might).
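The existing OpenSSL-level knobs can be inspected from Python, and "trust the internal CA in addition to the platform CA certs" maps onto two standard `ssl` calls. In the sketch below, `context_with_internal_ca` and the CA file argument are hypothetical names for illustration. Note the ordering: passing `cafile=` directly to `create_default_context()` loads *only* that file and skips the platform defaults.

```python
import ssl

# Where OpenSSL looks for CA certificates, and which environment variables
# (normally SSL_CERT_FILE and SSL_CERT_DIR) override those locations.
paths = ssl.get_default_verify_paths()
print(paths.openssl_cafile_env, '->', paths.openssl_cafile)
print(paths.openssl_capath_env, '->', paths.openssl_capath)


def context_with_internal_ca(internal_ca_file):
    """Trust an internal CA *in addition to* the platform CA certificates.

    The platform defaults are loaded by create_default_context() first,
    then the internal CA bundle is added on top of them.
    """
    context = ssl.create_default_context()
    context.load_verify_locations(cafile=internal_ca_file)
    return context
```

Setting `SSL_CERT_FILE` or `SSL_CERT_DIR` instead changes what the defaults themselves load, which is how an organisation can adjust trust without touching application code.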
A third party module that offers a user level certificate store, and a gevent.monkey style way of opting in to this behaviour for existing Python versions would be one way to provide a more compelling transition plan. Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 31 August 2014 16:16, Donald Stufft wrote: > > On Aug 31, 2014, at 2:09 AM, Nick Coghlan wrote: > > At the same time, we need to account for the fact that most existing > organisations still trust in perimeter defence for their internal > network security, and hence tolerate (or even actively encourage) the > use of unsecured connections, or skipping certificate validation, > internally. This is actually a really terrible idea, but it's still > incredibly common due to the general failure of the technology > industry to take usability issues seriously when we design security > systems (at least until recently) - doing the wrong "unsafe" thing is > genuinely easier than doing things right. > > > Just a quick clarification in order to be a little clearer, this change will > (obviously) only effect those who trust perimeter security *and* decided to > install an invalid certificate instead of just using HTTP. I'm not saying > that > this doesn't happen, just being specific (I'm not actually sure why they > would > install a TLS certificate at all if they are trusting perimeter security, > but > I'm sure folks do). It's the end result when a company wide edict to use HTTPS isn't backed up by the necessary documentation and training on how to get a properly signed cert from your internal CA (or, even better, when such an edict comes down without setting up an internal CA first). Folks hit the internet instead, find instructions on creating a self-signed cert, install that, and tell their users to ignore the security warning and accept the cert. Historically, Python clients have "just worked" in environments that required a click-through on the browser side, since you had to opt in to checking the certificates properly. Self-signed certificates can also be really handy for doing local testing - you're not really aiming to authenticate the connection in that case, you're just aiming to test that the secure connection machinery is all working properly. 
(As far as the "what about requests?" question goes - that's in a similar situation to Go, where being new allows it to choose different defaults, and folks for whom those defaults don't work just won't use it. There's also the fact that most corporate Python users are unlikely to know that PyPI exists, let alone that it contains a module called "requests" that does SSL certificate validation by default. Those of us in the corporate world that interact directly with upstream are still the exception rather than the rule) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 Sep 2014 06:32, "Paul Moore" wrote: > > On 31 August 2014 21:15, Antoine Pitrou wrote: > > What do you call your local cert store? > > I was referring to Christian's comment > > It's very simple to trust a self-signed certificate: just download it and stuff it into the trust store. > > From his recent response, I guess he meant the system store, and he > agrees that this is a bad option. > > OK, that's fair, but: > > a) Is there really no OS-level personal trust store? I'm thinking of > Windows here for my own personal use, but the same question applies > elsewhere. > b) I doubt my confusion over Christian's response is atypical. Based > on what he said, if we hadn't had the subsequent discussion, I would > probably have found a way to add a cert to "the store" without > understanding the implications. While it's not Python's job to educate > users, it would be a shame if its default behaviour led people to make > ill-informed decisions. Right, this is why I came to the conclusion we need to follow the browser vendors' lead here and support a per-user, Python-specific supplementary certificate cache before we can start validating certs by default at the *Python* level. There are still too many failure modes for cert management on private networks for us to safely ignore the use case of needing to force connections to services with invalid certs. We don't need to *solve* that problem here today - we can push it back to Alex (and anyone else interested) as a building block to investigate providing as part of cryptography.io or certi.fi, with a view to making a standard library version of that (along with any SSL module updates) part of PEP 476. In the meantime, we can update the security considerations for the ssl module to make it clearer that the defaults are set up for trusted networks and that using it safely on the public internet may mean you're better off with a third party library like requests or Twisted.
(I'll start another thread shortly that is highly relevant to that topic) Regards, Nick. > > Maybe an SSL HOWTO would be a useful addition to the docs, if anyone > feels motivated to write one. > > Regardless, thanks for the education! > > Paul
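A per-user, per-host supplementary certificate cache in the SSH `known_hosts` spirit could start out as small as this sketch. Everything here is hypothetical: the `~/.python-certs` directory, the `<host>.pem` naming scheme and all three function names are illustrative assumptions, not an existing module. Only the `ssl` and `os.path` calls are real.

```python
import os
import ssl


def user_cert_store():
    """Hypothetical per-user directory of extra trusted certificates."""
    return os.path.join(os.path.expanduser('~'), '.python-certs')


def pinned_cert_for(host, store_dir=None):
    """Return the path of a cert the user chose to trust for `host`, if any."""
    if not host or host.startswith('.') or os.sep in host or (
            os.altsep and os.altsep in host):
        raise ValueError('unexpected host name: %r' % (host,))
    store_dir = store_dir or user_cert_store()
    candidate = os.path.join(store_dir, host + '.pem')
    return candidate if os.path.isfile(candidate) else None


def context_for_host(host, store_dir=None):
    """Default verification, plus one user-approved cert for this host only."""
    context = ssl.create_default_context()
    pinned = pinned_cert_for(host, store_dir)
    if pinned is not None:
        # Other hosts never see this cert, which limits the power it grants.
        context.load_verify_locations(cafile=pinned)
    return context
```

Keying the extra trust by host name is the design choice borrowed from SSH and browsers: accepting a self-signed cert for one intranet server does not make it a trusted CA for every connection the user makes.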
[Python-Dev] PEP 477: selected ensurepip backports for Python 2.7
Earlier versions of PEP 453 proposed bootstrapping pip into a Python 2.7 maintenance release in addition to including it with Python 3.4. That part of the proposal proved to be controversial, so we dropped it from the original PEP in order to focus on meeting the Python 3.4 specific release deadlines. This also had the benefit of working out the kinks in the bootstrapping process as part of the Python 3.4 release cycle. However, we still think we should start providing pip by default to Python 2.7 users as well, at least as part of the Windows and Mac OS X installers. One notable difference from PEP 453 is that because there is no venv module in 2.7, and hence no integration between venv and ensurepip, we can give redistributors the option of just disabling ensurepip entirely and redirecting users to platform specific installation tools. Regards, Nick. == PEP: 477 Title: Backport ensurepip (PEP 453) to Python 2.7 Version: $Revision$ Last-Modified: $Date$ Author: Donald Stufft, Nick Coghlan Status: Active Type: Process Content-Type: text/x-rst Created: 26-Aug-2014 Post-History: 1-Sep-2014 Abstract This PEP proposes that the ``ensurepip`` module, added to Python 3.4 by PEP 453, be backported to Python 2.7. It also proposes that automatic invocation of ``ensurepip`` be added to the Python 2.7 Windows and OSX installers. However, it does **not** propose that automatic invocation be added to the ``Makefile``. It also proposes that the package distribution and installation guides be updated to match the 3.4 documentation, which references using the ``ensurepip`` module to bootstrap the installer. Rationale = Python 2.7 is effectively an LTS release of Python which represents the end of the 2.x series, and there is still a very large contingent of users who are still using Python 2.7 as their primary version.
These users, in order to participate in the wider Python ecosystem, must manually go out and find the correct way to bootstrap the packaging tools. It is the opinion of this PEP that making it as easy as possible for end users to participate in the wider Python ecosystem is important for 4 primary reasons: 1. The Python 2.x to 3.x migration has a number of pain points that are eased by third party modules such as six [#six]_, modernize [#modernize]_, or future [#future]_. However, relying on these tools requires that everyone who uses the project have a tool to install these packages. 2. In addition to tooling to aid in migration from Python 2.x to 3.x, there are also a number of modules that are *new* in Python 3 for which there are backports available on PyPI. These can also help people write 2.x and 3.x compatible software, as well as enable them to use some of the newer features of Python 3 on Python 2. 3. Users will also need a number of tools in order to create Python packages that conform to the newer standards that are being proposed. Things like setuptools [#setuptools]_, Wheel [#wheel]_, and twine [#twine]_ are enabling a safer, faster, and more reliable packaging tool chain. These tools can be difficult for people to use if first they must be told how to go out and install the package manager. 4. One of Python's biggest strengths is the huge ecosystem of libraries and projects that have been built on top of it, most of which are distributed through PyPI. However, benefiting meaningfully from this ecosystem requires end users, some of whom are going to be new, to first decide which package manager they should get, work out how to get it, and then actually install it. Furthermore, alternative implementations of Python are recognizing the benefits of PEP 453 and both PyPy and Jython have plans to backport ensurepip to their 2.7 runtimes.
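For reference, this is roughly what the bootstrap looks like from the user's side on Python 3.4+, where `ensurepip` already exists. The sketch is written for Python 3 rather than the proposed 2.7 backport, and the helper names are illustrative. Because redistributors may remove or disable `ensurepip`, the version lookup treats any failure as "not available".

```python
import subprocess
import sys


def bundled_pip_version():
    """Return the pip version ensurepip would install, or None if unavailable.

    Redistributors may remove or disable ensurepip, so failures are
    treated as "not available" rather than as errors.
    """
    try:
        import ensurepip
        return ensurepip.version()
    except Exception:
        return None


def bootstrap_pip(user=True):
    """Run ``python -m ensurepip --upgrade`` for the current interpreter."""
    cmd = [sys.executable, '-m', 'ensurepip', '--upgrade']
    if user:
        cmd.append('--user')
    return subprocess.run(cmd).returncode
```

Running `bootstrap_pip()` installs the bundled pip without any network access, which is what lets the Windows and OS X installers offer pip out of the box.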
Automatic Invocation PEP 453 has ``ensurepip`` automatically invoked by default in the ``Makefile`` and the Windows and OSX Installers. This allowed it to ensure that, by default, all users would get Python with pip already installed. This PEP however believes that while this is fine for the Python 2.7 Windows and Mac OS X installers it is *not* ok for the Python 2.7 ``Makefile`` in general. The primary consumers of the ``Makefile`` are downstream package managers which distribute Python themselves. These downstream distributors typically do not want pip to be installed via ``ensurepip`` and would prefer that end users install it with their own package manager. Not invoking ``ensurepip`` automatically from the ``Makefile`` would allow these distributors to simply ignore the fact that ``ensurepip`` has been backported and still not end up with pip installed via it. The primary consumers of the OSX and Windows installers are end users who are attempting to install Python on their own machine. There is not a package manager available where these
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 Sep 2014 07:43, "Christian Heimes" wrote: > > On 31.08.2014 08:09, Nick Coghlan wrote: > > As Antoine says here, I'm also opposed to adding more Python specific > > configuration options. However, I think there may be something > > worthwhile we can do that's closer to the way browsers work, and has > > the significant benefit of being implementable as a PyPI module first > > (more on that in a separate reply). > > I'm on your and Antoine's side and strictly against any additional > environment variables or command line arguments. That would make the > whole validation process even more complex and harder to understand. > > There might be a better option to give people and companies the option > to tune the SSL module to their needs. Python already have a > customization hook for the site module called sitecustomize. How about > another module named sslcustomize? Such a module could be used to tune > the ssl module to the needs of users, e.g. configure a different default > context, add certificates to a default context etc. > > Companies could install them in a system global directory on their > servers. Users could put them in their own user site directory and even > each virtual env can have one sslcustomize of its own. It's fully > backward compatible, doesn't add any flags and developers have the full > power of Python for configuration and customization. And it means a user-specific store (if one became available) could be configured there. Yes, I think this would address my concerns, especially if combined with a clear recipe in the documentation on how to optionally disable cert validation at the application layer. Assuming sslcustomize was in site-packages rather than the standard library directories, you would also be able to use virtual environments with an appropriate sslcustomize module to disable cert checking even if the application you were running didn't support direct configuration. Cheers, Nick.
Re: [Python-Dev] PEP 477: selected ensurepip backports for Python 2.7
On 1 Sep 2014 09:23, "Benjamin Peterson" wrote: > > On Sun, Aug 31, 2014, at 16:17, Antoine Pitrou wrote: > > On Mon, 1 Sep 2014 08:00:14 +1000 > > Nick Coghlan wrote: > > > > > > That part of the proposal proved to be controversial, so we dropped it from > > > the original PEP in order to focus on meeting the Python 3.4 specific > > > release deadlines. This also had the benefit of working out the kinks in > > > the bootstrapping processing as part of the Python 3.4 release cycle. > > > > > > However, we still think we should start providing pip by default to Python > > > 2.7 users as well, at least as part of the Windows and Mac OS X installers. > > > > I don't agree with this. pip is simply not part of the 2.7 feature set. > > If you add pip to a bugfix version, then you have bugfix versions which > > are more featureful than others, which makes things more complicated to > > explain. > > 2.7.x has been and will be alive for so long that will already have to > explain that sort thing; i.e. PEP 466 and why different bugfix releases > support different versions of dependency libraries. Exactly. LTS is genuinely different from stopping maintenance after the next feature release - it requires considering the "stability risk" and "user experience improvement" questions separately. In this case, the problem is that the Python 2 platform *is* still evolving, but the centre of that evolution has moved to PyPI. For "standard library only" users, Python 2 stopped evolving back in 2010. For PyPI users, by contrast, it's still evolving at a rapid pace. For our Python 3 transition story to be coherent, we need to ensure tools like six, modernize and future are readily available, while still remaining free to evolve independently of the standard library. That means providing a package management utility as easily and as painlessly as possible. 
Embracing pip upstream for Python 2 as well as Python 3 also sends a powerful signal to redistributors that says "your users are going to need this" and makes them do additional work to *avoid* providing it. Some of them *will* choose that path, but that becomes a matter for discussion between them and their user base, rather than a problem we need to worry about upstream. Cheers, Nick.
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 Sep 2014 08:15, "Donald Stufft" wrote: > > >> On Aug 31, 2014, at 5:43 PM, Christian Heimes wrote: >> >> Companies could install them in a system global directory on their >> servers. Users could put them in their own user site directory and even >> each virtual env can have one sslcustomize of its own. It's fully >> backward compatible, doesn't add any flags and developers have the full >> power of Python for configuration and customization. > > This may be a dumb question, but why can’t sitecustomize do this already? It can. The advantage of a separate file is that it won't conflict with existing sitecustomize modules, so (for example) redistributors can add a default sslcustomize, and you can add one to your virtual environments that are integrated with the system Python environment without needing to worry about whether or not there's a global sitecustomize (you'd only have trouble if there was a global sslcustomize). Cheers, Nick.
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 September 2014 11:10, R. David Murray wrote: > > It sounds like this would address my concerns as well (I don't really > care *how* it is implemented as long as I don't have to touch the > code of a third party application when I upgrade my python version to > 3.5...remember, the context here is backward compatibility concerns). > > Does it address the issue of accepting an invalid cert, though? That's actually an interesting question, as the PEP doesn't currently propose adding any new global configuration knobs to the ssl or httplib modules - it just proposes switching httplib from the legacy (but fully backwards compatible) ssl._create_stdlib_context() API to the newer (but potentially backwards incompatible in some environments) ssl.create_default_context() API. Having the ssl module import an sslcustomize module at the end wouldn't be enough unless the appropriate APIs were put in place to allow things to be configured at a process global level. One possible way to do that would be to provide a central context factory mapping that provides a module-specific SSL context creator. We'd seed it appropriately for the stdlib modules where we wanted to use the legacy context definition, but it would default to using ssl.create_default_context. Under that kind of model, the first change we would actually make is to make ssl._create_stdlib_context() public under a suitable name, let's say ssl.create_legacy_context(). Independently of any other changes, exposing ssl.create_legacy_context() like that would also make it straightforward for folks to opt in to the old behaviour as an interim hack in a way that is easy to grep for and fix later (it's also something a linter can easily disallow).
The second change would be to provide a mapping from arbitrary names to context factories in the ssl module that defaults to ssl.create_default_context:

    named_contexts = defaultdict(lambda: create_default_context)

(A more accurate name would be "named_context_factory", but I think "named_contexts" reads better. Folks will learn quickly enough that it actually stores context factories rather than prebuilt context objects.)

The third change would be to replace all calls to "ssl._create_stdlib_context()" with calls to "ssl.named_contexts[__name__]()" instead.

The final change would be to seed the context factory map appropriately for the standard library modules where we wanted to keep the *old* default:

    for modname in ("nntplib", "poplib", "imaplib", "ftplib", "smtplib",
                    "asyncio.selector_events", "urllib.request", "http.client"):
        named_contexts[modname] = create_legacy_context

The list I have above is for *all* current uses of "ssl._create_stdlib_context". The backwards incompatible part of PEP 476 would then just be about removing names from that list (currently just "http.client", but I'd suggest "asyncio.selector_events" as another candidate, taking advantage of asyncio's provisional API status).

The "revert to 3.4 behaviour" content for sslcustomize.py would then just be:

    import ssl
    ssl.named_contexts["http.client"] = ssl.create_legacy_context

However, someone who wanted to also enforce SSL properly for other standard library modules could go the other way:

    import ssl
    for modname in ("nntplib", "poplib", "imaplib", "ftplib", "smtplib", "urllib.request"):
        ssl.named_contexts[modname] = ssl.create_default_context

Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
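A self-contained, runnable sketch of the proposed registry. It keeps `named_contexts` as a local module-level dict, because no such attribute exists on the real ssl module. It also assumes CPython's private `ssl._create_unverified_context` helper as a stand-in for the proposed (hypothetical) `create_legacy_context`. `context_for` is a hypothetical helper showing what a stdlib module would call instead of `ssl._create_stdlib_context()`. `defaultdict` invokes its default factory with no arguments, so the fallback is a zero-argument lambda that returns the factory.

```python
import ssl
from collections import defaultdict

# Stand-in for the proposed public name; this is a private CPython helper that
# builds a context with verify_mode=CERT_NONE and check_hostname=False.
create_legacy_context = ssl._create_unverified_context

# Module name -> context *factory*. Unknown names fall back to the verifying default.
named_contexts = defaultdict(lambda: ssl.create_default_context)

# Seed the modules that keep the old, non-verifying behaviour.
for modname in ("nntplib", "poplib", "imaplib", "ftplib", "smtplib",
                "asyncio.selector_events", "urllib.request", "http.client"):
    named_contexts[modname] = create_legacy_context


def context_for(modname):
    """Build a fresh SSLContext using whichever factory is registered for modname."""
    return named_contexts[modname]()
```

With this layout, an sslcustomize-style override is just a dictionary assignment. For example, `named_contexts["http.client"] = ssl.create_default_context` opts http.client into certificate and hostname checking without touching the calling code.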
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 September 2014 16:07, Paul Moore wrote: > On 31 August 2014 23:10, Nick Coghlan wrote: >> Assuming sslcustomize was in site-packages rather than the standard library >> directories, you would also be able to use virtual environments with an >> appropriate sslcustomize module to disable cert checking even if the >> application you were running didn't support direct configuration. > > Would this mean that a malicious package could install a custom > sslcustomize.py and so add unwanted certs to the system? I guess we > have to assume that installed packages are trusted, but I just wanted > to be explicit. Yes, it would have exactly the same security failure modes as sitecustomize, except it would only fire if the application imported the ssl module. The "-S" and "-I" switches would need to disable the implied "sslcustomize", just as they disable "import site". Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
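The email proposes that both `-S` and `-I` should suppress sslcustomize. Here is a minimal sketch of that gate. It assumes the hook would consult the real `sys.flags.no_site` and `sys.flags.isolated` attributes. `sslcustomize_enabled` is a hypothetical name, and CPython has no such check today.

```python
import sys


def sslcustomize_enabled(no_site=sys.flags.no_site, isolated=sys.flags.isolated):
    """Hypothetical gate: skip sslcustomize under -S (no_site) or -I (isolated)."""
    return not (no_site or isolated)


print("sslcustomize would be loaded:", sslcustomize_enabled())
```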
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 1 September 2014 17:13, Christian Heimes wrote: > On 01.09.2014 08:44, Nick Coghlan wrote: >> Yes, it would have exactly the same security failure modes as >> sitecustomize, except it would only fire if the application >> imported the ssl module. >> >> The "-S" and "-I" switches would need to disable the implied >> "sslcustomize", just as they disable "import site". > > A malicious package can already play havoc with your installation with > a custom ssl module. If somebody is able to sneak in a ssl.py then you > are screwed anyway. sslcustomize is not going to make the situation worse. That's not quite true - we're fairly careful about putting the standard library before userspace directories, so aside from the "current directory" problem, shadowing "ssl" itself can be tricky to arrange. "sslcustomize" would be more like "sitecustomize" - since it wouldn't normally be in the standard library, it can appear anywhere on sys.path, rather than having to be injected ahead of the standard library. I think that's OK though - compared to the security nightmare that is downloading modules from PyPI and running "./setup.py install" (or, even worse, "sudo ./setup.py install"), this would be a rather esoteric attack vector, and the existing -S and -I mechanisms could be used to defend against it :) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
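To check which file would actually win for a given module name, including whether something earlier on sys.path shadows a stdlib module, a short audit like the following may help. It is a sketch that relies on `importlib.machinery.PathFinder`, which searches `sys.path` entries in order and ignores already-imported, built-in and frozen modules. `path_origin` is a hypothetical helper name.

```python
from importlib.machinery import PathFinder


def path_origin(name):
    """Return the file a top-level module resolves to via a sys.path search, or None."""
    spec = PathFinder.find_spec(name)
    return None if spec is None else spec.origin


# "ssl" should normally resolve inside the standard library directory, while
# "sitecustomize" / "sslcustomize" (if present at all) can live anywhere on sys.path.
for mod in ("ssl", "sitecustomize", "sslcustomize"):
    print(f"{mod:14} -> {path_origin(mod)}")
```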
Re: [Python-Dev] PEP 477: selected ensurepip backports for Python 2.7
On 1 Sep 2014 17:31, "Donald Stufft" wrote: > > >> On Sep 1, 2014, at 2:22 AM, Ned Deily wrote: >> >> >> And that is a minor complication compared with the confusion and >> difficulty of trying to explain to users (stuck with 2.7 for the time >> being) of how to install third-party packages on each platform >> (especially Windows) versus the simplicity of the 3.4.x story, thanks to >> ensurepip. Having pip available as a documented, batteries-included >> tool for all current releases would be a huge usability improvement. > > > Yes this is a major driver. I mean I think I probably have an above average > knowledge of how to bootstrap pip, and if you dump me on a Windows box > I struggle to actually do it the first time around without stumbling around and > doing things in the wrong order and the like. (Getting a compiler toolchain is > worse, but yay for Wheels). Yeah. I've mentioned it before, but I think it bears repeating that trying to install pip on Windows with both Python 2 & 3 installed was one of the key things that convinced me to write PEP 453 in the first place. The default settings in both Internet Explorer and Windows Explorer make it tricky regardless, but parallel installs make it even worse. >> FTR, I'm willing to backport the pieces I did for 3.4 and I could do the >> ensurepip backport, as well. I'll leave the Windows installer changes >> for someone else, though. > > > Awesome, I’m of course willing to back port ensure pip itself as well. Truthfully > it shouldn’t be a very difficult backport. It’s only ~200 SLOC or so and the only > real things would be removing a Python3ism here or there. Backporting meaningful tests will actually be the annoying part: the current unit tests use unittest.mock, while the current functional tests use pyvenv :) Both of those can be dealt with, but the tests will be a bit of an ugly hack by comparison with their Py3 counterparts :) Cheers, Nick.
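When several interpreters are installed side by side, one way to avoid bootstrapping pip into the wrong one is to run `ensurepip` through that specific interpreter's executable. The sketch below only builds the argument list. `ensurepip_command` is a hypothetical helper, while `--upgrade` and `--user` are real ensurepip command-line options.

```python
import sys


def ensurepip_command(python=sys.executable, upgrade=True, user=False):
    """Build argv that bootstraps pip into the given interpreter via ensurepip."""
    cmd = [python, "-m", "ensurepip"]
    if upgrade:
        cmd.append("--upgrade")  # upgrade an existing pip to the bundled version
    if user:
        cmd.append("--user")  # install into the user site directory
    return cmd


print(ensurepip_command())
```

For example, `subprocess.run(ensurepip_command(r"C:\Python27\python.exe"), check=True)` would target one particular 2.7 install. The path is illustrative, and this assumes that install actually ships ensurepip.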
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 2 Sep 2014 00:08, "Antoine Pitrou" wrote: > > On Mon, 1 Sep 2014 23:42:10 +1000 > Chris Angelico wrote: > > >> > > >> That has to be done inside the same process. But imagine this > > >> scenario: You have a program that gets invoked as root (or some other > > >> user than yourself), and you're trying to fiddle with what it sees. > > >> You don't have root access, but you can manipulate the file system, to > > >> the extent that your userid has access. What can you do to affect this > > >> other program? > > > > > > If you're root you shouldn't run untrusted code. See > > > https://docs.python.org/3/using/cmdline.html#cmdoption-I > > > > Right, which is why sslcustomize has to be controlled by that, but the > > possibility of patching (or monkeypatching) ssl.py isn't as big a > > deal. > > To be frank I don't understand what you're arguing about. When I said "shadowing ssl can be tricky to arrange", Chris correctly interpreted it as referring to the filesystem based privilege escalation scenario that isolated mode handles, not to normal in-process monkeypatching or module injection. I don't consider the latter cases to be interesting attack scenarios, as they imply the attacker is *already* running arbitrary Python code inside your CPython process, so you've already lost. Cheers, Nick.
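You can confirm from `sys.flags` whether a child interpreter is running in isolated mode (`-I`) or with site processing disabled (`-S`). Here is a small sketch that launches a fresh interpreter and reports what it sees. It assumes Python 3.7+ for `subprocess.run(capture_output=...)`, and `startup_flags` is a hypothetical helper name.

```python
import subprocess
import sys


def startup_flags(*switches):
    """Run a fresh interpreter with the given switches and report its startup state."""
    code = ("import sys; "
            "print(sys.flags.isolated, sys.flags.no_site, 'sitecustomize' in sys.modules)")
    result = subprocess.run([sys.executable, *switches, "-c", code],
                            capture_output=True, text=True, check=True)
    isolated, no_site, sitecustomize_loaded = result.stdout.split()
    return {"isolated": isolated == "1",
            "no_site": no_site == "1",
            "sitecustomize_loaded": sitecustomize_loaded == "True"}


for switches in ((), ("-I",), ("-S",)):
    print(switches, startup_flags(*switches))
```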
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 2 Sep 2014 00:59, "Antoine Pitrou" wrote: > > On Tue, 2 Sep 2014 00:53:11 +1000 > Nick Coghlan wrote: > > > > > > To be frank I don't understand what you're arguing about. > > > > When I said "shadowing ssl can be tricky to arrange", Chris correctly > > interpreted it as referring to the filesystem based privilege escalation > > scenario that isolated mode handles, not to normal in-process > > monkeypatching or module injection. > > There's no actual difference. You can have a sitecustomize.py that does > the monkeypatching or the shadowing. There doesn't seem to be anything > "tricky" about that. Oh, now I get what you mean - yes, sitecustomize already poses the same kind of problem as the proposed sslcustomize (hence the existence of the related command line options). I missed that you had switched to talking about using that attack vector, rather than trying to shadow stdlib modules directly through the filesystem (which is the only tricky thing I was referring to). Cheers, Nick.
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On 2 Sep 2014 03:08, "Donald Stufft" wrote: > > >> On Sep 1, 2014, at 1:01 PM, Christian Heimes wrote: >> >> On 01.09.2014 17:35, Nick Coghlan wrote: >>> >>> Oh, now I get what you mean - yes, sitecustomize already poses the same >>> kind of problem as the proposed sslcustomize (hence the existence of the >>> related command line options). >> >> >> If an attacker is able to place a module like sitecustomize.py in an >> import directory or any .pth file in a site-packages directory than this >> Python installation is compromised. .pth files are insidious because >> they are always loaded and their code is always executed. I don't see >> how sslcustomize is going to make a difference here. >> > > Right, this is the point I was trying to make. If you’ve installed a malicious > package it’s game over. There’s nothing Python can do to help you. Yes, that's what I said originally when pointing out that isolated mode and the switch to disable site module processing would need to disable sslcustomize processing as well. Antoine was replying to a side comment about it being tricky to shadow stdlib modules. I left out the qualifier "directly" in my original comment, and he left out "indirectly through sitecustomize" in his initial reply, so we were talking past each other for a while. Cheers, Nick. > > --- > Donald Stufft > PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA >
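Christian's point about `.pth` files comes from how site.py handles them. A `.pth` line that begins with `import` followed by a space or tab is executed, while other non-comment lines are treated as paths to add to sys.path. Here is a minimal audit sketch that lists the executable lines in a directory. `executable_pth_lines` is a hypothetical helper name, and it only reports what it finds. It does not replicate every detail of site.py's parsing.

```python
import os
import site


def executable_pth_lines(directory):
    """Return (filename, line) pairs for .pth lines that site.py would execute."""
    hits = []
    for fname in sorted(os.listdir(directory)):
        if not fname.endswith(".pth"):
            continue
        path = os.path.join(directory, fname)
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                # Same prefix test site.py applies before exec()-ing a .pth line.
                if line.startswith(("import ", "import\t")):
                    hits.append((fname, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for sp in getattr(site, "getsitepackages", lambda: [])():
        if os.path.isdir(sp):
            for fname, line in executable_pth_lines(sp):
                print(f"{os.path.join(sp, fname)}: {line}")
```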
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
On 2 September 2014 07:17, Matthew Woodcraft wrote: > > (The program handles SIGTERM so that it can do a bit of cleanup before > exiting, and it uses the signal-handler-sets-a-flag technique. The call > that might be interrupted is sleep(), so the program doesn't strictly > _rely_ on the existing behaviour; it would just become very slow to > exit.) Making an exception for sleep() (i.e. still letting it throw EINTR) sounds like a reasonable idea to me. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
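However EINTR handling for sleep() ends up being resolved, one way to keep exit latency low with the signal-handler-sets-a-flag technique is to sleep in short slices and re-check the flag between them. Here is a minimal sketch. `ShutdownFlag`, `sleep_until_shutdown` and `install` are hypothetical names, and the 0.1 s default slice is an arbitrary assumption.

```python
import signal
import time


class ShutdownFlag:
    """Set from a signal handler and polled by the main loop."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler trivial: just record that shutdown was requested.
        self.requested = True


def sleep_until_shutdown(flag, total, step=0.1):
    """Sleep up to `total` seconds in `step`-second slices.

    Returns True if shutdown was requested first, False if the time ran out.
    Exit latency is bounded by `step` whether or not a single sleep() call
    is retried after being interrupted by a signal.
    """
    deadline = time.monotonic() + total
    while not flag.requested:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(step, remaining))
    return True


def install(flag, signum=signal.SIGTERM):
    # signal.signal() must be called from the main thread.
    signal.signal(signum, flag.handle)
```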