Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On 26.06.14 02:28, Nick Coghlan wrote: OK, *that* sounds like an excellent reason to keep the Unicode disabled builds functional, and make sure they stay that way with a buildbot: to help make sure we're not accidentally running afoul of the implicit interoperability between str and unicode when backporting fixes from Python 3. Helping to ensure correct handling of str values makes this capability something of benefit to *all* Python 2 users, not just those that turn off the Unicode support. It also makes it a potentially useful testing tool when assessing str/unicode handling in general. Would you like to review some patches? ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On 25/06/2014 19:28, Nick Coghlan wrote: OK, *that* sounds like an excellent reason to keep the Unicode disabled builds functional, and make sure they stay that way with a buildbot: to help make sure we're not accidentally running afoul of the implicit interoperability between str and unicode when backporting fixes from Python 3. Helping to ensure correct handling of str values makes this capability something of benefit to *all* Python 2 users, not just those that turn off the Unicode support. It also makes it a potentially useful testing tool when assessing str/unicode handling in general. Hmmm... From my perspective, trying to enforce unicode-disabled builds will only lower the (already low) chance that I may want to write / backport bug fixes for 2.7. For the same reason, I agree with Victor that we should ditch the threading-disabled builds. It's too much of a hassle for no actual, practical benefit. People who want a threadless unicodeless Python can install Python 1.5.2 for all I care. Regards Antoine.
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On Thu, Jun 26, 2014 at 9:04 PM, Antoine Pitrou wrote: > For the same reason, I agree with Victor that we should ditch the > threading-disabled builds. It's too much of a hassle for no actual, > practical benefit. People who want a threadless unicodeless Python can > install Python 1.5.2 for all I care. Or some other implementation of Python. It's looking like micropython will be permanently supporting a non-Unicode build (although I stepped away from the project after a strong disagreement over what would and would not make sense, and haven't been following it since). If someone wants a Python that doesn't have stuff that the core CPython devs treat as essential, s/he probably wants something like uPy anyway. ChrisA
Re: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented
On Wed, Jun 25, 2014, at 23:38, Ned Deily wrote: > In article <[email protected]>, Jesus Cea wrote: > > > On 25/06/14 20:35, Ned Deily wrote: > > > The 3.3 branch is open only to security fixes. Please don't backport > > > other patches to there. > > > > > > https://docs.python.org/devguide/devcycle.html#summary > > > > Ned, I am aware. It is a doc-only fix, like fixing a typo or correcting > > an incorrect statement. If that is against policy, let me know. > > My understanding is that doc changes are treated the same as any other > code changes. As you noticed, after a release leaves maintenance mode, > its documentation is no longer updated on the web site. To echo Ned, committing a doc change to 3.3 isn't the end of the world. We just want to make sure energy is focused on the 3 branches we do fully maintain.
[Python-Dev] C version of functools.lru_cache
Hello python devs, I was recently in need of some faster caching and thought this would be a good opportunity to familiarize myself with the Python/C API, so I wrote a C extension for the lru_cache in functools. The source is at https://github.com/pbrady/fastcache.git and I've posted it as a package on PyPI (fastcache). There are some simple benchmarks on the github page showing about 9x speedup. I would like to submit this for incorporation into the standard library. Is there any interest in this? I suspect it probably requires some changes/cleanup, especially since I haven't addressed thread-safety at all. Thanks, Peter. P.S. This was the motivation for the faster caching https://github.com/sympy/sympy/pull/7464.
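For readers who have not used it, the pure-Python decorator that the message refers to can be exercised as below. This is only a sketch of the standard `functools.lru_cache` interface; the `fib` function is a made-up workload for illustration, and nothing here is taken from the fastcache source, which is assumed (not confirmed by the message) to expose the same decorator interface.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n):
    """Naive recursive Fibonacci; repeated sub-calls are served from the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)

# cache_info() reports hits, misses, maxsize and the current cache size.
info = fib.cache_info()
print(result, info)
```

Each distinct argument is computed once (a miss) and every repeated sub-call is a hit, which is the behaviour a C reimplementation would have to preserve while making the lookup itself cheaper.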
Re: [Python-Dev] C version of functools.lru_cache
You might look at https://bugs.python.org/issue14373 On Thu, Jun 26, 2014, at 08:38, Peter Brady wrote: > [snip]
Re: [Python-Dev] C version of functools.lru_cache
Looks like it's already in the works! Never mind. On Thu, Jun 26, 2014 at 10:33 AM, Benjamin Peterson wrote: > You might look at https://bugs.python.org/issue14373 > [snip]
[Python-Dev] Binary CPython distribution for Linux
I'm an advocate of getting users and projects to move to modern Python versions. I believe dropping support for end-of-lifed Python versions is important for the health of the Python community. If you've done any amount of Python 3 porting work, you know things get much harder the more 2.x legacy versions you need to support. I led the successful charge to drop support for Python 2.6 and below from Firefox's build system. I failed to win the argument that Mercurial should drop 2.4 and 2.5 [1]. A few years ago, I started a similar conversation with the LLVM project [2]. I wrote a blog post on the subject [3] that even got Slashdotted [4] (although I don't think that's the honor it was a decade ago). While much of the opposition to dropping Python <2.7 stems from the RHEL community (they still have 2.4 in extended support and 2.7 wasn't in a release until a few weeks ago), a common objection from the users is "I can't install a different Python" or "it's too difficult to install a different Python." The former is a legit complaint - if you are on shared hosting and don't have root, as easy as it is to add an alternate package repository that provides 2.7 (or newer), you don't have the permissions so you can't do it. This leaves users with attempting a userland install of Python. Personally, I think installing Python in userland is relatively simple. Tools like pyenv make this turnkey. Worst case you fall back to configure + make. But I'm an experienced developer and have a compiler toolchain and library dependencies on my machine. What about less experienced users or people that don't have the necessary build dependencies? And, even if they do manage to find or build a Python distribution, we all know that there's enough finicky behavior with things like site-packages default paths to cause many headaches, even for experienced Python hackers. 
I'd like to propose a solution to this problem: a pre-built distribution of CPython for Linux available via www.python.org in the list of downloads for a particular release [5]. This distribution could be downloaded and unarchived into the user's home directory and users could start running it immediately by setting an environment variable or two, creating a symlink, or even running a basic installer script. This would hopefully remove the hurdles of obtaining a (sane) Python distribution on Linux. This would allow projects to more easily drop end-of-life Python versions and would speed adoption of modern Python, including Python 3 (because porting is much easier if you only have to target 2.7). I understand there may be technical challenges with doing this for some distributions and with producing a universal binary distribution. I would settle for a binary distribution that was targeted towards RHEL users and variant distros, as that is the user population that I perceive to be the most conservative and responsible for holding modern Python adoption back. [1] http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/68902 [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/056545.html [3] http://gregoryszorc.com/blog/2014/01/08/why-do-projects-support-old-python-releases/ [4] http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases [5] https://www.python.org/download/releases/2.7.7/
Re: [Python-Dev] Binary CPython distribution for Linux
On 26/06/2014 20:34, Gregory Szorc wrote: > I'd like to propose a solution to this problem: a pre-built distribution of CPython for Linux available via www.python.org in the list of downloads for a particular release [5]. > [snip] Just today I installed Anaconda (https://store.continuum.io/cshop/anaconda/) on Linux servers running CentOS 6.4. It installs into a directory anywhere in the filesystem (no need to be root), and using it globally is just a matter of prepending a folder to the PATH. Of course, Anaconda is oriented towards scientific applications, but it is proof that a pre-built binary installer works and can be simple to use.
If someone wants to try it without all the scientific libraries, they provide Miniconda (http://conda.pydata.org/miniconda.html), which contains only Python and the conda package manager. Joseph
Re: [Python-Dev] Binary CPython distribution for Linux
I have a little pet project for building RPMs of Python 2.7 (it should be trivial to port to 3.x): https://build.opensuse.org/project/show/home:cavallo71:opt-python-modules If there's enough interest, I can help integrate it with python.org. >> I understand there may be technical challenges with doing this for some >> distributions and with producing a universal binary distribution. openSUSE has long provided the VMs to build binaries for multiple platforms. > Of course Anaconda is oriented towards scientific applications but it is > a proof that a pre-built binary installer works and can be simple to use. RPM is the "blessed" way to install software on Linux: it supports what most sysadmins expect (easy to list the installed packages, easy to check whether a package has been tampered with, easy to find which package a file belongs to, etc.). Anaconda might appeal to some groups of users, but for company-wide deployment rpm is the best technical solution given its integration with Linux. I hope this helps, Antonio
Re: [Python-Dev] Binary CPython distribution for Linux
On 26/06/2014 22:00, Antonio Cavallo wrote: > Of course Anaconda is oriented towards scientific applications but it is > a proof that a pre-built binary installer works and can be simple to use. RPM is the "blessed" way to install software on Linux: it supports what most sysadmins expect (easy to list the installed packages, easy to check whether a package has been tampered with, easy to find which package a file belongs to, etc.). Anaconda might appeal to some groups of users, but for company-wide deployment rpm is the best technical solution given its integration with Linux. 1. Not all Linux distros use rpm (Debian, Ubuntu, Arch Linux...). 2. Installing an rpm requires root. Btw, Anaconda is multiplatform and can be installed on Linux, Windows and Mac. Joseph
[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hi Python dev folks,

I've written a PEP proposing a specific os.scandir() API for a directory iterator that returns the stat-like info from the OS, the main advantage of which is to speed up os.walk() and similar operations between 4-20x, depending on your OS and file system. Full details, background info, and context links are in the PEP, which Victor Stinner has uploaded at the following URL, and I've also copied inline below.

http://legacy.python.org/dev/peps/pep-0471/

Would love feedback on the PEP, but also of course on the proposal itself.

-Ben

PEP: 471
Title: os.scandir() function -- a better and faster directory iterator
Version: $Revision$
Last-Modified: $Date$
Author: Ben Hoyt
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2014
Python-Version: 3.5

Abstract
========

This PEP proposes including a new directory iteration function, ``os.scandir()``, in the standard library. This new function adds useful functionality and increases the speed of ``os.walk()`` by 2-10 times (depending on the platform and file system) by significantly reducing the number of times ``stat()`` needs to be called.

Rationale
=========

Python's built-in ``os.walk()`` is significantly slower than it needs to be, because -- in addition to calling ``os.listdir()`` on each directory -- it executes the system call ``os.stat()`` or ``GetFileAttributes()`` on each file to determine whether the entry is a directory or not.

But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- already tell you whether the files returned are directories or not, so no further system calls are needed. In short, you can reduce the number of system calls from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually much wider than they are deep, it's often much better than this.)
In practice, removing all those extra system calls makes ``os.walk()`` about **8-9 times as fast on Windows**, and about **2-3 times as fast on Linux and Mac OS X**. So we're not talking about micro-optimizations. See more `benchmarks`_.

.. _`benchmarks`: https://github.com/benhoyt/scandir#benchmarks

Somewhat relatedly, many people (see Python `Issue 11406`_) are also keen on a version of ``os.listdir()`` that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.

So as well as providing a ``scandir()`` iterator function for calling directly, Python's existing ``os.walk()`` function could be sped up a huge amount.

.. _`Issue 11406`: http://bugs.python.org/issue11406

Implementation
==============

The implementation of this proposal was written by Ben Hoyt (initial version) and Tim Golden (who helped a lot with the C extension module). It lives on GitHub at `benhoyt/scandir`_.

.. _`benhoyt/scandir`: https://github.com/benhoyt/scandir

Note that this module has been used and tested (see "Use in the wild" section in this PEP), so it's more than a proof-of-concept. However, it is marked as beta software and is not extensively battle-tested. It will need some cleanup and more thorough testing before going into the standard library, as well as integration into `posixmodule.c`.
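The "2N versus N system calls" argument from the Rationale can be sketched in a few lines. This is an illustrative comparison only, not code from the PEP or from benhoyt/scandir: it assumes ``os.scandir()`` as it later shipped in the standard library (Python 3.5; the ``with`` form needs 3.6), which may differ in detail from the draft quoted here, and the helper names ``subdirs_listdir`` and ``subdirs_scandir`` are made up for the example.

```python
import os
import tempfile


def subdirs_listdir(path):
    """List subdirectories with listdir() plus one stat-style call per entry."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))  # extra call for every name
    )


def subdirs_scandir(path):
    """List subdirectories using the type info returned by the directory scan."""
    with os.scandir(path) as entries:
        # is_dir() normally needs no further system call on common platforms.
        return sorted(entry.name for entry in entries if entry.is_dir())


# Build a small throwaway tree: two directories and one regular file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "a"))
    os.mkdir(os.path.join(root, "b"))
    open(os.path.join(root, "f.txt"), "w").close()
    via_listdir = subdirs_listdir(root)
    via_scandir = subdirs_scandir(root)

print(via_listdir, via_scandir)
```

Both helpers return the same names; the difference is only in how many times the operating system has to be asked about each entry, which is where the speedups claimed in the benchmarks would come from.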
Specifics of proposal
=====================

Specifically, this PEP proposes adding a single function to the ``os`` module in the standard library, ``scandir``, that takes a single, optional string as its argument::

    scandir(path='.') -> generator of DirEntry objects

Like ``listdir``, ``scandir`` calls the operating system's directory iteration system calls to get the names of the files in the ``path`` directory, but it's different from ``listdir`` in two ways:

* Instead of bare filename strings, it returns lightweight ``DirEntry`` objects that hold the filename string and provide simple methods that allow access to the stat-like data the operating system returned.

* It returns a generator instead of a list, so that ``scandir`` acts as a true iterator instead of returning the full list immediately.

``scandir()`` yields a ``DirEntry`` object for each file and directory in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each ``DirEntry`` object has the following attributes and methods:

* ``name``: the entry's filename, relative to ``path`` (corresponds to the return values of ``os.listdir``)

* ``is_dir()``: like ``os.path.isdir()``, but requires no system calls on most systems (Linux, Windows, OS X)

* ``is_file()``: like ``os.path.isfile()``, but requires no system calls on most systems (Linux, Windows, OS X)

* ``is_symlink()``: like ``os.path.islink()``, but requires no system calls on most systems (Linux, Windows, OS X)

* ``lstat()``: like ``os.lstat()``, but requires no system calls on Windows

The ``DirEntry
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 2014-06-26 23:59, Ben Hoyt wrote: Hi Python dev folks, I've written a PEP proposing a specific os.scandir() API for a directory iterator that returns the stat-like info from the OS, the main advantage of which is to speed up os.walk() and similar operations between 4-20x, depending on your OS and file system. Full details, background info, and context links are in the PEP, which Victor Stinner has uploaded at the following URL, and I've also copied inline below. http://legacy.python.org/dev/peps/pep-0471/ Would love feedback on the PEP, but also of course on the proposal itself. [snip] Personally, I'd prefer the name 'iterdir' because it emphasises that it's an iterator.
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 27 June 2014 09:28, MRAB wrote: > Personally, I'd prefer the name 'iterdir' because it emphasises that > it's an iterator. Exactly what I was going to post (with the added note that there's an obvious symmetry with listdir). +1 for iterdir rather than scandir Other than that: +1 for adding scandir to the stdlib -1 for windows_wildcard (it would be an attractive nuisance to write windows-only code) Tim Delaney
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hello, On Thu, 26 Jun 2014 18:59:45 -0400 Ben Hoyt wrote: > Hi Python dev folks, > > I've written a PEP proposing a specific os.scandir() API for a > directory iterator that returns the stat-like info from the OS, the > main advantage of which is to speed up os.walk() and similar > operations between 4-20x, depending on your OS and file system. Full > details, background info, and context links are in the PEP, which > Victor Stinner has uploaded at the following URL, and I've also copied > inline below. I noticed the obvious inefficiency of os.walk() being implemented in terms of os.listdir() when I worked on the "os" module for MicroPython. I essentially did what your PEP suggests: I introduced an internal generator function (ilistdir_ex() in https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85 ), in terms of which both os.listdir() and os.walk() are implemented. With my MicroPython hat on, os.scandir() would only make things worse. With the current interface, one can have either an inefficient implementation (like CPython chose) or an efficient implementation (like MicroPython chose) - all transparently. os.scandir() supposedly opens up an efficient implementation for everyone, but at the price of bloating the API and introducing heavy-weight objects to wrap the info. The PEP calls them "lightweight DirEntry objects", but that cannot be true, because all Python objects are heavy-weight, especially those which have methods. It would be better if os.scandir() were specified to return a struct (named tuple) compatible with the return value of os.stat() (with only the fields relevant to the underlying readdir()-like system call). The grounds for that are obvious: it's an already existing data interface in the "os" module, which is also based on an open standard for operating systems - POSIX, so if one is to expect something about file attributes, it's what one can reasonably base expectations on. But reusing the os.stat struct is glaringly not what's proposed.
And it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(), but requires no system calls on Windows". Nice, but OS "FooBar" can do much more than Windows - it has a system call to send a file by email, right when scanning the directory containing it. So why not have a DirEntry.send_by_email(recipient) method? I hear the answer - it's because CPython strives to support Windows well, while it doesn't care about the "FooBar" OS. And then it again leads to the question I have posed several times - where's the line between "CPython" and "Python"? Is it justified for CPython to add to (or remove from) the Python stdlib something which is useful for its users, but useless or complicating for other Python implementations? Especially taking into account that there's a "win32api" module allowing Windows users to use all the wonders of its API? Especially since the os.stat struct is itself pretty extensible (https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix systems (such as FreeBSD), the following attributes may be available ...", "On Mac OS systems...", - so extra fields can be added for Windows just the same, if really needed). > > http://legacy.python.org/dev/peps/pep-0471/ > > Would love feedback on the PEP, but also of course on the proposal > itself. > > -Ben > [] -- Best regards, Paul mailto:[email protected]
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 06/26/2014 04:36 PM, Tim Delaney wrote: On 27 June 2014 09:28, MRAB wrote: Personally, I'd prefer the name 'iterdir' because it emphasises that it's an iterator. Exactly what I was going to post (with the added note that there's an obvious symmetry with listdir). +1 for iterdir rather than scandir Other than that: +1 for adding [it] to the stdlib +1 for all of above -- ~Ethan~
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote: > > With my MicroPython hat on, os.scandir() would make things only worse. > With current interface, one can either have inefficient implementation > (like CPython chose) or efficient implementation (like MicroPython > chose) - all transparently. os.scandir() supposedly opens up efficient > implementation for everyone, but at the price of bloating API and > introducing heavy-weight objects to wrap info. PEP calls it > "lightweight DirEntry objects", but that cannot be true, because all > Python objects are heavy-weight, especially those which have methods. Why do you think methods make an object more heavyweight? namedtuples have methods.
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hello, On Thu, 26 Jun 2014 17:35:21 -0700 Benjamin Peterson wrote: > On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote: > > > > With my MicroPython hat on, os.scandir() would make things only > > worse. With current interface, one can either have inefficient > > implementation (like CPython chose) or efficient implementation > > (like MicroPython chose) - all transparently. os.scandir() > > supposedly opens up efficient implementation for everyone, but at > > the price of bloating API and introducing heavy-weight objects to > > wrap info. PEP calls it "lightweight DirEntry objects", but that > > cannot be true, because all Python objects are heavy-weight, > > especially those which have methods. > > Why do you think methods make an object more heavyweight? Because you need to call them. And if the only thing they do is return an object field, the call overhead is rather noticeable. > namedtuples have methods. Yes, unfortunately. But fortunately, a named tuple is a subclass of tuple, so a user who cares about efficiency can just use the numeric indexing which has existed for os.stat values all along, blissfully ignoring the cruft which has been accumulating there since the 1.5 days. -- Best regards, Paul mailto:[email protected]
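The two access styles being contrasted here, attribute names versus plain tuple indexing on an os.stat() result, look roughly like this. It is a small sketch using only the standard `os` and `stat` modules; the variable names are illustrative, and it makes no claim about which style is faster on any given implementation.

```python
import os
import stat

# Stat the current directory so the example works anywhere.
st = os.stat(os.curdir)

# Attribute access by field name.
mode_by_name = st.st_mode
size_by_name = st.st_size

# Tuple-style indexing, using the index constants from the stat module.
mode_by_index = st[stat.ST_MODE]
size_by_index = st[stat.ST_SIZE]

# Either form of the mode can be fed to the stat helpers.
is_directory = stat.S_ISDIR(mode_by_index)
print(mode_by_name == mode_by_index, size_by_name == size_by_index, is_directory)
```

A scandir() that returned a structure of this kind would let callers keep using either style, which is the compatibility argument made earlier in the thread.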
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
+1 for scandir. -1 for iterdir (scandir sounds fancier). - for windows_wildcard. Tim Delaney wrote: >On 27 June 2014 09:28, MRAB wrote: > >> Personally, I'd prefer the name 'iterdir' because it emphasises that >> it's an iterator. > > >Exactly what I was going to post (with the added note that there's an >obvious symmetry with listdir). > >+1 for iterdir rather than scandir > >Other than that: > >+1 for adding scandir to the stdlib >-1 for windows_wildcard (it would be an attractive nuisance to write >windows-only code) > >Tim Delaney
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
I don't mind iterdir() and would take it :-), but I'll just say why I chose the name scandir() -- though it wasn't my suggestion originally:

iterdir() sounds like just an iterator version of listdir(), kinda like keys() and iterkeys() in Python 2. Whereas in actual fact the return values are quite different (DirEntry objects vs strings), and so the name change reflects that difference a little.

I'm also -1 on windows_wildcard. I think it's asking for trouble, and wouldn't gain much on Windows in most cases anyway.

-Ben

On Thu, Jun 26, 2014 at 7:43 PM, Ethan Furman wrote:
> On 06/26/2014 04:36 PM, Tim Delaney wrote:
>>
>> On 27 June 2014 09:28, MRAB wrote:
>>>
>>> Personally, I'd prefer the name 'iterdir' because it emphasises that
>>> it's an iterator.
>>
>> Exactly what I was going to post (with the added note that there's an
>> obvious symmetry with listdir).
>>
>> +1 for iterdir rather than scandir
>>
>> Other than that:
>>
>> +1 for adding [it] to the stdlib
>
> +1 for all of above
>
> --
> ~Ethan~
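The difference in return values can be seen in a short sketch. This assumes the API as it later shipped in Python 3.5+ (os.scandir and its entry objects); the directory contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Populate the directory with one file and one subdirectory.
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "sub"))

    # listdir() returns plain strings.
    names = sorted(os.listdir(tmp))

    # scandir() yields entry objects that carry the name plus type info.
    entries = sorted(os.scandir(tmp), key=lambda e: e.name)
    kinds = [(e.name, e.is_dir()) for e in entries]
```

Here `names` holds strings only, while each element of `entries` answers questions such as is_dir() without the caller building a path first.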
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 2014-06-27 02:37, Ben Hoyt wrote:
> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
>
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.
>
[snip]

The re module has 'findall', which returns a list of strings, and 'finditer', which returns an iterator that yields match objects, so there's a precedent. :-)
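For readers who have not used both functions, a minimal sketch of that precedent follows. The sample text and pattern are invented for illustration.

```python
import re

text = "scandir iterdir listdir"
pattern = r"\w+dir"

# findall() returns a list of the matched strings.
found = re.findall(pattern, text)

# finditer() yields match objects, which carry positions as well as text.
matches = list(re.finditer(pattern, text))
spans = [m.span() for m in matches]
```

So the "iter" variant in re already returns richer objects than the list variant, not just a lazy version of the same values.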
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
> os.listdir() when I worked on "os" module for MicroPython. I essentially
> did what your PEP suggests - introduced internal generator function
> (ilistdir_ex() in
> https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
> ), in terms of which both os.listdir() and os.walk() are implemented.

Nice (though I see the implementation is very *nix specific).

> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info. PEP calls it
> "lightweight DirEntry objects", but that cannot be true, because all
> Python objects are heavy-weight, especially those which have methods.

It's a fair point that os.walk() can be implemented efficiently without adding a new function and API. However, often you'll want more info, like the file size, which scandir() can give you via DirEntry.lstat(), which is free on Windows. So opening up this efficient API is beneficial.

In CPython, I think the DirEntry objects are as lightweight as stat_result objects.

I'm an embedded developer by background, so I know the constraints here, but I really don't think Python's development should be tailored to fit MicroPython. If os.scandir() is not very efficient on MicroPython, so be it -- 99% of all desktop/server users will gain from it.

> It would be better if os.scandir() was specified to return a struct
> (named tuple) compatible with return value of os.stat() (with only
> fields relevant to underlying readdir()-like system call). The grounds
> for that are obvious: it's already existing data interface in module
> "os", which is also based on open standard for operating systems -
> POSIX, so if one is to expect something about file attributes, it's
> what one can reasonably base expectations on.

Yes, we considered this early on (see the python-ideas and python-dev threads referenced in the PEP), but decided it wasn't a great API to overload stat_result further, and have most of the attributes None or not present on Linux.

> Especially that os.stat struct is itself pretty extensible
> (https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
> systems (such as FreeBSD), the following attributes may be
> available ...", "On Mac OS systems...", - so extra fields can be added
> for Windows just the same, if really needed).

Yes. Incidentally, I just submitted an (accepted) patch for Python 3.5 that adds the full Win32 file attribute data to stat_result objects on Windows (see https://docs.python.org/3.5/whatsnew/3.5.html#os).

However, for scandir() to be useful, you also need the name. My original version of this directory iterator returned two-tuples of (name, stat_result). But most people didn't like the API, and I don't really either. You could overload stat_result with a .name attribute in this case, but it still isn't a nice API to have most of the attributes None, and then you have to test for that, etc.

So basically we tweaked the API to do what was best, and ended up with it returning DirEntry objects with is_file() and similar methods.

Hope that helps give a bit more context. If you haven't read the relevant python-ideas and python-dev threads, those are interesting too.

-Ben
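As a rough illustration of the "name plus extra info" use case, here is a sketch that lists file sizes using entry objects. It assumes the API as it eventually shipped in Python 3.5+, where the draft's lstat() became stat(follow_symlinks=False); the helper name file_sizes is hypothetical.

```python
import os
import tempfile


def file_sizes(path):
    """Yield (name, size) for regular files directly inside path.

    Hypothetical helper: uses the entry's own type and stat information
    instead of building a full path and calling os.stat() on it.
    """
    for entry in os.scandir(path):
        if entry.is_file(follow_symlinks=False):
            yield entry.name, entry.stat(follow_symlinks=False).st_size


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.bin"), "wb") as f:
        f.write(b"x" * 10)
    os.mkdir(os.path.join(tmp, "sub"))  # skipped: not a regular file
    sizes = dict(file_sizes(tmp))
```

Whether the stat() call costs a system call depends on the platform, as the message above notes for Windows.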
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
+1 on getting this in for 3.5.

If the only objection people are having is the stupid paint color of the name I don't care what it's called! scandir matches the libc API of the same name. iterdir also makes sense to anyone reading it. Whoever checks this in can pick one and be done with it. We have other Python APIs with iter in the name and tend not to be trying to mirror C so much these days so the iterdir folks do have a valid point.

I'm not a huge fan of the DirEntry object and the method calls on it instead of simply yielding tuples of (filename, partially_filled_in_stat_result) but I don't *really* care which is used as they both work fine and it is trivial to wrap with another generator expression to turn it into exactly what you want anyways.

Python not having the ability to operate on large directories means Python simply cannot be used for common system maintenance tasks. Python being slow to walk a file system because of unnecessary stat calls (often each an entire I/O operation requiring a disk seek!), made only because listdir throws away information the OS already returned, is similarly a problem. This addresses both.

IMNSHO, it is a single function, it belongs in the os module right next to listdir.

-gps

On Thu, Jun 26, 2014 at 6:37 PM, Ben Hoyt wrote:
> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
>
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.
>
> I'm also -1 on windows_wildcard. I think it's asking for trouble, and
> wouldn't gain much on Windows in most cases anyway.
>
> -Ben
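The "wrap with another generator expression" remark could look roughly like the sketch below. It assumes the Python 3.5+ os.scandir API; iter_name_stat is a hypothetical name, and the stat result here is a full one rather than the partially filled one mentioned above.

```python
import os
import tempfile


def iter_name_stat(path):
    """Return a generator of (name, stat_result) pairs for entries in path.

    Hypothetical wrapper: adapts entry objects to plain two-tuples.
    """
    return ((e.name, e.stat(follow_symlinks=False)) for e in os.scandir(path))


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "f.txt"), "wb") as f:
        f.write(b"abc")
    pairs = dict(iter_name_stat(tmp))
    size = pairs["f.txt"].st_size
```

A caller who prefers tuples pays only for this thin adapter, which is the sense in which either API shape "works fine".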
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote:

> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info.

os.scandir is not part of the Python API, it is not a built-in function. It is part of the CPython standard library. That means (in my opinion) that there is an expectation that other Pythons should provide it, but not an absolute requirement. Especially for the os module, which by definition is platform-specific. In my opinion that means you have four options:

1. provide os.scandir, with exactly the same semantics as on CPython;

2. provide os.scandir, but change its semantics to be more lightweight (e.g. return an ordinary tuple, as you already suggest);

3. don't provide os.scandir at all; or

4. do something different depending on whether the platform is Linux or an embedded system.

I would consider any of those acceptable for a library feature, but not for a language feature.

[...]

> But reusing os.stat struct is glaringly not what's proposed. And
> it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(),
> but requires no system calls on Windows". Nice, but OS "FooBar" can do
> much more than Windows - it has a system call to send a file by email,
> right when scanning a directory containing it. So, why not to have
> DirEntry.send_by_email(recipient) method? I hear the answer - it's
> because CPython strives to support Windows well, while doesn't care
> about "FooBar" OS.

Correct. If there is sufficient demand for FooBar, then CPython may support it. Until then, FooBarPython can support it, and offer whatever platform-specific features are needed within its standard library.

> And then it again leads to the question I posed several times - where's
> line between "CPython" and "Python"? Is it grounded for CPython to add
> (or remove) to Python stdlib something which is useful for its users,
> but useless or complicating for other Python implementations?

I think so. And other implementations are free to do the same thing. Of course there is an expectation that the standard library of most implementations will be broadly similar, but not that they will be identical.

I am surprised that both Jython and IronPython offer a non-functioning dis module: you can import it successfully, but if there's a way to actually use it, I haven't found it:

steve@orac:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
[OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/share/jython/Lib/dis.py", line 42, in dis
    disassemble(x)
  File "/usr/share/jython/Lib/dis.py", line 64, in disassemble
    linestarts = dict(findlinestarts(co))
  File "/usr/share/jython/Lib/dis.py", line 183, in findlinestarts
    byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
AttributeError: 'tablecode' object has no attribute 'co_lnotab'

IronPython gives a different exception:

steve@orac:~$ ipy
IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
TypeError: don't know how to disassemble code objects

It's quite annoying, I would rather they had just removed the module altogether. Better still would have been to disassemble code objects to whatever byte code the Java and .Net platforms use. But there's surely no requirement to disassemble to CPython byte code!

-- Steven
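If an implementation takes the third option and omits os.scandir, portable code can detect that at run time. The sketch below is one possible approach, not something proposed in the thread; iter_names is a hypothetical helper.

```python
import os
import tempfile


def iter_names(path):
    """Yield entry names, using os.scandir when the platform provides it.

    Hypothetical helper: falls back to os.listdir on implementations
    whose os module has no scandir attribute.
    """
    scandir = getattr(os, "scandir", None)
    if scandir is not None:
        for entry in scandir(path):
            yield entry.name
    else:
        for name in os.listdir(path):
            yield name


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("one.txt", "two.txt"):
        open(os.path.join(tmp, fname), "w").close()
    names = sorted(iter_names(tmp))
```

This kind of feature test only works for a library function; a missing language feature could not be papered over the same way.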
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Thu, Jun 26, 2014 at 09:37:50PM -0400, Ben Hoyt wrote:

> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
>
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.

+1

I think that's a good objective reason to prefer scandir, which suits me, because my subjective opinion is that "iterdir" is an inelegant and less than attractive name.

-- Steven
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
I'm generally +1, with opinions noted below on these two topics.

On 6/26/2014 3:59 PM, Ben Hoyt wrote:
> Should there be a way to access the full path?
> ----------------------------------------------
>
> Should ``DirEntry``'s have a way to get the full path without using
> ``os.path.join(path, entry.name)``? This is a pretty common pattern,
> and it may be useful to add pathlib-like ``str(entry)`` functionality.
> This functionality has also been requested in `issue 13`_ on GitHub.
>
> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13

+1

> Should it expose Windows wildcard functionality?
> ------------------------------------------------
>
> Should ``scandir()`` have a way of exposing the wildcard functionality
> in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The
> scandir module on GitHub exposes this as a ``windows_wildcard`` keyword
> argument, allowing Windows power users the option to pass a custom
> wildcard to ``FindFirstFile``, which may avoid the need to use
> ``fnmatch`` or similar on the resulting names. It is named the
> unwieldy ``windows_wildcard`` to remind you you're writing power-user,
> Windows-only code if you use it.
>
> This boils down to whether ``scandir`` should be about exposing all of
> the system's directory iteration features, or simply providing a fast,
> simple, cross-platform directory iteration API.
>
> This PEP's author votes for not including ``windows_wildcard`` in the
> standard library version, because even though it could be useful in
> rare cases (say the Windows Dropbox client?), it'd be too easy to use
> it just because you're a Windows developer, and create code that is
> not cross-platform.

Because another common pattern is to check whether a name matches a pattern, I think it would be good to have a feature that provides that. I do that in my own private directory listing extensions, and also some command lines expose it to the user. Where exposed to the user, I use -p windows-pattern and -P regexp. My implementation converts the windows-pattern to a regexp, and then uses common code, but for this particular API, because the windows_wildcard can be optimized by the Windows API call used, it would make more sense to pass windows_wildcard directly to FindFirst on Windows, but on *nix convert it to a regexp. Both Windows and *nix would call re to process pattern matches except for the case on Windows of having a Windows pattern passed in. The alternate parameter could simply be called wildcard, and would be a regexp. If desired, other flavors of wildcard (bsd_wildcard?) could also be implemented, but I'm not sure there are any benefits to them, as there are, as far as I am aware, no optimizations for those patterns in those systems.
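The convert-to-regexp approach described above can be sketched with the standard library. This is only an approximation: fnmatch.translate handles shell-style wildcards, which are assumed here to be close enough to a Windows pattern for illustration, and scan_matching is a hypothetical helper. It also shows the os.path.join(path, entry.name) pattern from the quoted PEP text.

```python
import fnmatch
import os
import re
import tempfile


def scan_matching(path, wildcard):
    """Yield full paths of entries whose names match a shell-style wildcard.

    Hypothetical helper: the wildcard is converted to a regular expression
    once, then applied to each entry name.
    """
    regex = re.compile(fnmatch.translate(wildcard))
    for entry in os.scandir(path):
        if regex.match(entry.name):
            # Build the full path the way the PEP text describes.
            yield os.path.join(path, entry.name)


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.txt", "b.log", "c.txt"):
        open(os.path.join(tmp, fname), "w").close()
    matched = sorted(os.path.basename(p) for p in scan_matching(tmp, "*.txt"))
```

Filtering in Python like this is portable; it gives up whatever speedup a native wildcard call might offer on Windows.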
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 26 June 2014 23:59, Ben Hoyt wrote:
> Would love feedback on the PEP, but also of course on the proposal itself.

A solid +1 from me.

Some specific points:

- I'm in favour of it being in the os module. It's more discoverable there, as well as the other reasons mentioned.
- I prefer scandir as the name, for the reason you gave (the output isn't the same as an iterator version of listdir)
- I'm mildly against windows_wildcard (even though I'm a windows user)
- You mention the caching behaviour of DirEntry objects. The limitations should be clearly covered in the final docs, as it's the sort of thing people will get wrong otherwise.

Paul
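The caching pitfall mentioned in the last point can be sketched as follows. This assumes the behaviour of the API as it shipped in Python 3.5+, where an entry's stat() result is cached on the entry object; the file name is invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")
    with open(path, "wb") as f:
        f.write(b"abc")

    # Exhaust the iterator so the directory handle is released.
    entry = list(os.scandir(tmp))[0]
    first = entry.stat().st_size   # stat info is now cached on the entry

    # Grow the file after the entry was obtained.
    with open(path, "ab") as f:
        f.write(b"defg")

    cached = entry.stat().st_size  # served from the entry's cache
    fresh = os.stat(path).st_size  # asks the OS again
```

Code that needs current information should call os.stat() on the path rather than rely on an entry it obtained earlier.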
