date:20140626

Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-26 Thread Serhiy Storchaka


26.06.14 02:28, Nick Coghlan wrote:

> OK, *that* sounds like an excellent reason to keep the Unicode disabled
> builds functional, and make sure they stay that way with a buildbot: to
> help make sure we're not accidentally running afoul of the implicit
> interoperability between str and unicode when backporting fixes from
> Python 3.
>
> Helping to ensure correct handling of str values makes this capability
> something of benefit to *all* Python 2 users, not just those that turn
> off the Unicode support. It also makes it a potentially useful testing
> tool when assessing str/unicode handling in general.


Do you want to make some patch reviews?


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-26 Thread Antoine Pitrou


On 25/06/2014 19:28, Nick Coghlan wrote:


> OK, *that* sounds like an excellent reason to keep the Unicode disabled
> builds functional, and make sure they stay that way with a buildbot: to
> help make sure we're not accidentally running afoul of the implicit
> interoperability between str and unicode when backporting fixes from
> Python 3.
>
> Helping to ensure correct handling of str values makes this capability
> something of benefit to *all* Python 2 users, not just those that turn
> off the Unicode support. It also makes it a potentially useful testing
> tool when assessing str/unicode handling in general.


Hmmm... From my perspective, trying to enforce unicode-disabled builds 
will only lower the (already low) chance that I may want to write / 
backport bug fixes for 2.7.


For the same reason, I agree with Victor that we should ditch the 
threading-disabled builds. It's too much of a hassle for no actual, 
practical benefit. People who want a threadless unicodeless Python can 
install Python 1.5.2 for all I care.


Regards

Antoine.



Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-26 Thread Chris Angelico

On Thu, Jun 26, 2014 at 9:04 PM, Antoine Pitrou  wrote:
> For the same reason, I agree with Victor that we should ditch the
> threading-disabled builds. It's too much of a hassle for no actual,
> practical benefit. People who want a threadless unicodeless Python can
> install Python 1.5.2 for all I care.

Or some other implementation of Python. It's looking like micropython
will be permanently supporting a non-Unicode build (although I stepped
away from the project after a strong disagreement over what would and
would not make sense, and haven't been following it since). If someone
wants a Python that doesn't have stuff that the core CPython devs
treat as essential, s/he probably wants something like uPy anyway.

ChrisA

Re: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented

2014-06-26 Thread Benjamin Peterson

On Wed, Jun 25, 2014, at 23:38, Ned Deily wrote:
> In article <[email protected]>, Jesus Cea  wrote:
> 
> > On 25/06/14 20:35, Ned Deily wrote:
> > > The 3.3 branch is open only to security fixes. Please don't backport 
> > > other patches to there.
> > > 
> > > https://docs.python.org/devguide/devcycle.html#summary
> > 
> > Ned, I am aware. It is a doc-only fix, like fixing a typo or correcting
> > an incorrect statement. If that is against policy, let me know.
> 
> My understanding is that doc changes are treated the same as any other 
> code changes.  As you noticed, after a release leaves maintenance mode, 
> its documentation is no longer updated on the web site.

To echo Ned, committing a doc change to 3.3 isn't the end of the world.
We just want to make sure energy is focused on the 3 branches we do
fully maintain.

[Python-Dev] C version of functools.lru_cache

2014-06-26 Thread Peter Brady

Hello python devs,

I was recently in need of some faster caching and thought this would be a
good opportunity to familiarize myself with the Python/C API, so I wrote a C
extension for the lru_cache in functools.  The source is at
https://github.com/pbrady/fastcache.git and I've posted it as a package on
PyPI (fastcache).  There are some simple benchmarks on the GitHub page
showing about 9x speedup.  I would like to submit this for incorporation
into the standard library.  Is there any interest in this? I suspect it
probably requires some changes/cleanup, especially since I haven't addressed
thread-safety at all.

Thanks,
Peter.

P.S. This was the motivation for the faster caching
https://github.com/sympy/sympy/pull/7464.

Re: [Python-Dev] C version of functools.lru_cache

2014-06-26 Thread Benjamin Peterson

You might look at https://bugs.python.org/issue14373

On Thu, Jun 26, 2014, at 08:38, Peter Brady wrote:
> Hello python devs,
> 
> I was recently in need of some faster caching and thought this would be a
> good opportunity to familiarize myself with the Python/C api so I wrote a C
> extension for the lru_cache in functools.  The source is at
> https://github.com/pbrady/fastcache.git and I've posted it as a package on
> PyPI (fastcache).  There are some simple benchmarks on the github page
> showing about 9x speedup.  I would like to submit this for incorporation
> into the standard library.  Is there any interest in this? I suspect it
> probably requires some changes/cleanup especially since I haven't addressed
> thread-safety at all.
> 
> Thanks,
> Peter.
> 
> P.S. This was the motivation for the faster caching
> https://github.com/sympy/sympy/pull/7464.

Re: [Python-Dev] C version of functools.lru_cache

2014-06-26 Thread Peter Brady

Looks like it's already in the works!  Never mind.


On Thu, Jun 26, 2014 at 10:33 AM, Benjamin Peterson 
wrote:

> You might look at https://bugs.python.org/issue14373
>
> On Thu, Jun 26, 2014, at 08:38, Peter Brady wrote:
> > Hello python devs,
> >
> > I was recently in need of some faster caching and thought this would be a
> > good opportunity to familiarize myself with the Python/C api so I wrote a C
> > extension for the lru_cache in functools.  The source is at
> > https://github.com/pbrady/fastcache.git and I've posted it as a package on
> > PyPI (fastcache).  There are some simple benchmarks on the github page
> > showing about 9x speedup.  I would like to submit this for incorporation
> > into the standard library.  Is there any interest in this? I suspect it
> > probably requires some changes/cleanup especially since I haven't addressed
> > thread-safety at all.
> >
> > Thanks,
> > Peter.
> >
> > P.S. This was the motivation for the faster caching
> > https://github.com/sympy/sympy/pull/7464.

[Python-Dev] Binary CPython distribution for Linux

2014-06-26 Thread Gregory Szorc

I'm an advocate of getting users and projects to move to modern Python
versions. I believe dropping support for end-of-lifed Python versions is
important for the health of the Python community. If you've done any
amount of Python 3 porting work, you know things get much harder the
more 2.x legacy versions you need to support.

I led the successful charge to drop support for Python 2.6 and below
from Firefox's build system. I failed to win the argument that Mercurial
should drop 2.4 and 2.5 [1]. A few years ago, I started a similar
conversation with the LLVM project [2]. I wrote a blog post on the
subject [3] that even got Slashdotted [4] (although I don't think that's
the honor it was a decade ago).

While much of the opposition to dropping Python <2.7 stems from the RHEL
community (they still have 2.4 in extended support and 2.7 wasn't in a
release until a few weeks ago), a common objection from the users is "I
can't install a different Python" or "it's too difficult to install a
different Python." The former is a legit complaint - if you are on
shared hosting and don't have root, as easy as it is to add an alternate
package repository that provides 2.7 (or newer), you don't have the
permissions so you can't do it.

This leaves users with attempting a userland install of Python.
Personally, I think installing Python in userland is relatively simple.
Tools like pyenv make this turnkey. Worst case you fall back to
configure + make. But I'm an experienced developer and have a compiler
toolchain and library dependencies on my machine. What about less
experienced users or people that don't have the necessary build
dependencies? And, even if they do manage to find or build a Python
distribution, we all know that there's enough finicky behavior with
things like site-packages default paths to cause many headaches, even
for experienced Python hackers.

I'd like to propose a solution to this problem: a pre-built distribution
of CPython for Linux available via www.python.org in the list of
downloads for a particular release [5]. This distribution could be
downloaded and unarchived into the user's home directory and users could
start running it immediately by setting an environment variable or two,
creating a symlink, or even running a basic installer script. This would
hopefully remove the hurdles of obtaining a (sane) Python distribution
on Linux. This would allow projects to more easily drop end-of-life
Python versions and would speed adoption of modern Python, including
Python 3 (because porting is much easier if you only have to target 2.7).

I understand there may be technical challenges with doing this for some
distributions and with producing a universal binary distribution. I
would settle for a binary distribution that was targeted towards RHEL
users and variant distros, as that is the user population that I
perceive to be the most conservative and responsible for holding modern
Python adoption back.

[1] http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/68902
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/056545.html
[3] http://gregoryszorc.com/blog/2014/01/08/why-do-projects-support-old-python-releases/
[4] http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases
[5] https://www.python.org/download/releases/2.7.7/

Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-26 Thread Joseph Martinot-Lagarde

On 26/06/2014 20:34, Gregory Szorc wrote:

> [1] http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/68902
> [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/056545.html
> [3] http://gregoryszorc.com/blog/2014/01/08/why-do-projects-support-old-python-releases/
> [4] http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases
> [5] https://www.python.org/download/releases/2.7.7/

Just today I installed Anaconda
(https://store.continuum.io/cshop/anaconda/) on Linux servers running
CentOS 6.4. It installs into a directory anywhere in the filesystem (no
need to be root), and using it globally was just a matter of prepending a
folder to the PATH.

Of course Anaconda is oriented towards scientific applications, but it is
proof that a pre-built binary installer works and can be simple to use.

If someone wants to try it without all the scientific libraries, they provide
Miniconda (http://conda.pydata.org/miniconda.html), which contains only
Python and the Python package manager conda.

Joseph


Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-26 Thread Antonio Cavallo

I have a little pet project for building RPMs of Python 2.7 (it should be
trivial to port to 3.x):


https://build.opensuse.org/project/show/home:cavallo71:opt-python-modules

If there's enough interest I can help to integrate with python.org.


>> I understand there may be technical challenges with doing this for some
>> distributions and with producing a universal binary distribution.

openSUSE has provided the VMs to build binaries for multiple platforms
for a very long time.


> Of course Anaconda is oriented towards scientific applications but it is
> proof that a pre-built binary installer works and can be simple to use.

RPMs are the "blessed" way to install software on Linux: they support what
most sysadmins expect (easily listing the installed packages, easily
checking whether a package has been tampered with, finding which package
a file belongs to, etc.).


Anaconda might appeal to some groups of users, but for company-wide
deployment rpm is the best technical solution, given its integration
in Linux.



I hope this helps,
Antonio

Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-26 Thread Joseph Martinot-Lagarde


On 26/06/2014 22:00, Antonio Cavallo wrote:

>> Of course Anaconda is oriented towards scientific applications but it is
>> proof that a pre-built binary installer works and can be simple to use.
>
> RPMs are the "blessed" way to install software on Linux: they support what
> most sysadmins expect (easily listing the installed packages, easily
> checking whether a package has been tampered with, finding which package
> a file belongs to, etc.).
>
> Anaconda might appeal to some groups of users, but for company-wide
> deployment rpm is the best technical solution, given its integration
> in Linux.


1. Not all Linux distros use rpm (Debian, Ubuntu, Arch Linux...)
2. Installing an rpm requires root.

Btw, Anaconda is multiplatform and can be installed on Linux, Windows 
and Mac.


Joseph


[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Ben Hoyt

Hi Python dev folks,

I've written a PEP proposing a specific os.scandir() API for a
directory iterator that returns the stat-like info from the OS, the
main advantage of which is to speed up os.walk() and similar
operations between 4-20x, depending on your OS and file system. Full
details, background info, and context links are in the PEP, which
Victor Stinner has uploaded at the following URL, and I've also copied
inline below.

http://legacy.python.org/dev/peps/pep-0471/

Would love feedback on the PEP, but also of course on the proposal itself.

-Ben


PEP: 471
Title: os.scandir() function -- a better and faster directory iterator
Version: $Revision$
Last-Modified: $Date$
Author: Ben Hoyt 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2014
Python-Version: 3.5


Abstract
========

This PEP proposes including a new directory iteration function,
``os.scandir()``, in the standard library. This new function adds
useful functionality and increases the speed of ``os.walk()`` by 2-10
times (depending on the platform and file system) by significantly
reducing the number of times ``stat()`` needs to be called.


Rationale
=========

Python's built-in ``os.walk()`` is significantly slower than it needs
to be, because -- in addition to calling ``os.listdir()`` on each
directory -- it executes the system call ``os.stat()`` or
``GetFileAttributes()`` on each file to determine whether the entry is
a directory or not.

But the underlying system calls -- ``FindFirstFile`` /
``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
already tell you whether the files returned are directories or not, so
no further system calls are needed. In short, you can reduce the
number of system calls from approximately 2N to N, where N is the
total number of files and directories in the tree. (And because
directory trees are usually much wider than they are deep, it's often
much better than this.)
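
As a rough illustration of the 2N-versus-N point, here is a minimal sketch
that finds subdirectories both ways. It assumes an interpreter that
provides ``os.scandir()`` (the function this PEP proposes; it was later
released in Python 3.5). The helper names ``subdirs_listdir``,
``subdirs_scandir`` and ``demo`` are hypothetical and not part of the PEP.

```python
import os
import tempfile


def subdirs_listdir(path):
    """Find subdirectories the os.listdir() way.

    os.path.isdir() performs a separate stat() call for every entry,
    on top of the directory read itself.
    """
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def subdirs_scandir(path):
    """Find subdirectories the os.scandir() way.

    On most systems the directory read already reports the entry type,
    so is_dir() usually needs no further system call.
    """
    return sorted(entry.name for entry in os.scandir(path) if entry.is_dir())


def demo():
    """Build a small throwaway tree and run both helpers on it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "pkg"))
        os.mkdir(os.path.join(root, "docs"))
        with open(os.path.join(root, "README"), "w") as f:
            f.write("hello\n")
        return subdirs_listdir(root), subdirs_scandir(root)
```

Both helpers return the same names; the difference is only in how many
system calls are needed to get them.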

In practice, removing all those extra system calls makes ``os.walk()``
about **8-9 times as fast on Windows**, and about **2-3 times as fast
on Linux and Mac OS X**. So we're not talking about
micro-optimizations. See more `benchmarks`_.

.. _`benchmarks`: https://github.com/benhoyt/scandir#benchmarks

Somewhat relatedly, many people (see Python `Issue 11406`_) are also
keen on a version of ``os.listdir()`` that yields filenames as it
iterates instead of returning them as one big list. This improves
memory efficiency for iterating very large directories.

So as well as providing a ``scandir()`` iterator function for calling
directly, Python's existing ``os.walk()`` function could be sped up a
huge amount.

.. _`Issue 11406`: http://bugs.python.org/issue11406


Implementation
==============

The implementation of this proposal was written by Ben Hoyt (initial
version) and Tim Golden (who helped a lot with the C extension
module). It lives on GitHub at `benhoyt/scandir`_.

.. _`benhoyt/scandir`: https://github.com/benhoyt/scandir

Note that this module has been used and tested (see "Use in the wild"
section in this PEP), so it's more than a proof-of-concept. However,
it is marked as beta software and is not extensively battle-tested.
It will need some cleanup and more thorough testing before going into
the standard library, as well as integration into `posixmodule.c`.



Specifics of proposal
=====================

Specifically, this PEP proposes adding a single function to the ``os``
module in the standard library, ``scandir``, that takes a single,
optional string as its argument::

    scandir(path='.') -> generator of DirEntry objects

Like ``listdir``, ``scandir`` calls the operating system's directory
iteration system calls to get the names of the files in the ``path``
directory, but it's different from ``listdir`` in two ways:

* Instead of bare filename strings, it returns lightweight
  ``DirEntry`` objects that hold the filename string and provide
  simple methods that allow access to the stat-like data the operating
  system returned.

* It returns a generator instead of a list, so that ``scandir`` acts
  as a true iterator instead of returning the full list immediately.

``scandir()`` yields a ``DirEntry`` object for each file and directory
in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
pseudo-directories are skipped, and the entries are yielded in
system-dependent order. Each ``DirEntry`` object has the following
attributes and methods:

* ``name``: the entry's filename, relative to ``path`` (corresponds to
  the return values of ``os.listdir``)

* ``is_dir()``: like ``os.path.isdir()``, but requires no system calls
  on most systems (Linux, Windows, OS X)

* ``is_file()``: like ``os.path.isfile()``, but requires no system
  calls on most systems (Linux, Windows, OS X)

* ``is_symlink()``: like ``os.path.islink()``, but requires no system
  calls on most systems (Linux, Windows, OS X)

* ``lstat()``: like ``os.lstat()``, but requires no system calls on
  Windows
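
The attributes listed above could be used roughly as in the following
sketch, which assumes ``os.scandir()`` as later released in Python 3.5.
It sticks to ``name`` and the ``is_*()`` methods; the draft's ``lstat()``
is left out because the released ``DirEntry`` exposes
``stat(follow_symlinks=False)`` instead. The names ``classify`` and
``demo`` are hypothetical.

```python
import os
import tempfile


def classify(path):
    """Map each entry name in `path` to 'symlink', 'dir', 'file' or 'other'."""
    kinds = {}
    for entry in os.scandir(path):
        if entry.is_symlink():
            kinds[entry.name] = "symlink"
        elif entry.is_dir():
            kinds[entry.name] = "dir"
        elif entry.is_file():
            kinds[entry.name] = "file"
        else:
            kinds[entry.name] = "other"
    return kinds


def demo():
    """Classify the entries of a small throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "data.txt"), "w") as f:
            f.write("x")
        return classify(root)
```

As with ``listdir``, the ``'.'`` and ``'..'`` pseudo-directories never show
up in the result.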

The ``DirEntry

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread MRAB


On 2014-06-26 23:59, Ben Hoyt wrote:

> Hi Python dev folks,
>
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system. Full
> details, background info, and context links are in the PEP, which
> Victor Stinner has uploaded at the following URL, and I've also
> copied inline below.
>
> http://legacy.python.org/dev/peps/pep-0471/
>
> Would love feedback on the PEP, but also of course on the proposal
> itself.


[snip]
Personally, I'd prefer the name 'iterdir' because it emphasises that
it's an iterator.

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Tim Delaney

On 27 June 2014 09:28, MRAB  wrote:

> Personally, I'd prefer the name 'iterdir' because it emphasises that
> it's an iterator.


Exactly what I was going to post (with the added note that there's an
obvious symmetry with listdir).

+1 for iterdir rather than scandir

Other than that:

+1 for adding scandir to the stdlib
-1 for windows_wildcard (it would be an attractive nuisance to write
windows-only code)

Tim Delaney

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Paul Sokolovsky

Hello,

On Thu, 26 Jun 2014 18:59:45 -0400
Ben Hoyt  wrote:

> Hi Python dev folks,
> 
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system. Full
> details, background info, and context links are in the PEP, which
> Victor Stinner has uploaded at the following URL, and I've also copied
> inline below.

I noticed the obvious inefficiency of os.walk() being implemented in terms of
os.listdir() when I worked on the "os" module for MicroPython. I essentially
did what your PEP suggests: I introduced an internal generator function
(ilistdir_ex() in
https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
), in terms of which both os.listdir() and os.walk() are implemented.
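
A rough sketch of that structure, which is not MicroPython's actual code:
one internal generator feeds both a listdir() and a walk() lookalike. The
generator here is the hypothetical iter_entries(), and building it on
os.scandir() (available in CPython 3.5 and later) is purely an assumption
made to keep the example runnable.

```python
import os
import tempfile


def iter_entries(path):
    """Yield (name, is_dir) pairs; a hypothetical stand-in for ilistdir_ex()."""
    for entry in os.scandir(path):
        yield entry.name, entry.is_dir(follow_symlinks=False)


def listdir(path):
    """os.listdir() lookalike: keep the names, drop the type info."""
    return [name for name, _ in iter_entries(path)]


def walk(top):
    """Top-down os.walk() lookalike that needs no extra stat() per entry."""
    dirs, files = [], []
    for name, is_dir in iter_entries(top):
        (dirs if is_dir else files).append(name)
    yield top, dirs, files
    for name in dirs:
        yield from walk(os.path.join(top, name))


def demo():
    """Walk a small throwaway tree and report paths relative to its root."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        with open(os.path.join(root, "a", "f.txt"), "w") as f:
            f.write("x")
        return [(os.path.relpath(d, root), sorted(ds), sorted(fs))
                for d, ds, fs in walk(root)]
```

The public functions keep their existing signatures, so callers do not see
which primitive sits underneath.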

With my MicroPython hat on, os.scandir() would only make things worse.
With the current interface, one can have either an inefficient implementation
(as CPython chose) or an efficient implementation (as MicroPython
chose), all transparently. os.scandir() supposedly opens up an efficient
implementation for everyone, but at the price of bloating the API and
introducing heavyweight objects to wrap the info. The PEP calls them
"lightweight DirEntry objects", but that cannot be true, because all
Python objects are heavyweight, especially those which have methods.

It would be better if os.scandir() were specified to return a struct
(named tuple) compatible with the return value of os.stat() (with only the
fields relevant to the underlying readdir()-like system call). The grounds
for that are obvious: it is an already existing data interface in the "os"
module, and it is based on an open standard for operating systems,
POSIX, so if one is to expect something about file attributes, that is
what one can reasonably base expectations on.

But reusing the os.stat struct is glaringly not what's proposed. And
it's clear where that comes from: "[DirEntry.]lstat(): like os.lstat(),
but requires no system calls on Windows". Nice, but OS "FooBar" can do
much more than Windows: it has a system call to send a file by email,
right while scanning the directory containing it. So why not have a
DirEntry.send_by_email(recipient) method? I hear the answer: it's
because CPython strives to support Windows well, while it doesn't care
about the "FooBar" OS.

And then that again leads to the question I have posed several times: where is
the line between "CPython" and "Python"? Is it justified for CPython to add
to (or remove from) the Python stdlib something which is useful for its users,
but useless or complicating for other Python implementations?
Especially taking into account that there's the "win32api" module, which
allows Windows users to use all the wonders of its API? Especially given that
the os.stat struct is itself pretty extensible
(https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
systems (such as FreeBSD), the following attributes may be
available ...", "On Mac OS systems...", so extra fields can be added
for Windows just the same, if really needed).

> 
> http://legacy.python.org/dev/peps/pep-0471/
> 
> Would love feedback on the PEP, but also of course on the proposal
> itself.
> 
> -Ben
> 

[]

-- 
Best regards,
 Paul  mailto:[email protected]

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Ethan Furman


On 06/26/2014 04:36 PM, Tim Delaney wrote:

> On 27 June 2014 09:28, MRAB wrote:
>>
>> Personally, I'd prefer the name 'iterdir' because it emphasises that
>> it's an iterator.
>
> Exactly what I was going to post (with the added note that there's an
> obvious symmetry with listdir).
>
> +1 for iterdir rather than scandir
>
> Other than that:
>
> +1 for adding [it] to the stdlib


+1 for all of above

--
~Ethan~

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Benjamin Peterson

On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote:
> 
> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info. PEP calls it
> "lightweight DirEntry objects", but that cannot be true, because all
> Python objects are heavy-weight, especially those which have methods.

Why do you think methods make an object more heavyweight? namedtuples
have methods.

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Paul Sokolovsky

Hello,

On Thu, 26 Jun 2014 17:35:21 -0700
Benjamin Peterson  wrote:

> On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote:
> > 
> > With my MicroPython hat on, os.scandir() would make things only
> > worse. With current interface, one can either have inefficient
> > implementation (like CPython chose) or efficient implementation
> > (like MicroPython chose) - all transparently. os.scandir()
> > supposedly opens up efficient implementation for everyone, but at
> > the price of bloating API and introducing heavy-weight objects to
> > wrap info. PEP calls it "lightweight DirEntry objects", but that
> > cannot be true, because all Python objects are heavy-weight,
> > especially those which have methods.
> 
> Why do you think methods make an object more heavyweight? 

Because you need to call them. And if the only thing they do is return
an object field, the call overhead is rather noticeable.

> namedtuples have methods.

Yes, unfortunately. But fortunately, a named tuple is a subclass of
tuple, so a user who cares about efficiency can just use the numeric indexing
that has existed for os.stat values all along, blissfully ignoring the
cruft which has been accumulating there since the 1.5 days.
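
For readers unfamiliar with the two access styles being contrasted, here is
a small sketch. It only shows that both exist and agree; it makes no claim
about their relative speed. The names demo, Point and p are made up for the
example.

```python
import os
import stat
import tempfile
from collections import namedtuple


def demo():
    """Read a file's size from os.stat() by numeric index and by attribute."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"12345")
    try:
        st = os.stat(f.name)
        by_index = st[stat.ST_SIZE]   # old-style tuple indexing
        by_name = st.st_size          # named attribute access
    finally:
        os.unlink(f.name)
    return by_index, by_name


# A named tuple is a tuple subclass: it supports indexing, attribute
# access, and a few helper methods such as _asdict().
Point = namedtuple("Point", "x y")
p = Point(1, 2)
```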


-- 
Best regards,
 Paul  mailto:[email protected]

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Ryan

+1 for scandir.
-1 for iterdir (scandir sounds fancier).
- for windows_wildcard.

Tim Delaney  wrote:
>On 27 June 2014 09:28, MRAB  wrote:
>
>> Personally, I'd prefer the name 'iterdir' because it emphasises that
>> it's an iterator.
>
>
>Exactly what I was going to post (with the added note that there's an
>obvious symmetry with listdir).
>
>+1 for iterdir rather than scandir
>
>Other than that:
>
>+1 for adding scandir to the stdlib
>-1 for windows_wildcard (it would be an attractive nuisance to write
>windows-only code)
>
>Tim Delaney
>
>
>
>


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Ben Hoyt

I don't mind iterdir() and would take it :-), but I'll just say why I
chose the name scandir() -- though it wasn't my suggestion originally:

iterdir() sounds like just an iterator version of listdir(), kinda
like keys() and iterkeys() in Python 2. Whereas in actual fact the
return values are quite different (DirEntry objects vs strings), and
so the name change reflects that difference a little.

I'm also -1 on windows_wildcard. I think it's asking for trouble, and
wouldn't gain much on Windows in most cases anyway.

-Ben

On Thu, Jun 26, 2014 at 7:43 PM, Ethan Furman  wrote:
> On 06/26/2014 04:36 PM, Tim Delaney wrote:
>>
>> On 27 June 2014 09:28, MRAB wrote:
>>>
>>>
>>> Personally, I'd prefer the name 'iterdir' because it emphasises that
>>> it's an iterator.
>>
>>
>> Exactly what I was going to post (with the added note that there's an
>> obvious symmetry with listdir).
>>
>> +1 for iterdir rather than scandir
>>
>> Other than that:
>>
>> +1 for adding [it] to the stdlib
>
>
> +1 for all of above
>
> --
> ~Ethan~
>

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread MRAB


On 2014-06-27 02:37, Ben Hoyt wrote:

> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
>
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.


[snip]

The re module has 'findall', which returns a list of strings, and
'finditer', which returns an iterator that yields match objects, so
there's a precedent. :-)
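
For readers who have not used both, a small sketch of that precedent follows. The sample text and variable names are invented for illustration; the two re functions are the standard ones.

```python
import re

text = "spam 12, eggs 345"

# findall() returns a list of matched strings.
as_strings = re.findall(r"\d+", text)

# finditer() returns an iterator of match objects with extra details,
# such as the position where each match starts.
as_matches = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
```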


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Ben Hoyt

> os.listdir() when I worked on "os" module for MicroPython. I essentially
> did what your PEP suggests - introduced internal generator function
> (ilistdir_ex() in
> https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
> ), in terms of which both os.listdir() and os.walk() are implemented.

Nice (though I see the implementation is very *nix specific).

> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info. PEP calls it
> "lightweight DirEntry objects", but that cannot be true, because all
> Python objects are heavy-weight, especially those which have methods.

It's a fair point that os.walk() can be implemented efficiently
without adding a new function and API. However, often you'll want more
info, like the file size, which scandir() can give you via
DirEntry.lstat(), which is free on Windows. So opening up this
efficient API is beneficial.
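
As a rough sketch of the file-size use case: the draft discussed here spells the call DirEntry.lstat(), while the version that shipped in Python 3.5 spells it entry.stat(follow_symlinks=False), which is what the code below assumes. The function name total_file_size is hypothetical.

```python
import os


def total_file_size(path):
    """Sum the sizes of regular files directly inside path (no recursion)."""
    total = 0
    for entry in os.scandir(path):
        # Skip directories and symlinks; only count regular files.
        if entry.is_file(follow_symlinks=False):
            # stat(follow_symlinks=False) is the shipped spelling of lstat().
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

On Windows the size comes from the directory scan itself; on POSIX systems each stat() still costs one system call, but is_file() usually does not.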

In CPython, I think the DirEntry objects are as lightweight as
stat_result objects.

I'm an embedded developer by background, so I know the constraints
here, but I really don't think Python's development should be tailored
to fit MicroPython. If os.scandir() is not very efficient on
MicroPython, so be it -- 99% of all desktop/server users will gain
from it.

> It would be better if os.scandir() was specified to return a struct
> (named tuple) compatible with return value of os.stat() (with only
> fields relevant to underlying readdir()-like system call). The grounds
> for that are obvious: it's already existing data interface in module
> "os", which is also based on open standard for operating systems -
> POSIX, so if one is to expect something about file attributes, it's
> what one can reasonably base expectations on.

Yes, we considered this early on (see the python-ideas and python-dev
threads referenced in the PEP), but decided it wasn't a great API to
overload stat_result further, and have most of the attributes None or
not present on Linux.

> Especially that os.stat struct is itself pretty extensible
> (https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
> systems (such as FreeBSD), the following attributes may be
> available ...", "On Mac OS systems...", - so extra fields can be added
> for Windows just the same, if really needed).

Yes. Incidentally, I just submitted an (accepted) patch for Python 3.5
that adds the full Win32 file attribute data to stat_result objects on
Windows (see https://docs.python.org/3.5/whatsnew/3.5.html#os).

However, for scandir() to be useful, you also need the name. My
original version of this directory iterator returned two-tuples of
(name, stat_result). But most people didn't like the API, and I don't
really either. You could overload stat_result with a .name attribute
in this case, but it still isn't a nice API to have most of the
attributes None, and then you have to test for that, etc.

So basically we tweaked the API to do what was best, and ended up with
it returning DirEntry objects with is_file() and similar methods.

Hope that helps give a bit more context. If you haven't read the
relevant python-ideas and python-dev threads, those are interesting
too.

-Ben

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Gregory P. Smith

+1 on getting this in for 3.5.

If the only objection people are having is the stupid paint color of the
name I don't care what it's called!  scandir matches the libc API of the
same name.  iterdir also makes sense to anyone reading it.  Whoever checks
this in can pick one and be done with it.  We have other Python APIs with
iter in the name and tend not to be trying to mirror C so much these days
so the iterdir folks do have a valid point.

I'm not a huge fan of the DirEntry object and the method calls on it
instead of simply yielding tuples of (filename,
partially_filled_in_stat_result) but I don't *really* care which is used as
they both work fine and it is trivial to wrap with another generator
expression to turn it into exactly what you want anyways.
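
A minimal sketch of that wrapping, assuming the API as shipped in Python 3.5. The name iter_name_stat is hypothetical, and the stat_result here is a full one (costing a system call per entry on POSIX), not the partially filled-in result mentioned above.

```python
import os


def iter_name_stat(path):
    """Yield (name, stat_result) pairs, the tuple-style API discussed above."""
    return ((entry.name, entry.stat(follow_symlinks=False))
            for entry in os.scandir(path))
```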

Python not having the ability to operate on large directories means Python
simply cannot be used for common system maintenance tasks.  Python is also
slow to walk a file system because listdir throws away information the OS
already returned, which forces unnecessary stat calls (often each an entire
I/O operation requiring a disk seek!). This addresses both.

IMNSHO, it is a single function, it belongs in the os module right next to
listdir.

-gps

On Thu, Jun 26, 2014 at 6:37 PM, Ben Hoyt wrote:

> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
>
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.
>
> I'm also -1 on windows_wildcard. I think it's asking for trouble, and
> wouldn't gain much on Windows in most cases anyway.
>
> -Ben
>
> On Thu, Jun 26, 2014 at 7:43 PM, Ethan Furman wrote:
> > On 06/26/2014 04:36 PM, Tim Delaney wrote:
> >>
> >> On 27 June 2014 09:28, MRAB wrote:
> >>>
> >>>
> >>> Personally, I'd prefer the name 'iterdir' because it emphasises that
> >>> it's an iterator.
> >>
> >>
> >> Exactly what I was going to post (with the added note that there's an
> >> obvious symmetry with listdir).
> >>
> >> +1 for iterdir rather than scandir
> >>
> >> Other than that:
> >>
> >> +1 for adding [it] to the stdlib
> >
> >
> > +1 for all of above
> >
> > --
> > ~Ethan~

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Steven D'Aprano

On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote:

> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info. 

os.scandir is not part of the Python API, it is not a built-in function. 
It is part of the CPython standard library. That means (in my opinion) 
that there is an expectation that other Pythons should provide it, but 
not an absolute requirement. Especially for the os module, which by 
definition is platform-specific. In my opinion that means you have four 
options:

1. provide os.scandir, with exactly the same semantics as on CPython;

2. provide os.scandir, but change its semantics to be more lightweight 
   (e.g. return an ordinary tuple, as you already suggest);

3. don't provide os.scandir at all; or

4. do something different depending on whether the platform is Linux
   or an embedded system.

I would consider any of those acceptable for a library feature, but not 
for a language feature.

[...]
> But reusing os.stat struct is glaringly not what's proposed. And
> it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(),
> but requires no system calls on Windows". Nice, but OS "FooBar" can do
> much more than Windows - it has a system call to send a file by email,
> right when scanning a directory containing it. So, why not have a
> DirEntry.send_by_email(recipient) method? I hear the answer - it's
> because CPython strives to support Windows well, while it doesn't care
> about "FooBar" OS.

Correct. If there is sufficient demand for FooBar, then CPython may 
support it. Until then, FooBarPython can support it, and offer whatever 
platform-specific features are needed within its standard library.

> And then it again leads to the question I posed several times - where's
> the line between "CPython" and "Python"? Is it justified for CPython to
> add to (or remove from) the Python stdlib something which is useful for
> its users, but useless or complicating for other Python implementations?

I think so. And other implementations are free to do the same thing.

Of course there is an expectation that the standard library of most 
implementations will be broadly similar, but not that they will be 
identical.

I am surprised that both Jython and IronPython offer a non-functioning 
dis module: you can import it successfully, but if there's a way to 
actually use it, I haven't found it:

steve@orac:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
[OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/jython/Lib/dis.py", line 42, in dis
    disassemble(x)
  File "/usr/share/jython/Lib/dis.py", line 64, in disassemble
    linestarts = dict(findlinestarts(co))
  File "/usr/share/jython/Lib/dis.py", line 183, in findlinestarts
    byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
AttributeError: 'tablecode' object has no attribute 'co_lnotab'

IronPython gives a different exception:

steve@orac:~$ ipy
IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
TypeError: don't know how to disassemble code objects

It's quite annoying; I would rather they had just removed the 
module altogether. Better still would have been to disassemble code 
objects to whatever byte code the Java and .NET platforms use. But 
there's surely no requirement to disassemble to CPython byte code!

-- 
Steven

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Steven D'Aprano

On Thu, Jun 26, 2014 at 09:37:50PM -0400, Ben Hoyt wrote:
> I don't mind iterdir() and would take it :-), but I'll just say why I
> chose the name scandir() -- though it wasn't my suggestion originally:
> 
> iterdir() sounds like just an iterator version of listdir(), kinda
> like keys() and iterkeys() in Python 2. Whereas in actual fact the
> return values are quite different (DirEntry objects vs strings), and
> so the name change reflects that difference a little.

+1 

I think that's a good objective reason to prefer scandir, which suits 
me, because my subjective opinion is that "iterdir" is an inelegant 
and less than attractive name.


-- 
Steven

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Glenn Linderman


I'm generally +1, with opinions noted below on these two topics.

On 6/26/2014 3:59 PM, Ben Hoyt wrote:

> Should there be a way to access the full path?
> ----------------------------------------------
>
> Should ``DirEntry``'s have a way to get the full path without using
> ``os.path.join(path, entry.name)``? This is a pretty common pattern,
> and it may be useful to add pathlib-like ``str(entry)`` functionality.
> This functionality has also been requested in `issue 13`_ on GitHub.
>
> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13


+1
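
For reference, a minimal sketch of what this looks like with the API that eventually shipped in Python 3.5, where each entry has a path attribute. That attribute is not part of the draft quoted above, and the helper name full_paths is hypothetical.

```python
import os


def full_paths(path):
    """Return the sorted full paths of the entries in path."""
    # entry.path is equivalent to os.path.join(path, entry.name).
    return sorted(entry.path for entry in os.scandir(path))
```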


> Should it expose Windows wildcard functionality?
> ------------------------------------------------
>
> Should ``scandir()`` have a way of exposing the wildcard functionality
> in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The
> scandir module on GitHub exposes this as a ``windows_wildcard``
> keyword argument, allowing Windows power users the option to pass a
> custom wildcard to ``FindFirstFile``, which may avoid the need to use
> ``fnmatch`` or similar on the resulting names. It is named the
> unwieldy ``windows_wildcard`` to remind you you're writing
> power-user, Windows-only code if you use it.
>
> This boils down to whether ``scandir`` should be about exposing all of
> the system's directory iteration features, or simply providing a fast,
> simple, cross-platform directory iteration API.
>
> This PEP's author votes for not including ``windows_wildcard`` in the
> standard library version, because even though it could be useful in
> rare cases (say the Windows Dropbox client?), it'd be too easy to use
> it just because you're a Windows developer, and create code that is
> not cross-platform.


Because another common pattern is to check whether a name matches a 
pattern, I think it would be good to have a feature that provides this. 
I do that in my own private directory listing extensions, and some 
command lines also expose it to the user.  Where exposed to the user, I 
use -p windows-pattern and -P regexp. My implementation converts the 
windows-pattern to a regexp, and then uses common code. For this 
particular API, though, because windows_wildcard can be optimized by the 
Windows API call used, it would make more sense to pass windows_wildcard 
directly to FindFirstFile on Windows, but convert it to a regexp on *nix. 
Both Windows and *nix would call re to process pattern matches, except 
when a Windows pattern is passed in on Windows. The alternate parameter 
could simply be called wildcard, and would be a regexp. If desired, 
other flavors of wildcard (bsd_wildcard?) could also be implemented, but 
I'm not sure there are any benefits to them since, as far as I am aware, 
those systems have no optimizations for those patterns.
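
Without any wildcard parameter, the cross-platform version of this pattern can be sketched with the standard fnmatch module on top of scandir(). This assumes the API as shipped in Python 3.5; the function name scandir_matching is hypothetical and is not part of the proposal.

```python
import fnmatch
import os


def scandir_matching(path, pattern):
    """Yield DirEntry objects whose names match a shell-style pattern."""
    for entry in os.scandir(path):
        # fnmatchcase() compares case-sensitively on every platform.
        if fnmatch.fnmatchcase(entry.name, pattern):
            yield entry
```

The filtering happens in Python after the OS has returned every name, so it does not get the FindFirstFile optimization described above.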

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-26 Thread Paul Moore

On 26 June 2014 23:59, Ben Hoyt wrote:
> Would love feedback on the PEP, but also of course on the proposal itself.

A solid +1 from me.

Some specific points:

- I'm in favour of it being in the os module. It's more discoverable
there, as well as the other reasons mentioned.
- I prefer scandir as the name, for the reason you gave (the output
isn't the same as an iterator version of listdir)
- I'm mildly against windows_wildcard (even though I'm a windows user)
- You mention the caching behaviour of DirEntry objects. The
limitations should be clearly covered in the final docs, as it's the
sort of thing people will get wrong otherwise.
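
The kind of mistake meant by the last point can be sketched as follows. This assumes the API as shipped (Python 3.6 or later for the with form of scandir()); the function name cached_vs_fresh_size is hypothetical.

```python
import os


def cached_vs_fresh_size(path, name, extra=b"more"):
    """Return (cached_size, fresh_size) after appending to a scanned file."""
    with os.scandir(path) as entries:
        entry = next(e for e in entries if e.name == name)
    entry.stat()  # first call may hit the OS; the result is then cached
    with open(entry.path, "ab") as f:
        f.write(extra)  # the file grows after the entry was scanned
    # entry.stat() still reports the old size; os.stat() asks the OS again.
    return entry.stat().st_size, os.stat(entry.path).st_size
```

Code that needs current information should call os.stat(entry.path) or scan the directory again rather than reuse an old entry.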

Paul
