[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I haven't really followed, but now that the PEP is accepted, what is the 
progress on this one?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Yes, PEP 471 has been accepted, and I've got a mostly-finished C implementation 
of os.scandir() for CPython 3.5, as well as tests and docs. If you want a sneak 
preview, see posixmodule_scandir*.c, test/test_scandir.py, and os.rst here: 
https://github.com/benhoyt/scandir

It's working well on Windows, but the Linux version has a couple of tiny issues 
yet (core dumps ;-).

Given that os.scandir() will solve this issue (as well as the bigger 
performance problem due to listdir throwing away file type info), can we close 
this issue and open another one to track the implementation of os.scandir() / 
PEP 471?
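
For reference, a minimal sketch of the lazy, type-aware iteration discussed here, written against os.scandir() as it later shipped in the standard library. It assumes Python 3.6 or newer (for the `with` form), and the helper name `split_entries` is made up for illustration.

```python
import os


def split_entries(path):
    """Return (dirs, files) name lists for `path` (hypothetical helper)."""
    dirs, files = [], []
    # os.scandir() yields DirEntry objects lazily instead of building a list.
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() normally reuses the file type obtained during the
            # directory scan, so most entries need no extra stat() call.
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return dirs, files
```

Callers that only need names can ignore everything but `entry.name`, which is why a separate generator version of os.listdir() becomes unnecessary.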




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Given that os.scandir() will solve this issue (as well as the bigger 
> performance problem due to listdir throwing away file type info), can we 
> close this issue and open another one to track the implementation of 
> os.scandir() / PEP 471?

This makes sense. Can you do it?




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Okay, I've opened http://bugs.python.org/issue22524, but I don't have the 
permissions to close this one, so could someone with bugs.python.org 
superpowers please do that?




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Nick Coghlan

Nick Coghlan added the comment:

This approach has been rejected in favour of the accepted PEP 471 proposal to 
add os.scandir() (issue #22524)

--
resolution:  -> rejected
stage: needs patch -> resolved
status: open -> closed
superseder:  -> PEP 471 implementation: os.scandir() directory scanning function




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Jyrki Pulliainen

Changes by Jyrki Pulliainen jy...@dywypi.org:


--
nosy: +nailor




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I'm with Martin and the other respondents who think this shouldn't be done.

Without compelling timings, this smacks of feature creep.  The platform specific 
issues may create an on-going maintenance problem.  The feature itself is prone 
to misuse, leaving hard-to-find race condition bugs in its wake.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread STINNER Victor

STINNER Victor added the comment:

Maybe the development should start outside Python stdlib, on a project on
PyPI.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Raymond, there are very compelling timings/benchmarks for this -- not so much 
the original issue here (generator vs list, that's not really an issue) but 
having a scandir() function that returns the stat-like info from the OS so you 
don't need extra stat calls. This speeds up os.walk() by 7-20 times on Windows 
and 4-5 times on Linux. See more at: 
https://github.com/benhoyt/scandir#benchmarks
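
To make the source of that speedup concrete, here is a simplified sketch of a top-down tree walk built on os.scandir() as it later shipped. It is not the real os.walk() implementation: `walk_sketch` is a hypothetical name, error handling is omitted, and symlinks to directories are listed as files so the sketch never recurses through them. It assumes Python 3.6 or newer.

```python
import os


def walk_sketch(top):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk() (simplified)."""
    dirs, files = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # The entry type usually comes from the directory scan itself,
            # so classifying entries needs no per-file stat() call.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    yield top, dirs, files
    for name in dirs:
        yield from walk_sketch(os.path.join(top, name))
```

A listdir-based walk has to stat every name just to tell directories from files, which is the cost the benchmarks above measure.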

I've written a draft PEP that I've sent to the PEP editors (if you're 
interested, it's at https://github.com/benhoyt/scandir/blob/master/PEP.txt). If 
any of the PEP editors are listening here ... would love some feedback on that 
at some stage. :-)

Victor -- development has started outside the stdlib here: 
https://github.com/benhoyt/scandir and PyPI module here: 
https://pypi.python.org/pypi/scandir Both are being used by various people.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks! Will post the PEP to python-dev in the next day or two.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Nick Coghlan

Nick Coghlan added the comment:

I suggest a pass through python-ideas first. python-ideas feedback tends to
be more oriented towards "is this proposal as persuasive as it could be?",
while python-dev is more aimed at the "is this a good idea or not?" yes/no
question. (python-ideas feedback naturally includes some of the latter as
well, but there's a lot more "I'm not sure I agree with the idea itself,
but I agree it's worth discussing further" feedback than is common on
python-dev)




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Nick -- sorry, already posted to python-dev before seeing your latest. However, 
I think it's the right place, as there's already been a fair bit of hashing 
this idea and API out on python-ideas first and then also python-dev. See links 
in the PEP here: http://legacy.python.org/dev/peps/pep-0471/#previous-discussion




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-03-06 Thread Josh Rosenberg

Changes by Josh Rosenberg shadowran...@gmail.com:


--
nosy: +ShadowRanger




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-02-13 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy: +ethan.furman




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-11-23 Thread Gregory P. Smith

Gregory P. Smith added the comment:

For reference the current state of things for this is the proposal in:
 https://mail.python.org/pipermail/python-dev/2013-May/126196.html

With a prototype using a ctypes based implementation as proof of concept in 
https://github.com/benhoyt/scandir.

A combination of that interface plus my existing scandir patch (-gps02) could 
be created for the final implementation.

As 3.4beta1 happens tonight, this isn't going to make 3.4 so I'm bumping this 
to 3.5.  I really like the proposed design outlined above.

--
stage: patch review -> needs patch
versions: +Python 3.5 -Python 3.4




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-10-21 Thread Charles-François Natali

Charles-François Natali added the comment:

Gregory, did you make any progress on this?
I think it would be a nice addition.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-10-21 Thread Christian Heimes

Christian Heimes added the comment:

Indeed! I'd like to see the feature in 3.4 so I can remove my own hack from our 
code base.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-10-21 Thread Gregory P. Smith

Gregory P. Smith added the comment:

I haven't had a chance to look at this since May. It'd still be a great 
addition.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy:  -serhiy.storchaka




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-11 Thread Brian Curtin

Changes by Brian Curtin br...@python.org:


--
nosy:  -brian.curtin




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-10 Thread Ben Hoyt

Ben Hoyt added the comment:

> Please bring this up on python-dev.

Good idea. Thread started: 
http://mail.python.org/pipermail/python-dev/2013-May/126119.html




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Charles-François Natali

Charles-François Natali added the comment:

> Charles gave this example of code that would fall over:
>
>     size = 0
>     for name, st in scandir(path):
>         if stat.S_ISREG(st.st_mode):
>             size += st.st_size
>
> I don't see it, though. In this case you need both .st_mode and .st_size, so 
> a caller would check that those are not None, like so:

Well, that's precisely the point.
A normal caller would never expect a stat object to be partially
populated: if a function has a prototype returning a stat object, then
I definitely expect it to be a regular stat object, with all the
fields guaranteed by POSIX set (st_size, st_ino, st_dev...). By
returning a dummy stat object, you break the stat interface, and I'm
positive this *will* puzzle users and introduce errors.
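
A minimal sketch of that failure mode follows. `PartialStat` is a hypothetical stand-in for the proposed partially populated result (the real os.stat_result never carries None in these fields), and `total_size` is written the way a caller unaware of the None convention would write it.

```python
import stat
from collections import namedtuple

# Hypothetical stand-in for a stat-like object whose fields may be None.
PartialStat = namedtuple("PartialStat", ["st_mode", "st_size"])


def total_size(entries):
    """Sum sizes of regular files, written as if every field were set."""
    size = 0
    for name, st in entries:
        if stat.S_ISREG(st.st_mode):
            size += st.st_size
    return size


complete = [("a.txt", PartialStat(stat.S_IFREG | 0o644, 10))]
partial = [("b.txt", PartialStat(stat.S_IFREG | 0o644, None))]

print(total_size(complete))
try:
    total_size(partial)
except TypeError as exc:
    # The missing field only surfaces at the point of use.
    print("TypeError:", exc)
```

The code works on a platform that fills in st_size and fails on one that does not, which is the portability trap being described.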

Now, if I'm the only one who finds this trick dangerous and ugly, you
can go ahead, but I stand by my claim that it's definitely a bad idea
(between this and the explicit Enum value assignment, I feel somewhat
lost lately :-)




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> (between this and the explicit Enum value assignment, I feel somewhat
> lost lately :-)

Don't worry, it sometimes happens :-)




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Ben Hoyt

Ben Hoyt added the comment:

> A normal caller would never expect a stat object to be partially populated: 
> if a function has a prototype returning a stat object, then I definitely 
> expect it to be a regular stat object, with all the fields guaranteed by 
> POSIX set (st_size, st_ino, st_dev...).

I don't think that's true in general, or true of how other Python APIs work. 
For instance, many APIs return a file-like object, and you can only do 
certain things on that object, depending on what the documentation says, or 
what EAFP gets you. Some file-like objects don't support seek/tell, some don't 
support close, etc. I've seen plenty of walk-like-a-duck checks like this:

    if hasattr(f, 'close'):
        f.close()

Anyway, my point boils down to:

* scandir() is a new function, so there aren't old trends or things that will 
break
* we clearly document it as returning a tuple of (name, st), where st is a 
stat-like object whose individual fields are None if they couldn't be 
determined for free with the directory scanning
* in fact, that's kind of the point of the st object in this function, so the 
example could be the one I gave above where you call os.stat() if either of the 
fields you want is None
* if that's clear in the documentation (of this new function) and the first 
example shows you exactly how it's meant to be used, I think that's pretty sane 
and sensible...




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Charles-François Natali

Charles-François Natali added the comment:

> I don't think that's true in general, or true of how other Python APIs work. 
> For instance, many APIs return a file-like object, and you can only do 
> certain things on that object, depending on what the documentation says, or 
> what EAFP gets you. Some file-like objects don't support seek/tell, some don't 
> support close, etc. I've seen plenty of walk-like-a-duck checks like this:

Yes, I'm fully aware of duck typing ;-)
But here, you're saying that a duck has a beak, but it *may* have
legs, a tail, etc.
It just looks wrong to me on so many levels.

Please bring this up on python-dev.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Actually I'm thinking this duck may only have a beak. Instead of a bunch of
fields of None I'd prefer just not having that attribute defined on the
object. I consider the OS-specific stat-like info from reading a
directory to be so OS-specific that I'd rather not let someone be confused
by it if it were to be returned up to a higher level caller. It's not a
stat.
On May 6, 2013 2:36 AM, Charles-François Natali rep...@bugs.python.org
wrote:

> Charles-François Natali added the comment:
>
> > I don't think that's true in general, or true of how other Python APIs
> > work. For instance, many APIs return a file-like object, and you can only
> > do certain things on that object, depending on what the documentation says,
> > or what EAFP gets you. Some file-like objects don't support seek/tell, some
> > don't support close, etc. I've seen plenty of walk-like-a-duck checks like
> > this:
>
> Yes, I'm fully aware of duck typing ;-)
> But here, you're saying that a duck has a beak, but it *may* have
> legs, a tail, etc.
> It just looks wrong to me on so many levels.
>
> Please bring this up on python-dev.



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Charles-François Natali

Charles-François Natali added the comment:

> I'd slightly prefer the name iterdir_stat(), as that almost makes the (name, 
> stat) return values explicit in the name. But that's kind of bikeshedding -- 
> scandir() works too.

I find iterdir_stat() ugly :-)
I like the scandir name, which has some precedent with POSIX.

> That's right: if we have a separate scandir() that returns (name, stat) 
> tuples, then a plain iterdir() is pretty much unnecessary -- callers just 
> ignore the second stat value if they don't care about it.

Hum, wait.
scandir() cannot return (name, stat), because on POSIX, readdir() only
returns d_name and d_type (the type of the entry): to return a stat,
we would have to call stat() on each entry, which would defeat the
performance gain.
And that's the problem with scandir: it's not portable. Depending on
the OS/file system, you could very well get DT_UNKNOWN (and on Linux,
since it uses an adaptive heuristic for NFS filesystem, you could have
some entries with a resolved d_type and some others with DT_UNKNOWN,
on the same directory stream).

That's why scandir would be a rather low-level call, whose main user
would be walkdir, which only needs to know the entry type and not the
whole stat result.

Also, I don't know which information is returned by the readdir
equivalent on Windows, but if we want a consistent API, we have to
somehow map d_type and Windows's returned type to a common type, like
DT_FILE, DT_DIRECTORY, etc (which could be an enum).
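
A minimal sketch of what such a common type could look like. The `EntryType` enum and both mapping functions are hypothetical and were never part of any proposal in this thread; the d_type values are those from `<dirent.h>` on Linux, and the attribute bits are the Win32 FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_REPARSE_POINT values.

```python
import enum


class EntryType(enum.Enum):
    """Hypothetical portable entry type; names are illustrative only."""
    UNKNOWN = "unknown"
    FILE = "file"
    DIRECTORY = "directory"
    SYMLINK = "symlink"
    OTHER = "other"


# d_type values as defined in <dirent.h> on Linux.
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10

# Win32 dwFileAttributes bits.
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_REPARSE_POINT = 0x400

_FROM_D_TYPE = {
    DT_UNKNOWN: EntryType.UNKNOWN,
    DT_DIR: EntryType.DIRECTORY,
    DT_REG: EntryType.FILE,
    DT_LNK: EntryType.SYMLINK,
}


def from_d_type(d_type):
    """Map a POSIX d_type value to the common type (FIFOs etc. become OTHER)."""
    return _FROM_D_TYPE.get(d_type, EntryType.OTHER)


def from_win32_attributes(attrs):
    """Map Windows file attribute bits to the common type."""
    if attrs & FILE_ATTRIBUTE_REPARSE_POINT:
        # Simplification: not every reparse point is a symlink.
        return EntryType.SYMLINK
    if attrs & FILE_ATTRIBUTE_DIRECTORY:
        return EntryType.DIRECTORY
    return EntryType.FILE
```

Note that UNKNOWN has to remain a possible value on POSIX, so callers still need a stat() fallback for those entries.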

The other approach would be to return a dummy stat object with only
st_mode set, but that would be kind of a hack to return a dummy stat
result with only part of the attributes set (some people will get
bitten by this).




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Ben Hoyt

Ben Hoyt added the comment:

> I find iterdir_stat() ugly :-) I like the scandir name, which has some 
> precedent with POSIX.

Fair enough. I'm cool with scandir().

> scandir() cannot return (name, stat), because on POSIX, readdir() only 
> returns d_name and d_type (the type of the entry): to return a stat, we would 
> have to call stat() on each entry, which would defeat the performance gain.

Yes, you're right. I solved this in BetterWalk with the solution you propose 
of returning a stat_result object with the fields it could get for free set, 
and the others set to None.

So on Linux, you'd get a stat_result with only st_mode set (or None for 
DT_UNKNOWN), and all the other fields None. However -- st_mode is the one 
you're most likely to use, usually looking just for whether it's a file or 
directory. So calling code would look something like this:

    files = []
    dirs = []
    for name, st in scandir(path):
        if st.st_mode is None:
            st = os.stat(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            dirs.append(name)
        else:
            files.append(name)

Meaning you'd get the speed improvements 99% of the time (when st_mode was 
set), but if st_mode is None, you can call stat and handle errors and whatnot 
yourself.

> That's why scandir would be a rather low-level call, whose main user would be 
> walkdir, which only needs to know the entry type and not the whole stat 
> result.

Agreed. This is in the OS module after all, and there's tons of stuff that's 
OS-dependent in there. However, I think that doing something like the above, we 
can make it usable and performant on both Linux and Windows for use cases like 
walking directory trees.

> Also, I don't know which information is returned by the readdir equivalent on 
> Windows, but if we want a consistent API, we have to somehow map d_type and 
> Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc 
> (which could be an enum).

The Windows scan directory functions (FindFirstFile/FindNextFile) return a 
*full* stat (or at least, as much info as you get from a stat in Windows). We 
*could* map them to a common type -- but I'm suggesting that common type might 
as well be stat_result with None meaning "not present". That way users don't 
have to learn a completely new type.

> The other approach would be to return a dummy stat object with only st_mode 
> set, but that would be kind of a hack to return a dummy stat result with only 
> part of the attributes set (some people will get bitten by this).

We could document any platform-specific stuff, and places where users could get 
bitten. But can you give me an example of where the 
stat_result-with-st_mode-or-None approach falls over completely?




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Nick Coghlan

Nick Coghlan added the comment:

I think os.scandir is a case where we *want* a low level call that exposes 
everything we can retrieve efficiently about the directory entries given the 
underlying platform - not everything written in Python is written to be 
portable, especially when it comes to scripts rather than applications (e.g. 
given where I work, I write a fair bit of code that is Fedora/RHEL specific, 
and if that code happens to work anywhere else it's just a bonus rather than 
being of any specific value to me).

This may mean that we just return an info object for each item, where the 
available info is explicitly platform specific. Agreed it can be an actual stat 
object, though.

os.walk then becomes the cross-platform abstraction built on top of the low 
level scandir call (splitting files from directories is probably about all we 
can do consistently cross-platform without per-entry stat calls).




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Charles-François Natali

Charles-François Natali added the comment:

> We could document any platform-specific stuff, and places where users could 
> get bitten. But can you give me an example of where the 
> stat_result-with-st_mode-or-None approach falls over completely?

Well, that's easy:

    size = 0
    for name, st in scandir(path):
        if stat.S_ISREG(st.st_mode):
            size += st.st_size

> Agreed it can be an actual stat object, though.

Well, the nice thing is that we don't have to create yet another info
object, the downside is that it can be tricky, see above.

We can probably use the DTTOIF macro to convert d_type to st_mode.
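
As a rough illustration, glibc defines DTTOIF as a 12-bit left shift and IFTODT as its inverse. The Python functions below are a hypothetical rendering of those two macros, not an existing Python API, and the DT_* values are the Linux ones.

```python
import stat

DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10  # from <dirent.h> on Linux


def dttoif(d_type):
    """Python rendering of glibc's DTTOIF macro: ((dirtype) << 12)."""
    return d_type << 12


def iftodt(mode):
    """Python rendering of glibc's IFTODT macro: (((mode) & 0170000) >> 12)."""
    return (mode & 0o170000) >> 12


# The result carries only the file type bits, never permission bits.
print(stat.S_ISDIR(dttoif(DT_DIR)), stat.S_ISREG(dttoif(DT_REG)))
```

DT_UNKNOWN maps to a mode of 0, for which none of the stat.S_IS*() predicates is true, so the unknown case still has to be handled separately.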




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread STINNER Victor

STINNER Victor added the comment:

I really like scandir() -> (name: str, stat: stat structure using None for
unknown fields).

I expect this API to optimize use cases like:

- glob.glob("*.jpg") in a big directory with few JPEG pictures
- os.walk(".") in a directory with many files: should reduce the number of
  stat() calls to zero on most platforms

But as usual, a benchmark on a real platform would be more convincing.

Filtering entries in os.listdir() or os.scandir() would be faster (than
filtering their output), but it is hard to design an API to filter arbitrary
fields (name, file type, size, ...) especially because the underlying C
functions do not provide the required information. A generator is closer to
Python design and more natural.
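
A minimal sketch of the generator approach, filtering names as they are produced rather than inside the scanning call. It uses os.scandir() as it later shipped (Python 3.6 or newer for the `with` form); `iter_matching` is a hypothetical helper, and unlike glob.glob() it does not skip names starting with a dot.

```python
import fnmatch
import os


def iter_matching(path, pattern):
    """Lazily yield names in `path` matching a glob-style pattern."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Only the name is inspected, so no stat() call is made at all.
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name
```

Filtering on other fields (type, size) is then ordinary Python in the consumer, which avoids having to design a filter API.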




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Yeah, I very much agree with what Nick says -- we really want a way to expose 
what the platform provides. It's less important (though still the ideal) to 
expose that in a platform-independent way. Today the only way to get access to 
opendir/readdir on Linux and FindFirst/Next on Windows is by using a bunch of 
messy (and slowish) ctypes code. And yes, os.walk() would be the main 
cross-platform abstraction built on top of this.

Charles gave this example of code that would fall over:

    size = 0
    for name, st in scandir(path):
        if stat.S_ISREG(st.st_mode):
            size += st.st_size

I don't see it, though. In this case you need both .st_mode and .st_size, so a 
caller would check that those are not None, like so:

    size = 0
    for name, st in scandir(path):
        if st.st_mode is None or st.st_size is None:
            st = os.stat(os.path.join(path, name))
        if stat.S_ISREG(st.st_mode):
            size += st.st_size

One line of extra code for the caller, but a big performance gain in most cases.

Stinner said, "But as usual, a benchmark on a real platform would be more 
convincing." Here's a start: https://github.com/benhoyt/betterwalk#benchmarks -- 
but it's not nearly as good as it gets yet, because those figures are still 
using the ctypes version. I've got a C version that's half-finished, and on 
Windows it makes os.walk() literally 10x the speed of the default version. Not 
sure about Linux/opendir/readdir yet, but I intend to do that too.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread STINNER Victor

STINNER Victor added the comment:

> size = 0
> for name, st in scandir(path):
>     if st.st_mode is None or st.st_size is None:
>         st = os.stat(os.path.join(path, name))
>     if stat.S_ISREG(st.st_mode):
>         size += st.st_size

It would be safer to use the dir_fd parameter when supported, but I don't
think that os.scandir() should care about this problem. A higher-level
API like pathlib, walkdir & co. should be able to reuse the *at() C
functions using the dir_fd parameter.
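
A minimal sketch of the dir_fd variant, assuming a platform where os.stat() supports dir_fd and os.listdir() accepts a file descriptor (both are checked at run time). The helper name `total_size_at` is hypothetical.

```python
import os
import stat


def total_size_at(path):
    """Sum sizes of regular files in `path`, stat-ing relative to a directory fd."""
    if os.stat not in os.supports_dir_fd or os.listdir not in os.supports_fd:
        raise NotImplementedError("dir_fd is not supported on this platform")
    size = 0
    dir_fd = os.open(path, os.O_RDONLY)
    try:
        for name in os.listdir(dir_fd):
            # Names are resolved relative to dir_fd (fstatat() underneath),
            # so renaming `path` mid-scan cannot redirect the lookups.
            st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
            if stat.S_ISREG(st.st_mode):
                size += st.st_size
    finally:
        os.close(dir_fd)
    return size
```

This closes the race in the os.path.join() version, where the directory path is re-resolved on every stat() call.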




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

+1 for iterdir. However, if we get a separate scandir() returning entries with 
attributes, is iterdir() still useful?




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-04 Thread Ben Hoyt

Ben Hoyt added the comment:

That's right: if we have a separate scandir() that returns (name, stat) tuples, 
then a plain iterdir() is pretty much unnecessary -- callers just ignore the 
second stat value if they don't care about it.

I'd slightly prefer the name iterdir_stat(), as that almost makes the (name, 
stat) return values explicit in the name. But that's kind of bikeshedding -- 
scandir() works too.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-03 Thread Charles-François Natali

Charles-François Natali added the comment:

> However, the reason I'm keen on iterdir_stat() is that I'm seeing it speed up 
> os.walk() by a factor of 10 in my recent tests (note that I've made local 
> mods, so these results aren't reproducible for others yet). This is doing a 
> walk on a dir tree with 7800 files and 155 dirs:
>
> Using fast _betterwalk
> Priming the system's cache...
> Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
> Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
> Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
> os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast
>
> Sometimes Windows will go into this "I'm really caching stat results good" 
> mode -- I don't know what heuristic determines this -- and then I'm seeing a 
> 40x speed increase. And no, you didn't read that wrong. :-)

I/O benchmarks shouldn't use timeit or repeated calls: after the first
run, most of your data is in cache, so subsequent runs are
meaningless.

I don't know about Windows, but on Linux you should do something like:
# echo 3 > /proc/sys/vm/drop_caches

to start out clean.




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-03 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks. I thought about that -- but I think I *want* to benchmark it when
they're cached, so that we're comparing apples with apples, cached system
calls with cached system calls. The benchmark would almost certainly be a
lot better (BetterWalk would be even faster) if I was comparing the
non-cached results. I'll think about it some more though.

Thoughts?

-Ben

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-03 Thread Evgeny Kapun

Changes by Evgeny Kapun abacabadabac...@gmail.com:


--
nosy: +abacabadabacaba




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread Ben Hoyt

Ben Hoyt added the comment:

Ah, this is great. I definitely like the idea of a generator version of 
os.listdir(). And I like the name iterdir() -- it fits with iteritems() etc. 
They've gone in 3.x, of course, but listdir didn't change to an iterator, so...

See also Betterwalk, my work-in-progress with very similar goals: 
https://github.com/benhoyt/betterwalk#readme

It implements iterdir(), as well as iterdir_stat() which yields (name, stat) 
tuples. iterdir_stat() is especially important on Windows, where the directory 
iteration functions (FindFirstFile/FindNextFile) already give you full stat 
information.

The intent of Betterwalk is to use these functions to speed up os.walk() by 2-3 
times (and that's with the ctypes version, so it'll only get better in pure C).

So I'm +1 for adding iterdir(), and I'd love to see iterdir_stat() so users can 
write fast os.walk() type functions without resorting to C.
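
As a rough sketch of what a fast os.walk()-style function could look like, here is a top-down walk built on the os.scandir() API that later shipped in Python 3.5. The name scandir_walk is made up for this example, and it is not Betterwalk's code. It has no error handling and, unlike os.walk(), reports symlinks to directories as files.

```python
import os


def scandir_walk(top):
    """Yield (dirpath, dirnames, filenames) like a top-down os.walk()."""
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() can usually be answered from the directory listing
            # itself, without a separate stat call per entry.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from scandir_walk(os.path.join(top, name))
```

The saving comes from not calling os.stat() on every name just to learn whether it is a directory.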

I'll look over the attached patches at some stage, especially the Windows code.

--
nosy: +benhoyt




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread Ben Hoyt

Ben Hoyt added the comment:

Some folks have asked about benchmarks. I don't know about iterdir() vs 
listdir() -- I kind of suspect the speed gains there wouldn't be big. 

However, the reason I'm keen on iterdir_stat() is that I'm seeing it speed up 
os.walk() by a factor of 10 in my recent tests (note that I've made local mods, 
so these results aren't reproducible for others yet). This is doing a walk on a 
dir tree with 7800 files and 155 dirs:

Using fast _betterwalk
Priming the system's cache...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast

Sometimes Windows will go into this "I'm really caching stat results good" mode 
-- I don't know what heuristic determines this -- and then I'm seeing a 40x 
speed increase. And no, you didn't read that wrong. :-)

Sorry, I'm getting carried away. This bug is really more about iterdir. But 
seeing Martin suggested the stat/d_type info...
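
A small timing harness in the spirit of the run above (prime the cache, then repeat) might look like the sketch below. The helper names count_entries and best_time are hypothetical and this is not the benchmark script that produced the numbers quoted here. It measures warm-cache behaviour only.

```python
import os
import time


def count_entries(top):
    """Walk the tree under top and return the number of files and dirs seen."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(top):
        total += len(dirnames) + len(filenames)
    return total


def best_time(func, *args, repeats=3):
    """Return the fastest wall-clock time of several runs of func(*args)."""
    func(*args)  # priming run, so the timed runs hit the OS cache
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Passing a second walk implementation to best_time() with the same tree gives a like-for-like comparison under the same cache conditions.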

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread Nick Coghlan

Nick Coghlan added the comment:

I've had the local Red Hat release engineering team express their displeasure 
at having to stat every file in a network mounted directory tree for info that 
is present in the dirent structure, so a definite +1 to os.scandir from me, so 
long as it makes that info available.

--
nosy: +ncoghlan




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy: +haypo




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-04-02 Thread Christian Heimes

Changes by Christian Heimes li...@cheimes.de:


--
nosy: +christian.heimes




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-04-01 Thread Charles-François Natali

Charles-François Natali added the comment:

 That way the scandir name would be left available for a future version of 
 this that yields namedtuples of directory entry details as Martin wants to 
 see.

Which might very well be Nick's walkdir, see issue #13229.

BTW, I'm strongly +1 on this addition.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-31 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Here's an updated patch that fixes the windows build based on twouters' 
comments. It goes ahead and removes the old C implementation of listdir in 
favor of the trivial Python wrapping of scandir.

--
Added file: http://bugs.python.org/file29638/issue11406-gps02.diff




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-31 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Bikeshedding: I would find iterdir much easier to remember than scandir 
(especially in relationship with listdir).

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-31 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I prefer iterdir also.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-31 Thread Gregory P. Smith

Gregory P. Smith added the comment:

While I don't personally like things with "iter" in the name, I won't object.  
That way the scandir name would be left available for a future version of this 
that yields namedtuples of directory entry details as Martin wants to see.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-28 Thread Gregory P. Smith

Changes by Gregory P. Smith g...@krypto.org:


--
nosy: +twouters




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-24 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Here is an os.scandir(path='.') implementation that iterates, reading the 
directory on the fly instead of pre-building a list.

os.listdir's implementation should ultimately be replaced by this as:

def listdir(path=None):
    if path is None:
        return list(os.scandir())
    return list(os.scandir(path))

Though I have not yet done that in this patch so that I could compare behavior 
of old vs new.
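
For comparison, in the os.scandir() that later shipped with Python 3.5 the iterator yields os.DirEntry objects rather than bare names, so an equivalent wrapper has to pull out .name. A minimal sketch, with listdir_via_scandir as a made-up name:

```python
import os


def listdir_via_scandir(path="."):
    """Return entry names as a list, built from the lazy os.scandir() iterator."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]
```

As with os.listdir(), the order of the names is arbitrary and "." and ".." are not included.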

Why the scandir name?  Read the libc scandir man page.  It fits.

I have tested this on POSIX (Linux).  I don't have any ability to build Windows 
code so I expect that still has bugs and possibly compilation issues.  Please 
leave comments on the 'review' link.

--
keywords: +patch
stage:  -> patch review
type: performance -> enhancement
Added file: http://bugs.python.org/file29568/issue11406-gps01.diff




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-24 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Since this is going to be a new API, I would like to return the file type per 
directory entry where supported. I suggest to start with the Linux set of file 
types (DT_BLK, ..., DT_UNKNOWN), perhaps under different names, giving 
'unknown' on systems which don't support this.

People traversing a directory tree can then skip the stat call if it's neither 
'directory' nor 'unknown'.
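
A sketch of how this looks from the caller's side, using the os.scandir() API that later shipped in Python 3.5 (not the patch attached to this issue). The helper name subdirectories is made up for the example.

```python
import os


def subdirectories(path="."):
    """Yield names of real subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Answered from the directory entry's type when the platform
            # provides it; falls back to a stat call when the type is unknown.
            if entry.is_dir(follow_symlinks=False):
                yield entry.name
```

On systems that report the file type in the directory entry, this loop needs no per-file stat call at all.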

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-24 Thread Gregory P. Smith

Gregory P. Smith added the comment:

you'll see my code already has TODOs in there for that.  windows API 
documentation suggests that windows returns even more (stat-like) info when 
traversing a directory.  returning a namedtuple with the relevant info from the 
platform at hand would be good.

I'd prefer to iterate on the code and get this working as is first, then update 
it to support returning the full details, namedtuple or otherwise.  perhaps via 
an os.scandir vs os.scandirfull or something.  I'd like to keep the ability to 
return just the names: no need to spend time building up the tuples that'll be 
discarded by the os.listdir compatibility code.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-24 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I don't think this would be much of a win, and we're better off not adding yet another 
function to the already overloaded os module.

--
nosy: +rhettinger




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-24 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Your objection is noted but it is wrong.

A Python program today cannot process arbitrarily large directories within a 
fixed amount of RAM, due to os.listdir. This makes it unsuitable for file 
system cleanup tasks that we have run into on production servers.  This fixes 
that without requiring an extension module or fragile ctypes code to work 
around the deficiency.

It _would've been nice_ for os.listdir to be updated to be an iterator with 3.0 
but it wasn't... so we're left supporting its legacy interface rather than just 
replacing it with this.

If you think this functionality belongs in a module other than os, please suggest 
where.

long term: os.walk and os.fwalk are also unusable on arbitrary filesystems with 
large directories for the same reason because they use os.listdir.  providing a 
non-directory-preloading version of those is outside the scope of this issue.
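
To make the cleanup use case concrete, here is a sketch of such a task written against the os.scandir() API that later shipped in Python 3.5. The function name remove_older_than and its parameters are hypothetical. It holds only one entry at a time, so memory use does not grow with the size of the directory.

```python
import os
import time


def remove_older_than(path, max_age_seconds, now=None):
    """Delete regular files in path older than max_age_seconds; return count."""
    now = time.time() if now is None else now
    removed = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                # Only entries that have already been returned are removed,
                # so the scan itself does not depend on them any more.
                os.unlink(entry.path)
                removed += 1
    return removed
```

The same loop written with os.listdir() would first have to build the complete list of names.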

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Gregory P. Smith

Gregory P. Smith added the comment:

I'd like to take care of this at Python.  At least for posix (someone else can 
deal with the windows side if they want).

I just stumbled upon an extension module at work that someone wrote 
specifically because os.listdir consumed way too much ram by building up a huge 
list on large directories with tons of long filenames that it needed to 
process.  (when your process is in charge of keeping that directory in check... 
the inability to process it once it grows too large simply isn't acceptable)

--
assignee:  -> gregory.p.smith
nosy: +gregory.p.smith
versions: +Python 3.4 -Python 3.3




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Tim Golden

Tim Golden added the comment:

IIRC Nick Coghlan had put a bit of work into this a few months ago as an 
external module with a view to seeing if it got traction before putting 
anything into the stdlib. Might be worth pinging him, or looking to see 
what he'd done. Can't remember the keywords to search for, I'm afraid. 
Something like "directory walker".

--
nosy: +tim.golden




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Éric Araujo

Éric Araujo added the comment:

Nick’s lib is called walkdir.  See bitbucket, readthedocs, possibly 
hg.python.org.

(FTR Antoine’s OO abstraction is pathlib)

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Charles-François Natali

Charles-François Natali added the comment:

 IIRC Nick Coghlan had put a bit of work into this a few months ago as an
 external module with a view to seeing if it got traction before putting
 anything into the stdlib. Might be worth pinging him, or looking to see
 what he'd done. Can't remember the keywords to search for, I'm afraid.
 Something like directory walker

Nick's walkdir is just an improved walk - with a lot of added
functionality, like filtering etc.
But it's still based on os.walk(), which in turn uses listdir(), since
it's currently the only way to list the content of a directory. So in
short, it won't help for this issue.

To limit the memory usage, one would have to use an iterator
based on readdir().

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Tim Golden

Tim Golden added the comment:

OK, sorry for the noise then; I had the idea that it was doing something 
with iterators/generators.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-03-06 Thread Gregory P. Smith

Gregory P. Smith added the comment:

right he has a separate issue open tracking the walkdir stuff in
issue13229.  I saw it first before finding this issue which is exactly what
I wanted. :)

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2012-03-11 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 On Unix, doing os.stat
 on the directory, then looking at st_nlink, will tell you whether
 the directory is empty (st_nlink is 2 on an empty directory).

A directory with st_nlink==2 can contain any number of non-directory files, and 
one subdirectory if the directory is the root.

--
nosy: +storchaka




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-12 Thread Torsten Landschoff

Torsten Landschoff t.landsch...@gmx.net added the comment:

I would regard this as Type: resource usage, instead of performance. Given 
enough RAM, loading the whole directory at once will likely be faster.

The downsides of os.listdir:
a) You can't get a peek at the files so far, it's all or nothing. I only wanted 
to know if a directory is empty and I have to read the whole thing just to 
throw it away (maybe I missed another library function?)

b) Using it in a GUI basically requires you to use threads if you may run into 
a dir with many files. Especially on a slow filesystem (network). Because you 
won't regain control until the whole thing is read.

I would like to have an iterator version as well, but I also dislike another 
function (especially the x prefix). How about adding a keyword argument to 
select iterator behaviour?
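
With a lazy directory iterator, the "is this directory empty?" check from point a) only has to read the first entry. A minimal sketch using the os.scandir() API that later shipped in Python 3.5; the helper name is_dir_empty is made up for the example.

```python
import os


def is_dir_empty(path):
    """Return True if path contains no entries ('.' and '..' are never listed)."""
    with os.scandir(path) as entries:
        # Stop after the first entry instead of reading the whole directory.
        return next(entries, None) is None
```

The cost of this check no longer depends on how many files the directory holds.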

--
nosy: +torsten




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-12 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 I would like to have an iterator version as well, but I also dislike
 another function (especially the x prefix). How about adding a
 keyword argument to select iterator behaviour?

Changing the return type based on an argument is generally frowned upon
so, if anything, I think it would be better to have a separate function
(whatever its name).

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-12 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 The downsides of os.listdir: a) You can't get a peek at the files so
 far, it's all or nothing. I only wanted to know if a directory is
 empty and I have to read the whole thing just to throw it away (maybe
 I missed another library function?)

This depends somewhat on the operating system. On Unix, doing os.stat
on the directory, then looking at st_nlink, will tell you whether
the directory is empty (st_nlink is 2 on an empty directory).

 b) Using it in a GUI basically requires you to use threads if you may
 run into a dir with many files. Especially on a slow filesystem
 (network). Because you won't regain control until the whole thing is
 read.

Hmm. In a GUI, you would typically want to sort the file names by
some criterion, which typically requires you to read all files
(even if you then only display some of them).

 I would like to have an iterator version as well, but I also dislike
 another function (especially the x prefix). How about adding a
 keyword argument to select iterator behaviour?

I still would like to see a demonstrable improvement in a real-world
application.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-11 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 http://lwn.net/Articles/216948/
 Why kernel.org is slow

That's all independent of the issue at hand. Whether or not getdents is
slow doesn't really matter here because we need to call getdents,
anyway: it's the only API to get at the directory contents, iterative
or not.

The issue at hand is whether xlistdir actually provides any advantages
to a real application, and that cannot be answered by looking at
benchmarking results that benchmarked the kernel. The *only* way to
determine whether xlistdir will help is to measure it in a real
application.

I stand by my claim that
a) in cases where you use listdir, you typically have to consider
all file names in the directory eventually. The total number
of getdents calls to be made doesn't change when you switch from
listdir to xlistdir (except in non-realistic micro-benchmarks).

The cases that you don't need to look at all file names are
typically dealt with by knowing the file name in advance
(or perhaps a few alternative spellings it may have), and
hence you don't use listdir at all (but stat/open).

b) If there is some real-world processing of the files (e.g.
at least to look at the file modification time), this processing
(and the IO that goes along with it) by far outweigh the cost
of reading the directory. So even if you could speed up listdir
by making it iterative, the overall gain would be very small.

There are also good reasons *not* to add xlistdir, primarily to
avoid user confusion. If xlistdir was added, all people would run
off and change all applications of listdir because it is faster,
and then have to deal with backwards compatibility, even though
in most applications, a single getdents call will fetch the entire
directory contents just fine (and hence there is *no* change
in xlistdir, except that the list is not created which uses
a few dozen bytes).

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-11 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

My benchmarks show that xlistdir() gives only a memory usage advantage on 
large directories. No speed gain so far - maybe my patch is wrong.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-11 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
nosy: +eric.araujo




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-10 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

 BTW, can you publish your xlistdir implementation somewhere?

http://pastebin.com/Qnni5HBa

Tests show a memory footprint about 10 times smaller during directory listing: 
25 MB against 286 MB on a directory with 800K entries.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-10 Thread Марк Коренберг

Марк Коренберг socketp...@gmail.com added the comment:

http://lwn.net/Articles/216948/
Why kernel.org is slow

To prove that readdir is a bad thing with a large number of items in a directory.

Well, EXT4 has fixed some issues 
(http://ext2.sourceforge.net/2005-ols/paper-html/node3.html) But what about 
locking in linux kernel (vfs and ext4) code?

Also, some conservative Linux distributions still use ext3.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-09 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 Can't we simply add os.xlistdir() leaving listdir() as is?

We could, but someone must:
1) provide a patch
2) demonstrate a significant improvement in some real-world situation

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-09 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

 We could, but someone must:
 1) provide a patch

While working on a straightforward patch for Linux, I had to do a lot of 
copy-pasting. posixmodule.c is quite a mess already :(

 2) demonstrate a significant improvement in some real-world situation

Suppose we have a directory with several million files and a cron script 
which must process just a batch of them at a time. There's no need to gather 
them all.
As mmarkk mentioned, readdir already provides generator-style access to the 
directory contents; it would be nice to provide such an API in Python.

http://pastebin.com/NCGmfF49 - here's a kind of test (cached and uncached)
http://pastebin.com/tTKRTiNc - here's a test case for batch processing of 
directory contents (first is xlistdir(), second is listdir()), both uncached.
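
The cron-script scenario can be sketched with the os.scandir() API that later shipped in Python 3.5, standing in for the xlistdir() proposed here (this is not the code behind the pastebin links). The name iter_batches is hypothetical. Only batch_size names are held in memory at any time.

```python
import itertools
import os


def iter_batches(path, batch_size):
    """Yield lists of at most batch_size entry names, reading path lazily."""
    with os.scandir(path) as entries:
        while True:
            batch = [e.name for e in itertools.islice(entries, batch_size)]
            if not batch:
                return
            yield batch
```

A job that handles one batch per run can take the first list from iter_batches() and stop, without ever listing the rest of the directory.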

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-09 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 a cron script which must process just a bunch of them at a time.
  There's no need to gather them all.

Can you please be more explicit? What's the application in which you
have several millions of files in a directory? What's the task that
the processing script needs to perform?

 http://pastebin.com/NCGmfF49 - here's a kind of test (cached and uncached)

This isn't really convincing - the test looks at all files, so it isn't 
clear why xlistdir should do any better than listdir. And indeed, with
a cold cache, xlistdir is slower (IIUC).

 http://pastebin.com/tTKRTiNc - here's a testcase for batch processing of 
 directory contents (first is xlistdir(), second - listdir()) both uncached.

This is not a real-world application - there is no actual processing done.

BTW, can you publish your xlistdir implementation somewhere?

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

There has been discussion of this before, but it must have been on one of the 
lists, (possibly py3k list) as searching tracker for 'listdir generator' only 
returns this.

I believe I pointed out then that Microsoft C (also) has (or once had) a 
'nextdir' function. It's been so long that I forget the details.

I thought then, and still do, that listdir should (have) change(d) like range, 
map, and filter did, for the same reasons.

--
nosy: +terry.reedy




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 I thought then and still do that listdir should (have) change (d) like range, 
 map, and filter did, for same reasons.

The reasons that applied to map and range don't apply to listdir(). The 
cost of materializing the list in the former cases may be significant, 
but will be marginal in the latter case.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

Can't we simply add os.xlistdir() leaving listdir() as is?

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Brian Curtin

Brian Curtin cur...@acm.org added the comment:

-1 on going back through blah/xblah all over again.

--
nosy: +brian.curtin




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 Can't we simply add os.xlistdir() leaving listdir() as is?

Only if an advantage can be demonstrated in a realistic application.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-08 Thread Марк Коренберг

Марк Коренберг socketp...@gmail.com added the comment:

 -1 on going back through blah/xblah all over again.

Originally, I wanted the return value of listdir to be changed from a list to 
a generator. But then I thought about compatibility: it would break some code. 
For example, in SimpleHTTPServer:

list = os.listdir(path)
list.sort(key=lambda a: a.lower())

will not work.
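
The breakage described above comes from calling a list-only method on the 
result. A minimal sketch, assuming a hypothetical generator-returning variant 
(called lazy_listdir here; it is not a real os function), suggests that callers 
using sorted() would keep working, while callers using .sort() would not:

```python
import os
import tempfile

def lazy_listdir(path):
    """Hypothetical generator-returning listdir, for illustration only."""
    for name in os.listdir(path):
        yield name

def sorted_names(path):
    # sorted() consumes any iterable, so it works for lists and generators alike.
    return sorted(lazy_listdir(path), key=str.lower)

def has_sort_method(path):
    # Generators have no .sort() method, so list-specific callers would break.
    return hasattr(lazy_listdir(path), "sort")
```

Code written as sorted(os.listdir(path), key=...) would therefore survive such 
a change, which is one way to judge how much existing code is affected.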

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-07 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

Glibc's readdir() and readdir_r() already do caching, so the getdents() syscall 
is called only once on my '/etc' directory. Should we include another caching 
level in the xlistdir() function?
On the other hand, we don't know anything about caches at glibc's level, i.e. 
we can't tell whether our next call to readdir() will result in a syscall or 
even I/O (we could possibly release the GIL for that).

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-07 Thread Марк Коренберг

Марк Коренберг socketp...@gmail.com added the comment:

 Glibc's readdir() and readdir_r() already do caching
Yes, but glibc's readdir is the C analogue of Python's generator. We do not 
need to create a cache for already cached values.
I think it's OK to build Python's generator on top of readdir (instead of 
getdents).

Why not create a generator like this?
(pseudocode)
--
DIR *d;
struct dirent *entry, *e;
/* room for the longest name allowed in this directory */
entry = malloc(offsetof(struct dirent, d_name) +
               pathconf(dirname, _PC_NAME_MAX) + 1);
if (!entry)
    raise Exception();
if (!(d = opendir(dirname)))
{
    free(entry);
    raise IOException();
}

for (;;)
{
    /* readdir_r() returns non-zero on error; e is set to NULL at the end */
    if (readdir_r(d, entry, &e))
    {
        closedir(d);
        free(entry);
        raise IOException();
    }
    if (!e)
        break;
    yield e;
}
closedir(d);
free(entry);

--
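
For comparison, the same open / iterate / close shape can be sketched in 
Python. This is only an illustration, not the proposed C implementation: it 
assumes os.scandir() (Python 3.5+, with close() available from 3.6), and the 
name iter_dir_names is hypothetical.

```python
import os

def iter_dir_names(path):
    """Yield entry names one at a time, closing the directory when done."""
    # Raises OSError on the first iteration if the directory can't be opened.
    entries = os.scandir(path)
    try:
        for entry in entries:
            yield entry.name
    finally:
        # Runs on exhaustion, on error, and when the generator is closed early.
        entries.close()
```

The try/finally plays the role of the closedir()/free() calls in the 
pseudocode, so the directory handle is released even if the caller stops early.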

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-06 Thread Vetoshkin Nikita

Vetoshkin Nikita nikita.vetosh...@gmail.com added the comment:

A generator listdir() could be useful if I have a directory with several 
million files and I want to process just a hundred.
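
A rough sketch of that use case, assuming os.scandir() (the lazy iterator that 
PEP 471 later added in Python 3.5; the with-statement form needs 3.6). The 
names first_entries and limit are made up for this example:

```python
import itertools
import os

def first_entries(path, limit=100):
    """Return the names of at most `limit` entries without listing the rest."""
    with os.scandir(path) as entries:
        # islice stops pulling from the iterator once `limit` names are taken.
        return [entry.name for entry in itertools.islice(entries, limit)]
```

The C library may still read more than `limit` entries into its own buffer, 
but no Python objects are created for the names that are never requested.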

--
nosy: +nvetoshkin




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Марк Коренберг

New submission from Марк Коренберг socketp...@gmail.com:

Big directories are really slow to read all at once. If the user wants to read 
items one by one, like here:
--
for i in os.listdir():
    use(i)
--
having a generator would improve performance, as big directories are often 
very fragmented on disk. Also, the kernel's dir_cache would be used more 
effectively.

--
components: Library (Lib)
messages: 130111
nosy: mmarkk
priority: normal
severity: normal
status: open
title: There is no os.listdir() equivalent returning generator instead of list
type: performance
versions: Python 3.3




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

 Big dirs are really slow to read at once. 

Do you have proof for that claim? How big, and how slow exactly?

 for i in os.listdir():
     use(i)

Also, how long does use(i) take, and what reduction (in percent)
can you gain from listdir iterating?

In short, I'm skeptical that there is an actual problem to be solved here.

--
nosy: +loewis




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Марк Коренберг

Марк Коренберг socketp...@gmail.com added the comment:

Also, I forgot: memory usage on big directories is a pain when using a list.

This is the same thing as range() and xrange(). Why not add os.xlistdir()?

P.S.
Numerical answers will be available later.

--




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Andreas Stührk

Changes by Andreas Stührk andy-pyt...@hammerhartes.de:


--
nosy: +Trundle




[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

A generator listdir() geared towards performance should probably be able to 
work in batches, e.g. read 100 entries at once and buffer them in some internal 
storage (that might mean use readdir_r()). Bonus points if it doesn't release 
the GIL around each individual entry, but also batches that.
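
The batching idea can be illustrated in pure Python. This is only a sketch: it 
assumes os.scandir() (Python 3.5+, context manager from 3.6), the name 
listdir_batches is hypothetical, and it says nothing about GIL handling, which 
would have to be done in C.

```python
import itertools
import os

def listdir_batches(path, batch_size=100):
    """Yield lists of up to `batch_size` entry names from `path`."""
    with os.scandir(path) as entries:
        while True:
            # Pull the next chunk of entries from the underlying iterator.
            batch = [e.name for e in itertools.islice(entries, batch_size)]
            if not batch:
                return
            yield batch
```

A caller would loop over the batches and process each list in turn, so at most 
batch_size names are held in memory at once.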

--
nosy: +neologix, pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Giampaolo Rodola'

Changes by Giampaolo Rodola' g.rod...@gmail.com:


--
nosy: +giampaolo.rodola

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2011-03-05 Thread Charles-Francois Natali

Charles-Francois Natali neolo...@free.fr added the comment:

 Big dirs are really slow to read at once. If user wants to read items one by 
 one like here

The problem is that readdir doesn't read directory entries one at a time.
When you call readdir on an open DIR * for the first time, the libc calls the 
getdents syscall, requesting a whole bunch of dentries at a time (32768 on my 
box).
Then, the subsequent readdir calls are virtually free, and don't involve any 
syscall/IO at all (that is, until you hit the last cached dent, and then 
another getdents is performed, and so on until the end of the directory).

 Also, dir_cache in kernel used more effectively.

You mean the dcache ? Could you elaborate ?

 also, forgot... memory usage on big directories using list is a pain.

This would indeed be a good reason. Do you have numbers ?

 A generator listdir() geared towards performance should probably be able to 
 work in batches, e.g. read 100 entries at once and buffer them in some 
 internal storage (that might mean use readdir_r()).

That's exactly what readdir is doing :-)

 Bonus points if it doesn't release the GIL around each individual entry, but 
 also batches that.

Yes, since only one in 2**15 readdir calls actually blocks, that could be a 
nice optimization (I've no idea of the potential gain though).

 Big dirs are really slow to read at once.

Are you using EXT3 ?
There are records of performance issues with getdents on EXT2/3 filesystems, 
see:
http://lwn.net/Articles/216948/
and this nice post by Linus:
https://lkml.org/lkml/2007/1/7/149

Could you provide the output of an strace -ttT python test script  (and 
also the time spent in os.listdir) ?
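
One possible way to collect the numbers asked for above, as a sketch only. The 
helper name measure_listdir is made up; tracemalloc sees only allocations made 
through Python's allocators, and enabling it slows the call down, so the 
timing is an upper bound rather than a clean benchmark.

```python
import os
import time
import tracemalloc

def measure_listdir(path):
    """Return (entry_count, seconds, peak_bytes) for one os.listdir() call."""
    tracemalloc.start()
    start = time.perf_counter()
    names = os.listdir(path)
    elapsed = time.perf_counter() - start
    # Peak size of traced Python memory blocks since tracemalloc.start().
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(names), elapsed, peak
```

Running it against the large directory with both a cold and a warm cache would 
give the time and memory figures to compare against a generator-based version.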

--
