> There's a big difference between "not enough memory" and "directory
> consumes lots of memory".  My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
> 
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.

But what kind of operation do you want to perform on that directory?

I would expect that usually, you either

a) refer to a single file, which you are either going to create or
   want to process. In that case, you know the name in advance, so
   you open/stat/mkdir/unlink/rmdir that file without caring how
   many other files exist in the directory,
or

b) need to process all files, e.g. to count/sum/backup/remove them;
   in that case, you need the entire listing anyway, and reading
   the entries one-by-one is more likely to slow the whole operation
   down than to speed it up.

So in neither case do you actually need to read the entries incrementally.
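
To make that concrete, here is a minimal Python sketch of the two
patterns with today's os module (the paths and file names are made-up
placeholders):

    import os

    # Case a): the name is known in advance, so no listing is needed at all.
    path = os.path.join("/var/data", "spam.txt")
    if os.path.exists(path):
        size = os.stat(path).st_size        # stat one file, ignore its siblings

    # Case b): every entry has to be visited anyway, e.g. to sum up sizes;
    # nothing useful can be reported before the whole directory has been read.
    total = 0
    for name in os.listdir("/var/data"):
        total += os.stat(os.path.join("/var/data", name)).st_size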

That the C APIs provide chunk-wise processing is merely because
dynamic memory management is so painful to write in C: the caller
is asked to pass a limited-size output buffer, which then gets
refilled in subsequent read calls. Originally, the APIs returned a
single entry at a time from the file system, which was super-slow.
Today, SysV's all-singing, all-dancing getdents returns multiple
entries at a time, for performance reasons.
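
As a rough sketch of that calling convention (read_chunk below is a
made-up stand-in for getdents, not a real function), the caller's side
looks like this: hand over a fixed-size buffer, and keep calling until
nothing comes back:

    def read_all_entries(dir_fd, read_chunk, bufsize=32 * 1024):
        # read_chunk is hypothetical: it fills at most bufsize bytes worth
        # of entries per call and returns an empty list at end of directory.
        entries = []
        while True:
            batch = read_chunk(dir_fd, bufsize)   # one call, many entries
            if not batch:
                break                             # directory exhausted
            entries.extend(batch)
        return entries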

Regards,
Martin