> There's a big difference between "not enough memory" and "directory
> consumes lots of memory". My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
>
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.
But what kind of operation do you want to perform on that directory? I would expect that you usually either

a) refer to a single file, which you are going to either create or process. In that case, you know the name in advance, so you open/stat/mkdir/unlink/rmdir the file, without caring how many files exist in the directory, or

b) need to process all files, to count/sum/backup/remove them; in this case, you will need the entire list in the process anyway, and reading the entries one-by-one is likely to slow the whole operation down rather than speed it up.

So in neither case do you actually need to read the entries incrementally (see the sketch after this message). That the C APIs provide chunk-wise processing is simply because dynamic memory management is so painful to write in C that the caller is asked to pass a limited-size output buffer, which then gets refilled on subsequent read calls. Originally, the APIs returned a single entry at a time from the file system, which was super-slow. Today, the all-singing, all-dancing SysV getdents returns multiple entries at a time, for performance reasons.

Regards,
Martin
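
For illustration, here is a minimal Python sketch of the two access patterns described above; the directory path and file name are made-up placeholders, not anything from the thread:

    import os

    SPOOL = "/var/spool/app"      # hypothetical directory
    KNOWN = "job-1234.dat"        # hypothetical file name, known in advance

    # Case (a): the name is known up front, so the file is stat'ed/opened
    # directly; the size of the directory listing never comes into play.
    path = os.path.join(SPOOL, KNOWN)
    if os.path.exists(path):
        size = os.stat(path).st_size

    # Case (b): every entry has to be visited anyway (here, to sum sizes),
    # so the complete listing gets consumed whether it arrives as one list
    # or one entry at a time.
    total = sum(
        os.path.getsize(os.path.join(SPOOL, name))
        for name in os.listdir(SPOOL)
    )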