Re: Test for an empty directory that could be very large if it is not empty?
Virgil Stokes wrote:
> How can I determine if the directory is empty WITHOUT the generation of a list of the file names

Which platform? On Windows, I have no idea. On Unix you can't really do this properly without access to opendir() and readdir(), which Python doesn't currently wrap.

Will the empty directories be newly created, or could they be ones that *used* to contain 20 files that have since been deleted? If they're new or nearly new, you could probably tell from looking at the size reported by stat() on the directory. The difference between a fresh empty directory and one with 20 files in it should be fairly obvious.

A viable strategy might be: If the directory is very large, assume it's not empty. If it's smallish, list its contents to find out for sure.

--
Greg
-- https://mail.python.org/mailman/listinfo/python-list
Re: Test for an empty directory that could be very large if it is not empty?
On 07Aug2014 18:14, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
> Virgil Stokes wrote:
>> How can I determine if the directory is empty WITHOUT the generation of a list of the file names
>
> Which platform? On Windows, I have no idea. On Unix you can't really do this properly without access to opendir() and readdir(), which Python doesn't currently wrap. [...]

On UNIX (the OP seemed to be using Windows, alas), if you are prepared to be destructive you can just do an rmdir. It will fail if the directory is not empty, and performs ok on the hypothesised remote once-ginormous directory.

The commonest reason for wanting to know if a directory is empty that I can imagine is when you want to remove it, and if that applies here it is better to just try to remove it and ask for forgiveness later.

Disclaimer: Windows may not offer this handy safety net.

Cheers,
Cameron Simpson c...@zip.com.au

Of course no description of a Ducati engine is complete without mentioning the sound. That deep bass exhaust rumble along with the mechanical music of the desmodromic valves...too bad it isn't available on compact disc.
- Sport Rider, evidently before the release of the Ducati Passions CD
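The "just try rmdir" suggestion above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name remove_if_empty is hypothetical, and the set of error codes treated as "not empty" is an assumption that may need adjusting per platform.

```python
import errno
import os
import tempfile


def remove_if_empty(dirpath):
    """Try to remove dirpath; return True if removed, False if it held entries.

    Hypothetical helper name. Destructive by design: an empty directory is gone
    after the call.
    """
    try:
        os.rmdir(dirpath)  # refuses to remove a directory that has entries
    except OSError as exc:
        # Assumption: ENOTEMPTY is the usual code; some systems report EEXIST.
        if exc.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise  # missing path, permissions, etc. are genuine errors
    return True


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    print(remove_if_empty(scratch))
```

If the directory must survive the test, this approach does not apply; it only fits the case where removal was the goal anyway.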
Re: Test for an empty directory that could be very large if it is not empty?
On 08/06/2014 03:26 PM, Ben Finney wrote:
> Virgil Stokes v...@it.uu.se writes:
>> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?
>
> What is your goal for that? Have you measured the performance difference and decided *based on objective observation* that it's too expensive?
>
> Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries in a directory, and thereby to test whether it is empty. That simplicity is very valuable, and you should have a compelling, *measured* reason to do something more complicated. What is it?

Plenty of people have measured the slowdown of getting a list of files when the directory has thousands upon thousands. It's why the scandir PEP exists at all, and the slowdown is present even on local file systems.

While it may not be objective, walking away to get a cup of your favorite beverage, coming back and seeing the operation is still not done, is certainly sufficient to realize that the simple, easy way is not going to be sufficient.

--
~Ethan~
Re: Test for an empty directory that could be very large if it is not empty?
On 2014-08-07 11:27, Ben Finney wrote:
>> The difference in timings when serving a web-request are noticeable (in my use-case, I had to change my algorithm and storage structure to simplify/avoid heavily-populated directories)
>
> So, if the requirement is “test whether the directory is empty faster than N microseconds”, that's quite different from “without the generation of a list of the file names”. The former may entail the latter, but that's not to be assumed, and chasing an optimisation prematurely is a common cause of terrible code.

I guess my surprise in the 2-3 non-iterator'ization of os.listdir() is that it's very easy to wrap an iterable in list() if you want the whole bunch, but it's much harder to get the performance characteristics of interruptible iteration (e.g. is the directory empty or look at files until you find one matching $CRITERIA).

Looking forward to scandir() arriving for just those reasons.

-tkc
(who sees Ethan Furman's excellent followup-post as I'm about to hit Send)
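For illustration, the "look at files until you find one matching $CRITERIA" case might look like the sketch below once an iterator-based call is available. It assumes os.scandir() from the standard library (Python 3.5 and later; the with-statement form needs 3.6+). The function name first_match and the predicate argument are hypothetical.

```python
import os
import tempfile


def first_match(dirpath, predicate):
    """Return the name of the first entry for which predicate(entry) is true.

    Returns None when nothing matches. Iteration stops at the first hit, so
    the remaining entries are never collected into a list.
    """
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if predicate(entry):
                return entry.name
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "notes.txt"), "w").close()
        print(first_match(scratch, lambda e: e.name.endswith(".txt")))
```

Wrapping the same iterator in list() recovers the old all-at-once behaviour, which is the asymmetry the message points out.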
Re: Test for an empty directory that could be very large if it is not empty?
In article c4gjqvf8cm...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:
> Virgil Stokes wrote:
>> How can I determine if the directory is empty WITHOUT the generation of a list of the file names
>
> Which platform? On Windows, I have no idea. On Unix you can't really do this properly without access to opendir() and readdir(), which Python doesn't currently wrap.
>
> Will the empty directories be newly created, or could they be ones that *used* to contain 20 files that have since been deleted? If they're new or nearly new, you could probably tell from looking at the size reported by stat() on the directory. The difference between a fresh empty directory and one with 20 files in it should be fairly obvious.
>
> A viable strategy might be: If the directory is very large, assume it's not empty. If it's smallish, list its contents to find out for sure.

I wonder if glob.iglob('*') might help here?
Re: Test for an empty directory that could be very large if it is not empty?
Roy Smith wrote:
> In article c4gjqvf8cm...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:
>> Virgil Stokes wrote:
>>> How can I determine if the directory is empty WITHOUT the generation of a list of the file names
>>
>> Which platform? On Windows, I have no idea. On Unix you can't really do this properly without access to opendir() and readdir(), which Python doesn't currently wrap.
>>
>> Will the empty directories be newly created, or could they be ones that *used* to contain 20 files that have since been deleted? If they're new or nearly new, you could probably tell from looking at the size reported by stat() on the directory. The difference between a fresh empty directory and one with 20 files in it should be fairly obvious.
>>
>> A viable strategy might be: If the directory is very large, assume it's not empty. If it's smallish, list its contents to find out for sure.
>
> I wonder if glob.iglob('*') might help here?

No, the glob module uses os.listdir() under the hood. Therefore iglob() is lazy for multiple directories only.
Re: Test for an empty directory that could be very large if it is not empty?
On 2014-08-07 07:54, Roy Smith wrote:
> I wonder if glob.iglob('*') might help here?

My glob.iglob() uses os.listdir() behind the scenes (see glob1() in glob.py)

-tkc
Re: Test for an empty directory that could be very large if it is not empty?
Virgil Stokes v...@it.uu.se writes:
> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?

    from scandir import scandir  # third-party package, see the link below

    def is_empty_dir(dirpath):
        # Fetch at most one entry; getting the default back means there are none.
        return next(scandir(dirpath), None) is None

https://github.com/benhoyt/scandir

--
Akira
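On Python 3.5 and later, os.scandir() ships in the standard library, so the same idea needs no third-party package. A minimal sketch, assuming Python 3.6+ for the with-statement form, which closes the directory handle promptly:

```python
import os
import tempfile


def is_empty_dir(dirpath):
    """Return True if dirpath has no entries, reading at most one of them."""
    with os.scandir(dirpath) as entries:
        # os.scandir() never yields '.' or '..', so any entry is a real one.
        return next(entries, None) is None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(is_empty_dir(scratch))
```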
Re: Test for an empty directory that could be very large if it is not empty?
In article mailman.12725.1407413212.18130.python-l...@python.org, Tim Chase python.l...@tim.thechases.com wrote:
> On 2014-08-07 07:54, Roy Smith wrote:
>> I wonder if glob.iglob('*') might help here?
>
> My glob.iglob() uses os.listdir() behind the scenes (see glob1() in glob.py)
>
> -tkc

In which case, the documentation for iglob() is broken. It says:

    Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

If it's calling something which does store them all simultaneously, that's like contracting with somebody to commit a crime, and then trying to claim you're innocent because you didn't commit the crime yourself.
Re: Test for an empty directory that could be very large if it is not empty?
On 2014-08-07 08:19, Roy Smith wrote:
>> My glob.iglob() uses os.listdir() behind the scenes (see glob1() in glob.py)
>>
>> -tkc
>
> In which case, the documentation for iglob() is broken. It says:
>
>     Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

I'd tend to agree that iglob() is broken and should use the proposed .scandir() instead for exactly those reasons. Unfortunately, it seems that it might not get back-ported until .scandir() hits.

-tkc
Re: Test for an empty directory that could be very large if it is not empty?
In mailman.12711.1407363468.18130.python-l...@python.org Virgil Stokes v...@it.uu.se writes:
> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?

Is it one directory that is sometimes empty and other times teeming with files, or is it a series of directories which are created afresh and then await arrival of the files?

If the latter, you could try looking at the size of the directory entry itself. On the system I'm writing from, a freshly-created directory is 4K in size, and will grow in 4K chunks as more and more files are created within the directory. However, the directory entry does not shrink when files are removed.

--
John Gordon         Imagine what it must be like for a real medical doctor to
gor...@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.
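A rough sketch of the size-based strategy discussed in this thread: treat a directory larger than a fresh one as non-empty, and list it only when it is small. The 4096-byte threshold is just the 4K figure reported above and is an assumption, as is the function name. Directory sizes differ between file systems, may not be meaningful on Windows, and (as noted above) a directory that was emptied may stay large, in which case this heuristic wrongly reports it as non-empty.

```python
import os
import tempfile

# Assumption: size of a freshly created directory, taken from the 4K figure
# quoted in the message above. Other file systems report other values.
FRESH_DIR_SIZE = 4096


def looks_empty(dirpath, fresh_size=FRESH_DIR_SIZE):
    """Heuristic emptiness test that avoids listing large directories."""
    if os.stat(dirpath).st_size > fresh_size:
        # Grown past its initial size: assume it holds (or held) many entries.
        return False
    # Small directory: listing it is cheap, so find out for sure.
    return not os.listdir(dirpath)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(looks_empty(scratch))
```

The answer is exact only for directories that take the second branch; the first branch trades correctness for speed.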
Re: Test for an empty directory that could be very large if it is not empty?
In article mailman.12729.1407433146.18130.python-l...@python.org, Tim Chase python.l...@tim.thechases.com wrote:
> On 2014-08-07 08:19, Roy Smith wrote:
>>> My glob.iglob() uses os.listdir() behind the scenes (see glob1() in glob.py)
>>>
>>> -tkc
>>
>> In which case, the documentation for iglob() is broken. It says:
>>
>>     Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
>
> I'd tend to agree that iglob() is broken and should use the proposed .scandir() instead for exactly those reasons. Unfortunately, it seems that it might not get back-ported until .scandir() hits.
>
> -tkc

I opened a bug against the 2.7 docs: http://bugs.python.org/issue22167
Test for an empty directory that could be very large if it is not empty?
Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?
Re: Test for an empty directory that could be very large if it is not empty?
Virgil Stokes v...@it.uu.se writes:
> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?

What is your goal for that? Have you measured the performance difference and decided *based on objective observation* that it's too expensive?

Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries in a directory, and thereby to test whether it is empty. That simplicity is very valuable, and you should have a compelling, *measured* reason to do something more complicated. What is it?

--
 \     “The most dangerous man to any government is the man who is |
  `\   able to think things out for himself, without regard to the |
_o__)  prevailing superstitions and taboos.” —Henry L. Mencken |
Ben Finney
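One way to get the measurement asked for here is to time both approaches on a directory of realistic size. The sketch below is an assumption-laden illustration: the function names and the file count of 2000 are hypothetical, it uses os.scandir() from Python 3.5+ (with-statement form needs 3.6+), and a local temporary directory will not reproduce the behaviour of a slow or remote file system.

```python
import os
import tempfile
import time


def empty_via_listdir(dirpath):
    """Build the full list of names, then check whether it is empty."""
    return not os.listdir(dirpath)


def empty_via_scandir(dirpath):
    """Ask the directory iterator for a single entry only."""
    with os.scandir(dirpath) as entries:
        return next(entries, None) is None


def best_time(func, dirpath, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(dirpath)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for i in range(2000):  # hypothetical file count; use your real scale
            open(os.path.join(scratch, "f%05d.txt" % i), "w").close()
        for func in (empty_via_listdir, empty_via_scandir):
            print(func.__name__, best_time(func, scratch))
```

Running it against the actual directory in question, rather than a scratch one, is what would give numbers worth acting on.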
Re: Test for an empty directory that could be very large if it is not empty?
On 2014-08-07 08:26, Ben Finney wrote:
> Virgil Stokes v...@it.uu.se writes:
>> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?
>
> Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries in a directory, and thereby to test whether it is empty. That simplicity is very valuable, and you should have a compelling, *measured* reason to do something more complicated. What is it?

With all the changes in 2-3 where many listy things were made into iteratory things (e.g. range()), I was surprised that os.listdir() didn't do likewise since I believe that just about every OS uses some iterator-like call behind the scenes anyways.

The difference in timings when serving a web-request are noticeable (in my use-case, I had to change my algorithm and storage structure to simplify/avoid heavily-populated directories)

-tkc
Re: Test for an empty directory that could be very large if it is not empty?
On 8/6/2014 6:44 PM, Tim Chase wrote:
> On 2014-08-07 08:26, Ben Finney wrote:
>> Virgil Stokes v...@it.uu.se writes:
>>> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?
>>
>> Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries in a directory, and thereby to test whether it is empty. That simplicity is very valuable, and you should have a compelling, *measured* reason to do something more complicated. What is it?
>
> With all the changes in 2-3 where many listy things were made into iteratory things (e.g. range()), I was surprised that os.listdir() didn't do likewise since I believe that just about every OS uses some iterator-like call behind the scenes anyways.

I expect 3.5 will have a scandir generator function.

> The difference in timings when serving a web-request are noticeable (in my use-case, I had to change my algorithm and storage structure to simplify/avoid heavily-populated directories)
>
> -tkc

--
Terry Jan Reedy
Re: Test for an empty directory that could be very large if it is not empty?
Tim Chase python.l...@tim.thechases.com writes:
> The difference in timings when serving a web-request are noticeable (in my use-case, I had to change my algorithm and storage structure to simplify/avoid heavily-populated directories)

So, if the requirement is “test whether the directory is empty faster than N microseconds”, that's quite different from “without the generation of a list of the file names”. The former may entail the latter, but that's not to be assumed, and chasing an optimisation prematurely is a common cause of terrible code.

Therefore, I'm asking the OP what is their (so far unstated) reason for caring about the implementation of a standard library call. Without that, it would be folly to try to suggest a solution. With that, it may turn out the stated requirement isn't relevant for satisfying the actual requirement.

I don't know (and it's possible the OP doesn't know) the relevance of the “create a list of entries” part, so I asked.

--
 \     “Science is a way of trying not to fool yourself. The first |
  `\   principle is that you must not fool yourself, and you are the |
_o__)  easiest person to fool.” —Richard P. Feynman, 1964 |
Ben Finney
Re: Test for an empty directory that could be very large if it is not empty?
Ben Finney wrote:
> Virgil Stokes v...@it.uu.se writes:
>> Suppose I have a directory C:/Test that is either empty or contains more than 200 files, all with the same extension (e.g. *.txt). How can I determine if the directory is empty WITHOUT the generation of a list of the file names in it (e.g. using os.listdir('C:/Test')) when it is not empty?
>
> What is your goal for that? Have you measured the performance difference and decided *based on objective observation* that it's too expensive?

Normally I would agree with you, but this is one case where there is no need to measure: we can tell in advance that at least sometimes there will be a severe performance hit simply by considering the nature of file systems.

In particular, consider the case where the directory is a remote file system on the other side of the world over a link with many dropped packets or other noise. Waiting for 200 thousand file names to be transmitted, only to throw them away, is surely going to be slower than (say) the results of a call to os.stat. (Assuming that gives the answer.)

The difficult question then becomes: is it reasonable to (potentially) slow down the common case of local file systems by a tiny amount, in order to protect against the (rare) case where it will give a big speed-up?

--
Steven