Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Gregory Ewing

Virgil Stokes wrote:
> How can I determine if the directory is empty WITHOUT the generation of a
> list of the file names


Which platform?

On Windows, I have no idea.

On Unix you can't really do this properly without access
to opendir() and readdir(), which Python doesn't currently
wrap.

Will the empty directories be newly created, or could they
be ones that *used* to contain 20 files that have since
been deleted?

If they're new or nearly new, you could probably tell from
looking at the size reported by stat() on the directory.
The difference between a fresh empty directory and one with
20 files in it should be fairly obvious.

A viable strategy might be: If the directory is very large,
assume it's not empty. If it's smallish, list its contents
to find out for sure.
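Greg's size-then-list strategy can be sketched in Python as below. This is a minimal sketch under stated assumptions. The 64 KiB threshold and the names `SMALL_DIR_BYTES` and `is_probably_empty` are hypothetical. The sketch also relies on the file system reporting a meaningful `st_size` for directories. Where a directory's `st_size` is reported as 0, the size check never short-circuits and the sketch simply falls back to listing.

```python
import os

# Hypothetical cut-off: a directory whose st_size exceeds this is assumed to
# be non-empty without listing it. Tune it for the file system in question.
SMALL_DIR_BYTES = 64 * 1024


def is_probably_empty(path, threshold=SMALL_DIR_BYTES):
    """Return True if the directory at *path* looks empty.

    Large directories are assumed to be non-empty. That assumption is wrong
    for a directory that once held many files which have since been deleted,
    because many file systems never shrink a directory.
    """
    if os.stat(path).st_size > threshold:
        return False
    # Small directory: listing it is cheap, so check for sure.
    return not os.listdir(path)
```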

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Cameron Simpson

On 07Aug2014 18:14, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Virgil Stokes wrote:
>> How can I determine if the directory is empty WITHOUT the generation
>> of a list of the file names
>
> Which platform?
>
> On Windows, I have no idea.
>
> On Unix you can't really do this properly without access
> to opendir() and readdir(), which Python doesn't currently
> wrap. [...]


On UNIX (the OP seemed to be using Windows, alas), if you are prepared to be 
destructive you can just do an rmdir. It will fail if the directory is not 
empty, and performs ok on the hypothesised remote once-ginormous directory.


The commonest reason for wanting to know if a directory is empty that I can 
imagine is when you want to remove it, and if that applies here it is better to 
just try to remove it and ask for forgiveness later.


Disclaimer: Windows may not offer this handy safety net.
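Cameron's "just try it" approach might look like the sketch below. It is not a drop-in recipe. The function name `remove_if_empty` is hypothetical. The errno check assumes POSIX behaviour, where a non-empty directory makes rmdir() fail with ENOTEMPTY or EEXIST. How Windows reports the same failure is not verified here, in line with the disclaimer above.

```python
import errno
import os


def remove_if_empty(path):
    """Try to remove *path*; return True if removed, False if it was not empty.

    This asks forgiveness rather than permission: rmdir() refuses to remove
    a non-empty directory, so the failure itself answers the question.
    Any other error (missing directory, permissions, ...) is re-raised.
    """
    try:
        os.rmdir(path)
    except OSError as exc:
        # POSIX allows either ENOTEMPTY or EEXIST for a non-empty directory.
        if exc.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise
    return True
```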

Cheers,
Cameron Simpson c...@zip.com.au

Of course no description of a Ducati engine is complete without mentioning
the sound.  That deep bass exhaust rumble along with the mechanical music of
the desmodromic valves...too bad it isn't available on compact disc.
- Sport Rider, evidently before the release of the Ducati Passions CD


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Ethan Furman

On 08/06/2014 03:26 PM, Ben Finney wrote:
> Virgil Stokes <v...@it.uu.se> writes:
>
>> Suppose I have a directory C:/Test that is either empty or contains
>> more than 200 files, all with the same extension (e.g. *.txt). How
>> can I determine if the directory is empty WITHOUT the generation of a
>> list of the file names in it (e.g. using os.listdir('C:/Test')) when
>> it is not empty?
>
> What is your goal for that? Have you measured the performance difference
> and decided *based on objective observation* that it's too expensive?
>
> Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries
> in a directory, and thereby to test whether it is empty. That simplicity
> is very valuable, and you should have a compelling, *measured* reason to
> do something more complicated. What is it?


Plenty of people have measured the slowdown of getting a list of files
when the directory holds thousands upon thousands of entries. It's why the
scandir PEP (PEP 471) exists at all, and the slowdown is present even on
local file systems. While it may not be objective, walking away to get a
cup of your favorite beverage, coming back, and seeing that the operation
is still not done is certainly enough to realize that the simple, easy way
is not going to be good enough.


--
~Ethan~


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Tim Chase
On 2014-08-07 11:27, Ben Finney wrote:
>> The difference in timings when serving a web request is
>> noticeable (in my use-case, I had to change my algorithm and
>> storage structure to simplify/avoid heavily-populated
>> directories)
>
> So, if the requirement is “test whether the directory is empty
> faster than N microseconds”, that's quite different from “without
> the generation of a list of the file names”.
>
> The former may entail the latter, but that's not to be assumed, and
> chasing an optimisation prematurely is a common cause of terrible
> code.

I guess my surprise at os.listdir() not becoming an iterator in the
2-to-3 transition is that it's very easy to wrap an iterable in list()
if you want the whole bunch, but it's much harder to get the
performance characteristics of interruptible iteration (e.g. "is the
directory empty?" or "look at files until you find one matching
$CRITERIA").  Looking forward to scandir() arriving for just those
reasons.
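The interruptible iteration Tim describes can be sketched with `os.scandir()`, which did eventually arrive in Python 3.5 (PEP 471). The helper `first_match` and the example predicate `is_txt_file` are hypothetical. The `with` block assumes Python 3.6+. The loop returns at the first match instead of building the full list of names first.

```python
import os


def first_match(path, predicate):
    """Return the first DirEntry in *path* for which predicate(entry) is true.

    Returns None if nothing matches. Entries are examined one at a time,
    and iteration stops as soon as a match is found.
    """
    with os.scandir(path) as it:  # context-manager support needs Python 3.6+
        for entry in it:
            if predicate(entry):
                return entry
    return None


# Example predicate (hypothetical): a regular file whose name ends in ".txt".
def is_txt_file(entry):
    return entry.is_file() and entry.name.endswith(".txt")
```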

-tkc
(who sees Ethan Furman's excellent followup-post as I'm about to hit
Send)


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Roy Smith
In article <c4gjqvf8cm...@mid.individual.net>,
 Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:

> Virgil Stokes wrote:
>> How can I determine if the directory is empty WITHOUT the generation of a
>> list of the file names
>
> Which platform?
>
> On Windows, I have no idea.
>
> On Unix you can't really do this properly without access
> to opendir() and readdir(), which Python doesn't currently
> wrap.
>
> Will the empty directories be newly created, or could they
> be ones that *used* to contain 20 files that have since
> been deleted?
>
> If they're new or nearly new, you could probably tell from
> looking at the size reported by stat() on the directory.
> The difference between a fresh empty directory and one with
> 20 files in it should be fairly obvious.
>
> A viable strategy might be: If the directory is very large,
> assume it's not empty. If it's smallish, list its contents
> to find out for sure.

I wonder if glob.iglob('*') might help here?


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Peter Otten
Roy Smith wrote:

> In article <c4gjqvf8cm...@mid.individual.net>,
>  Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
>
>> Virgil Stokes wrote:
>>> How can I determine if the directory is empty WITHOUT the generation of a
>>> list of the file names
>>
>> Which platform?
>>
>> On Windows, I have no idea.
>>
>> On Unix you can't really do this properly without access
>> to opendir() and readdir(), which Python doesn't currently
>> wrap.
>>
>> Will the empty directories be newly created, or could they
>> be ones that *used* to contain 20 files that have since
>> been deleted?
>>
>> If they're new or nearly new, you could probably tell from
>> looking at the size reported by stat() on the directory.
>> The difference between a fresh empty directory and one with
>> 20 files in it should be fairly obvious.
>>
>> A viable strategy might be: If the directory is very large,
>> assume it's not empty. If it's smallish, list its contents
>> to find out for sure.
>
> I wonder if glob.iglob('*') might help here?

No, the glob module uses os.listdir() under the hood. Therefore iglob() is
lazy only across multiple directories; within a single directory it still
reads the complete listing first.



Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Tim Chase
On 2014-08-07 07:54, Roy Smith wrote:
> I wonder if glob.iglob('*') might help here?

My glob.iglob() uses os.listdir() behind the scenes (see glob1() in
glob.py)

-tkc


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Akira Li
Virgil Stokes <v...@it.uu.se> writes:

> Suppose I have a directory C:/Test that is either empty or contains
> more than 200 files, all with the same extension (e.g. *.txt). How
> can I determine if the directory is empty WITHOUT the generation of a
> list of the file names in it (e.g. using os.listdir('C:/Test')) when
> it is not empty?

  from scandir import scandir  # or, on Python 3.5+: from os import scandir

  def is_empty_dir(dirpath):
      return next(scandir(dirpath), None) is None

 https://github.com/benhoyt/scandir
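On Python 3.5 and later the same check needs no third-party package, because scandir was added to the standard library as `os.scandir()`. The sketch below keeps Akira's function name. The `with` block assumes Python 3.6+, where the iterator can be closed explicitly.

```python
import os


def is_empty_dir(dirpath):
    """Return True if *dirpath* contains no entries.

    os.scandir() never yields '.' or '..', so asking for a single entry is
    enough. The with-block closes the directory handle right away instead
    of leaving it to garbage collection.
    """
    with os.scandir(dirpath) as it:
        return next(it, None) is None
```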


 --
 Akira



Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Roy Smith
In article <mailman.12725.1407413212.18130.python-l...@python.org>,
 Tim Chase <python.l...@tim.thechases.com> wrote:

> On 2014-08-07 07:54, Roy Smith wrote:
>> I wonder if glob.iglob('*') might help here?
>
> My glob.iglob() uses os.listdir() behind the scenes (see glob1() in
> glob.py)
>
> -tkc

In which case, the documentation for iglob() is broken.  It says:

Return an iterator which yields the same values as glob() without 
actually storing them all simultaneously.

If it's calling something which does store them all simultaneously, 
that's like contracting with somebody to commit a crime, and then trying 
to claim you're innocent because you didn't commit the crime yourself.


Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Tim Chase
On 2014-08-07 08:19, Roy Smith wrote:
>> My glob.iglob() uses os.listdir() behind the scenes (see glob1()
>> in glob.py)
>>
>> -tkc
>
> In which case, the documentation for iglob() is broken.  It says:
>
> Return an iterator which yields the same values as glob() without
> actually storing them all simultaneously.

I'd tend to agree that iglob() is broken and should use the
proposed .scandir() instead for exactly those reasons.
Unfortunately, it seems that it might not get back-ported
until .scandir() hits.
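Tim suggests that iglob() should use scandir. To illustrate, a single-directory, genuinely lazy variant of iglob() can be built from `os.scandir()` and `fnmatch`. The name `lazy_iglob` is hypothetical. This is not the standard library's implementation, and it ignores the multi-directory patterns glob itself supports. The `with` block assumes Python 3.6+.

```python
import fnmatch
import os


def lazy_iglob(dirpath, pattern):
    """Yield paths in *dirpath* whose names match the shell-style *pattern*.

    Entries are read one at a time through os.scandir(), so a caller can
    stop after the first hit without building the whole listing first.
    Names starting with '.' are skipped unless the pattern starts with '.',
    roughly mirroring glob's treatment of hidden files.
    """
    with os.scandir(dirpath) as it:
        for entry in it:
            if entry.name.startswith(".") and not pattern.startswith("."):
                continue
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.path
```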

-tkc




Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread John Gordon
In <mailman.12711.1407363468.18130.python-l...@python.org> Virgil Stokes
<v...@it.uu.se> writes:

> Suppose I have a directory C:/Test that is either empty or contains more
> than 200 files, all with the same extension (e.g. *.txt). How can I
> determine if the directory is empty WITHOUT the generation of a list of
> the file names in it (e.g. using os.listdir('C:/Test')) when it is not
> empty?

Is it one directory that is sometimes empty and other times teeming with
files, or is it a series of directories which are created afresh and then
await arrival of the files?

If the latter, you could try looking at the size of the directory entry
itself.  On the system I'm writing from, a freshly created directory is
4K in size, and will grow in 4K chunks as more and more files are created
within the directory.  However, the directory entry does not shrink when
files are removed.

--
John Gordon                   Imagine what it must be like for a real medical doctor to
gor...@panix.com              watch 'House', or a real serial killer to watch 'Dexter'.




Re: Test for an empty directory that could be very large if it is not empty?

2014-08-07 Thread Roy Smith
In article <mailman.12729.1407433146.18130.python-l...@python.org>,
 Tim Chase <python.l...@tim.thechases.com> wrote:

> On 2014-08-07 08:19, Roy Smith wrote:
>>> My glob.iglob() uses os.listdir() behind the scenes (see glob1()
>>> in glob.py)
>>>
>>> -tkc
>>
>> In which case, the documentation for iglob() is broken.  It says:
>>
>> Return an iterator which yields the same values as glob() without
>> actually storing them all simultaneously.
>
> I'd tend to agree that iglob() is broken and should use the
> proposed .scandir() instead for exactly those reasons.
> Unfortunately, it seems that it might not get back-ported
> until .scandir() hits.
>
> -tkc

I opened a bug against the 2.7 docs:

http://bugs.python.org/issue22167


Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Virgil Stokes
Suppose I have a directory C:/Test that is either empty or contains more 
than 200 files, all with the same extension (e.g. *.txt). How can I 
determine if the directory is empty WITHOUT the generation of a list of 
the file names in it (e.g. using os.listdir('C:/Test')) when it is not 
empty?



Re: Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Ben Finney
Virgil Stokes <v...@it.uu.se> writes:

> Suppose I have a directory C:/Test that is either empty or contains
> more than 200 files, all with the same extension (e.g. *.txt). How
> can I determine if the directory is empty WITHOUT the generation of a
> list of the file names in it (e.g. using os.listdir('C:/Test')) when
> it is not empty?

What is your goal for that? Have you measured the performance difference
and decided *based on objective observation* that it's too expensive?

Certainly ‘os.listdir(foo)’ is the simplest way to determine the entries
in a directory, and thereby to test whether it is empty. That simplicity
is very valuable, and you should have a compelling, *measured* reason to
do something more complicated. What is it?
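Taking Ben's question at face value, the sketch below is one rough way to measure the difference. The helper names and file counts are hypothetical. Local timings say little about a slow network share. The `os.scandir()` variant assumes Python 3.6+.

```python
import os
import shutil
import tempfile
import timeit


def empty_via_listdir(path):
    # Builds the complete list of names, then checks whether it is empty.
    return not os.listdir(path)


def empty_via_scandir(path):
    # Stops after the first entry instead of building the whole list.
    with os.scandir(path) as it:
        return next(it, None) is None


def compare(n_files=100000, repeat=5):
    """Time both checks on a scratch directory holding *n_files* empty files.

    Returns (best_listdir_seconds, best_scandir_seconds).
    """
    d = tempfile.mkdtemp()
    try:
        for i in range(n_files):
            open(os.path.join(d, "f%07d.txt" % i), "w").close()
        t_list = min(timeit.repeat(lambda: empty_via_listdir(d),
                                   number=1, repeat=repeat))
        t_scan = min(timeit.repeat(lambda: empty_via_scandir(d),
                                   number=1, repeat=repeat))
        return t_list, t_scan
    finally:
        shutil.rmtree(d)
```

For example, `compare(n_files=200000)` returns the best-of-five time, in seconds, for each check on a directory of that size.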

-- 
 \         “The most dangerous man to any government is the man who is |
  `\       able to think things out for himself, without regard to the |
_o__)          prevailing superstitions and taboos.” —Henry L. Mencken |
Ben Finney



Re: Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Tim Chase
On 2014-08-07 08:26, Ben Finney wrote:
> Virgil Stokes <v...@it.uu.se> writes:
>> Suppose I have a directory C:/Test that is either empty or
>> contains more than 200 files, all with the same extension
>> (e.g. *.txt). How can I determine if the directory is empty
>> WITHOUT the generation of a list of the file names in it (e.g.
>> using os.listdir('C:/Test')) when it is not empty?
>
> Certainly ‘os.listdir(foo)’ is the simplest way to determine the
> entries in a directory, and thereby to test whether it is empty.
> That simplicity is very valuable, and you should have a compelling,
> *measured* reason to do something more complicated. What is it?

With all the changes in 2-3 where many listy things were made into
iteratory things (e.g. range()), I was surprised that os.listdir()
didn't do likewise since I believe that just about every OS uses some
iterator-like call behind the scenes anyways.

The difference in timings when serving a web request is noticeable
(in my use-case, I had to change my algorithm and storage structure
to simplify/avoid heavily-populated directories)

-tkc




Re: Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Terry Reedy

On 8/6/2014 6:44 PM, Tim Chase wrote:
> On 2014-08-07 08:26, Ben Finney wrote:
>> Virgil Stokes <v...@it.uu.se> writes:
>>> Suppose I have a directory C:/Test that is either empty or
>>> contains more than 200 files, all with the same extension
>>> (e.g. *.txt). How can I determine if the directory is empty
>>> WITHOUT the generation of a list of the file names in it (e.g.
>>> using os.listdir('C:/Test')) when it is not empty?
>>
>> Certainly ‘os.listdir(foo)’ is the simplest way to determine the
>> entries in a directory, and thereby to test whether it is empty.
>> That simplicity is very valuable, and you should have a compelling,
>> *measured* reason to do something more complicated. What is it?
>
> With all the changes in 2-3 where many listy things were made into
> iteratory things (e.g. range()), I was surprised that os.listdir()
> didn't do likewise since I believe that just about every OS uses some
> iterator-like call behind the scenes anyways.

I expect 3.5 will have a scandir generator function.

> The difference in timings when serving a web request is noticeable
> (in my use-case, I had to change my algorithm and storage structure
> to simplify/avoid heavily-populated directories)
>
> -tkc

--
Terry Jan Reedy




Re: Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Ben Finney
Tim Chase <python.l...@tim.thechases.com> writes:

> The difference in timings when serving a web request is noticeable
> (in my use-case, I had to change my algorithm and storage structure to
> simplify/avoid heavily-populated directories)

So, if the requirement is “test whether the directory is empty faster
than N microseconds”, that's quite different from “without the
generation of a list of the file names”.

The former may entail the latter, but that's not to be assumed, and
chasing an optimisation prematurely is a common cause of terrible code.

Therefore, I'm asking the OP what is their (so far unstated) reason for
caring about the implementation of a standard library call.

Without that, it would be folly to try to suggest a solution. With that,
it may turn out the stated requirement isn't relevant for satisfying the
actual requirement. I don't know (and it's possible the OP doesn't
know) the relevance of the “create a list of entries” part, so I asked.

-- 
 \         “Science is a way of trying not to fool yourself. The first |
  `\     principle is that you must not fool yourself, and you are the |
_o__)                easiest person to fool.” —Richard P. Feynman, 1964 |
Ben Finney



Re: Test for an empty directory that could be very large if it is not empty?

2014-08-06 Thread Steven D'Aprano
Ben Finney wrote:

> Virgil Stokes <v...@it.uu.se> writes:
>
>> Suppose I have a directory C:/Test that is either empty or contains
>> more than 200 files, all with the same extension (e.g. *.txt). How
>> can I determine if the directory is empty WITHOUT the generation of a
>> list of the file names in it (e.g. using os.listdir('C:/Test')) when
>> it is not empty?
>
> What is your goal for that? Have you measured the performance difference
> and decided *based on objective observation* that it's too expensive?

Normally I would agree with you, but this is one case where there is no need
to measure: we can tell in advance that at least sometimes there will be a
severe performance hit simply by considering the nature of file systems. In
particular, consider the case where the directory is a remote file system
on the other side of the world over a link with many dropped packets or
other noise. Waiting for 200 thousand file names to be transmitted, only to
throw them away, is surely going to be slower than (say) a call to
os.stat. (Assuming that gives the answer.)

The difficult question then becomes: is it reasonable to (potentially) slow
down the common case of local file systems by a tiny amount, in order to
gain a big speed-up in the (rare) case where it matters?

-- 
Steven
