subject:"\[Python\-Dev\] Updates to PEP 471, the os.scandir\(\) proposal"

 In this case because the names are exactly the same as the os versions which
 /do/ make a system call.

Fair enough.

 So if I'm finally understanding the root problem here:

   - listdir returns a list of strings, one for each filename and one for
 each directory, and keeps no other O/S supplied info.

   - os.walk, which uses listdir, then needs to go back to the O/S and
 refetch the thrown-away information

   - so it's slow.
 ...
 and the new problem:

   - not all O/Ses provide the same (or any) extra info about the
 directory entries

 Have I got that right?

Yes, that's exactly right.

 If so, I still like the attribute idea better (surprise!), we just need to
 revisit the 'ensure_lstat' (or whatever it's called) parameter:  instead of
 a true/false value, it could have a scale:

   - 0 = whatever the O/S gives us

   - 1 = at least the is_dir/is_file (whatever the other normal one is),
 and if the O/S doesn't give it to us for free than call lstat

   - 2 = we want it all -- call lstat if necessary on this platform

 After all, the programmer should know up front how much of the extra info
 will be needed for the work that is trying to be done.

Yeah, I think this is a good idea to make option #2 a bit nicer. I
don't like the magic constants, and using constants like
os.SCANDIR_LSTAT is annoying, so how about using strings? I also
suggest calling the parameter info (because it determines what info
is returned), so you'd do scandir(path, info='type') if you need just
the is_X type information.

I also think it's nice to have a way for power users to just return
what the OS gives us. However, I think making this the default is a
bad idea, as it's just asking for cross-platform bugs (and it's easy
to prevent).

Paul Moore basically agrees with this in his reply yesterday, though I
disagree with him it would be unfriendly to fail hard unless you asked
for the info -- quite the opposite, Linux users would think it very
unfriendly when your code broke because you didn't ask for the info.
:-)

So how about tweaking option #2 a tiny bit more to this:

def scandir(path='.', info=None, onerror=None): ...

* if info is None (the default), only the .name and .full_name
attributes are present
* if info is 'type', scandir ensures the is_dir/is_file/is_symlink
attributes are present and either True or False
* if info is 'lstat', scandir additionally ensures a .lstat is present
and is a full stat_result object
* if info is 'os', scandir returns the attributes the OS provides
(everything on Windows, only is_X -- most of the time -- on POSIX)

* if onerror is not None and errors occur during any internal lstat()
call, onerror(exc) is called with the OSError exception object

Further point -- because the is_dir/is_file/is_symlink attributes are
booleans, it would be very bad for them to be present but None if you
didn't ask for (or the OS didn't return) the type information. Because
then if entry.is_dir: would be None and your code would think it
wasn't a directory, when actually you don't know. For this reason, all
attributes should fail with AttributeError if not fetched.

 Thank you for writing scandir, and this PEP.  Excellent work.

Thanks!

 Oh, and +1 for option 2, slightly modified.  :)

With the above tweaks, I'm getting closer to being 50/50. It's
probably 60% #1 and 40% #2 for me now. :-)

Okay folks -- please respond: option #1 as per the current PEP 471, or
option #2 with Ethan's multi-level thing tweaks as per the above?

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-08 22:09 GMT+02:00 Ben Hoyt benh...@gmail.com:
 I think you're misunderstanding is_dir() and is_file(), as these don't
 actually call os.stat(). All DirEntry methods either call nothing or
 os.lstat() to get the stat info on the entry itself (not the
 destination of the symlink).


 Oh. Extract of your PEP: is_dir(): like os.path.isdir(), but much cheaper.

 genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
 posixpath.islink() uses os.lstat().

 Is it a mistake in the PEP?

 Ah, you're dead right -- this is basically a bug in the PEP, as
 DirEntry.is_dir() is not like os.path.isdir() in that it is based on
 the entry itself (like lstat), not following the link.

 I'll improve the wording here and update the PEP.

Ok, so it means that your example grouping files per type, files and
directories, is also wrong. Or at least, it behaves differently than
os.walk(). You should put symbolic links to directories in the dirs
list too.

if entry.is_dir():   # is_dir() checks os.lstat()
dirs.append(entry)
elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks os.stat()
dirs.append(entry)
else:
non_dirs.append(entry)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 Ok, so it means that your example grouping files per type, files and
 directories, is also wrong. Or at least, it behaves differently than
 os.walk(). You should put symbolic links to directories in the dirs
 list too.

 if entry.is_dir():   # is_dir() checks os.lstat()
 dirs.append(entry)
 elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks os.stat()
 dirs.append(entry)
 else:
 non_dirs.append(entry)

Yes, good call. I believe I'm doing this wrong in the scandir.py
os.walk() implementation too -- hence this open issue:
https://github.com/benhoyt/scandir/issues/4

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 13:48, Ben Hoyt benh...@gmail.com wrote:
 Okay folks -- please respond: option #1 as per the current PEP 471, or
 option #2 with Ethan's multi-level thing tweaks as per the above?

I'm probably about 50/50 at the moment. What will swing it for me is
likely error handling, so let's try both approaches with some error
handling:

Rules are that we calculate the total size of all files in a tree (as
returned from lstat), with files that fail to stat being logged and
their size assumed to be 0.

Option 1:

def get_tree_size(path):
total = 0
for entry in os.scandir(path):
try:
isdir = entry.is_dir()
except OSError:
logger.warn(Cannot stat {}.format(entry.full_name))
continue
if entry.is_dir():
total += get_tree_size(entry.full_name)
else:
try:
total += entry.lstat().st_size
except OSError:
logger.warn(Cannot stat {}.format(entry.full_name))
return total

Option 2:
def log_err(exc):
logger.warn(Cannot stat {}.format(exc.filename))

def get_tree_size(path):
total = 0
for entry in os.scandir(path, info='lstat', onerror=log_err):
if entry.is_dir:
total += get_tree_size(entry.full_name)
else:
total += entry.lstat.st_size
return total

On this basis, #2 wins. However, I'm slightly uncomfortable using the
filename attribute of the exception in the logging, as there is
nothing in the docs saying that this will give a full pathname. I'd
hate to see Unable to stat __init__.py!!!

So maybe the onerror function should also receive the DirEntry object
- which will only have the name and full_name attributes, but that's
all that is needed.

OK, looks like option #2 is now my preferred option. My gut instinct
still rebels over an API that deliberately throws information away in
the default case, even though there is now an option to ask it to keep
that information, but I see the logic and can learn to live with it.

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 Option 2:
 def log_err(exc):
 logger.warn(Cannot stat {}.format(exc.filename))

 def get_tree_size(path):
 total = 0
 for entry in os.scandir(path, info='lstat', onerror=log_err):
 if entry.is_dir:
 total += get_tree_size(entry.full_name)
 else:
 total += entry.lstat.st_size
 return total

 On this basis, #2 wins.

That's a pretty nice comparison, and you're right, onerror handling is
nicer here.

 However, I'm slightly uncomfortable using the
 filename attribute of the exception in the logging, as there is
 nothing in the docs saying that this will give a full pathname. I'd
 hate to see Unable to stat __init__.py!!!

Huh, you're right. I think this should be documented in os.walk() too.
I think it should be the full filename (is it currently?).

 So maybe the onerror function should also receive the DirEntry object
 - which will only have the name and full_name attributes, but that's
 all that is needed.

That's an interesting idea -- though enough of a deviation from
os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just
document that the onerror exception .filename is the full path name.

One issue with option #2 that I just realized -- does scandir yield
the entry at all if there's a stat error? It can't really, because the
caller will except the .lstat attribute to be set (assuming he asked
for type='lstat') but it won't be. Is effectively removing these
entries just because the stat failed a problem? I kind of think it is.
If so, is there a way to solve it with option #2?

 OK, looks like option #2 is now my preferred option. My gut instinct
 still rebels over an API that deliberately throws information away in
 the default case, even though there is now an option to ask it to keep
 that information, but I see the logic and can learn to live with it.

In terms of throwing away info in the default case -- it's simply a
case of getting what you ask for. :-) Worst case, you'll write your
code and test it, it'll fail hard on any system, you'll fix it
immediately, and then it'll work on any system.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 14:22, Ben Hoyt benh...@gmail.com wrote:
 So maybe the onerror function should also receive the DirEntry object
 - which will only have the name and full_name attributes, but that's
 all that is needed.

 That's an interesting idea -- though enough of a deviation from
 os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just
 document that the onerror exception .filename is the full path name.

But the onerror exception will come from the lstat call, so it'll be a
raw OSError (unless scandir modifies it, which may be what you're
thinking of). And if so, aren't we at the mercy of what the OS module
gives us? That's why I said we can't guarantee it. I looked at the
documentation of OSError (in Built In Exceptions), and all it says
is the filename (unqualified). I'd expect that to be whatever got
passed to the underlying OS API - which may well be an absolute
pathname if we're lucky, but who knows? (I'd actually prefer it if
OSError guaranteed a full pathname, but that's a bigger issue...)

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 05:48 AM, Ben Hoyt wrote:


So how about tweaking option #2 a tiny bit more to this:

def scandir(path='.', info=None, onerror=None): ...

* if info is None (the default), only the .name and .full_name
attributes are present
* if info is 'type', scandir ensures the is_dir/is_file/is_symlink
attributes are present and either True or False
* if info is 'lstat', scandir additionally ensures a .lstat is present
and is a full stat_result object
* if info is 'os', scandir returns the attributes the OS provides
(everything on Windows, only is_X -- most of the time -- on POSIX)


I would rather have the default for info be 'os': cross-platform is good, but there is no reason to force it on some 
poor script that is meant to run on a local machine and will never leave it.




* if onerror is not None and errors occur during any internal lstat()
call, onerror(exc) is called with the OSError exception object


As Paul mentioned, 'onerror(exc, DirEntry)' would be better.



Further point -- because the is_dir/is_file/is_symlink attributes are
booleans, it would be very bad for them to be present but None if you
didn't ask for (or the OS didn't return) the type information. Because
then if entry.is_dir: would be None and your code would think it
wasn't a directory, when actually you don't know. For this reason, all
attributes should fail with AttributeError if not fetched.


Fair point, and agreed.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 06:22 AM, Ben Hoyt wrote:


One issue with option #2 that I just realized -- does scandir yield the entry 
at all if there's a stat error? It
can't really, because the caller will expect the .lstat attribute to be set 
(assuming he asked for type='lstat') but
it won't be. Is effectively removing these entries just because the stat failed 
a problem? I kind of think it is. If
so, is there a way to solve it with option #2?


Leave it up to the onerror handler.  If it returns None, skip yielding the 
entry, otherwise yield whatever it returned
-- which also means the error handler should be able to set fields on the 
DirEntry:

  def log_err(exc, entry):
  logger.warn(Cannot stat {}.format(exc.filename))
  entry.lstat.st_size = 0
  return True

  def get_tree_size(path):
  total = 0
  for entry in os.scandir(path, info='lstat', onerror=log_err):
  if entry.is_dir:
  total += get_tree_size(entry.full_name)
  else:
  total += entry.lstat.st_size
  return total

This particular example doesn't benefit much from the addition, but this way we don't have to guess what the programmer 
wants or needs to do in the case of failure.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Walter Dörwald


On 8 Jul 2014, at 15:52, Ben Hoyt wrote:


Hi folks,

After some very good python-dev feedback on my first version of PEP
471, I've updated the PEP to clarify a few things and added various
Rejected ideas subsections. Here's a link to the new version (I've
also copied the full text below):

http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML
http://hg.python.org/peps/rev/0da4736c27e8 -- changes

[...]
Rejected ideas
==

[...]
Return values being pathlib.Path objects


With Antoine Pitrou's new standard library ``pathlib`` module, it
at first seems like a great idea for ``scandir()`` to return instances
of ``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and
``lstat()`` functions are explicitly not cached, whereas ``scandir``
has to cache them by design, because it's (often) returning values
from the original directory iteration system call.

And if the ``pathlib.Path`` instances returned by ``scandir`` cached
lstat values, but the ordinary ``pathlib.Path`` objects explicitly
don't, that would be more than a little confusing.

Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in
the context of scandir `here
https://mail.python.org/pipermail/python-dev/2013-November/130583.html`_,
making ``pathlib.Path`` objects a bad choice for scandir return
values.


Can we at least make sure that attributes of DirEntry that have the same 
meaning as attributes of pathlib.Path have the same name?



[...]


Servus,
   Walter
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 06:41 AM, Ethan Furman wrote:


Leave it up to the onerror handler.  If it returns None, skip yielding the 
entry, otherwise yield whatever it returned
-- which also means the error handler should be able to set fields on the 
DirEntry:

   def log_err(exc, entry):
   logger.warn(Cannot stat {}.format(exc.filename))
   entry.lstat.st_size = 0
   return True


Blah.  Okay, either return the DirEntry (possibly modified), or have the log_err return entry instead of True.  (Now 
where is that caffeine??)


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 15:12 GMT+02:00 Ben Hoyt benh...@gmail.com:
 Ok, so it means that your example grouping files per type, files and
 directories, is also wrong. Or at least, it behaves differently than
 os.walk(). You should put symbolic links to directories in the dirs
 list too.

 if entry.is_dir():   # is_dir() checks os.lstat()
 dirs.append(entry)
 elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks 
 os.stat()
 dirs.append(entry)
 else:
 non_dirs.append(entry)

 Yes, good call. I believe I'm doing this wrong in the scandir.py
 os.walk() implementation too -- hence this open issue:
 https://github.com/benhoyt/scandir/issues/4

The PEP says that DirEntry should mimic pathlib.Path, so I think that
DirEntry.is_dir() should work as os.path.isir(): if the entry is a
symbolic link, you should follow the symlink to get the status of the
linked file with os.stat().

entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry))
looks wrong: why would you have to check is_dir() and isdir()?
Duplicating this check is error prone and not convinient.

Pseudo-code:
---
class DirEntry:
def __init__(self, lstat=None, d_type=None):
self._stat = None
self._lstat = lstat
self._d_type = d_type

def stat(self):
if self._stat is None:
self._stat = os.stat(self.full_name)
return self._stat

def lstat(self):
if self._lstat is None:
self._lstat = os.lstat(self.full_name)
return self._lstat

def is_dir(self):
if self._d_type is not None:
if self._d_type == DT_DIR:
return True
if self._d_type != DT_LNK:
return False
else:
lstat = self.lstat()
if stat.S_ISDIR(lstat.st_mode):
return True
if not stat.S_ISLNK(lstat.st_mode):
return False
stat = self.stat()
return stat.S_ISDIR(stat.st_mode)
---

DirEntry would be created with lstat (Windows) or d_type (Linux)
prefilled. is_dir() would only need to call os.stat() once for
symbolic links.

With this code, it becomes even more obvious why is_dir() is a method
and not a property ;-)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote:
 The PEP says that DirEntry should mimic pathlib.Path, so I think that
 DirEntry.is_dir() should work as os.path.isir(): if the entry is a
 symbolic link, you should follow the symlink to get the status of the
 linked file with os.stat().

Would this not break the tree size script being discussed in the
other thread, as it would follow links and include linked directories
in the size of the tree?

As a Windows user with only a superficial understanding of how
symlinks should behave, I don't have any intuition as to what the
right answer should be. But I would say that the tree size code
we've been debating over there (which recurses if is_dir is true and
adds in st_size otherwise) should do whatever people would expect of a
function with that name, as it's a perfect example of something a
Windows user might write and expect it to work cross-platform. If it
doesn't much of the worrying over making sure scandir's API is
cross-platform by default is probably being wasted :-)

(Obviously the walk_tree function could be modified if needed, but
that's missing the point I'm trying to make :-))

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 The PEP says that DirEntry should mimic pathlib.Path, so I think that
 DirEntry.is_dir() should work as os.path.isir(): if the entry is a
 symbolic link, you should follow the symlink to get the status of the
 linked file with os.stat().

 Would this not break the tree size script being discussed in the
 other thread, as it would follow links and include linked directories
 in the size of the tree?

Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or
attributes) should mimic the link-following os.path.isdir() at all.
You want the type of the entry, not the type of the source.

Otherwise, as Paul says, you are essentially forced to follow links,
and os.walk(followlinks=False), which is the default, can't do the
right thing.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 One issue with option #2 that I just realized -- does scandir yield the
 entry at all if there's a stat error? It
 can't really, because the caller will expect the .lstat attribute to be
 set (assuming he asked for type='lstat') but

 it won't be. Is effectively removing these entries just because the stat
 failed a problem? I kind of think it is. If
 so, is there a way to solve it with option #2?


 Leave it up to the onerror handler.  If it returns None, skip yielding the
 entry, otherwise yield whatever it returned
 -- which also means the error handler should be able to set fields on the
 DirEntry:

   def log_err(exc, entry):
   logger.warn(Cannot stat {}.format(exc.filename))
   entry.lstat.st_size = 0
   return True

This is an interesting idea, but it's just getting more and more
complex, and I'm guessing that being able to change the attributes of
DirEntry will make the C implementation more complex.

Also, I'm not sure it's very workable. For log_err above, you'd
actually have to do something like this, right?

def log_err(exc, entry):
logger.warn(Cannot stat {}.format(exc.filename))
entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
return entry

Unless there's another simple way around this issue, I'm back to
loving the simplicity of option #1, which avoids this whole question.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 14:22, Ben Hoyt benh...@gmail.com wrote:
 One issue with option #2 that I just realized -- does scandir yield
 the entry at all if there's a stat error? It can't really, because the
 caller will except the .lstat attribute to be set (assuming he asked
 for type='lstat') but it won't be. Is effectively removing these
 entries just because the stat failed a problem? I kind of think it is.
 If so, is there a way to solve it with option #2?

So the issue is that you need to do a stat but it failed. You have
whatever the OS gave you, but can't get anything more. This is only
an issue on POSIX, where the original OS call doesn't give you
everything, so it's fine, those POSIX people can just learn to cope
with their broken OS, right? :-)

More seriously, why not just return a DirEntry that says it's a file
with a stat entry that's all zeroes? That seems pretty harmless. And
the onerror function will be called, so if it is inappropriate the
application can do something. Maybe it's worth letting onerror return
a boolean that says whether to skip the entry, but that's as far as
I'd bother going.

It's a close call, but I think option #2 still wins (just) for me.

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 08:35 AM, Ben Hoyt wrote:

One issue with option #2 that I just realized -- does scandir yield the
entry at all if there's a stat error? It
can't really, because the caller will expect the .lstat attribute to be
set (assuming he asked for type='lstat') but

it won't be. Is effectively removing these entries just because the stat
failed a problem? I kind of think it is. If
so, is there a way to solve it with option #2?



Leave it up to the onerror handler.  If it returns None, skip yielding the
entry, otherwise yield whatever it returned
-- which also means the error handler should be able to set fields on the
DirEntry:

   def log_err(exc, entry):
   logger.warn(Cannot stat {}.format(exc.filename))
   entry.lstat.st_size = 0
   return True


This is an interesting idea, but it's just getting more and more
complex, and I'm guessing that being able to change the attributes of
DirEntry will make the C implementation more complex.

Also, I'm not sure it's very workable. For log_err above, you'd
actually have to do something like this, right?

def log_err(exc, entry):
 logger.warn(Cannot stat {}.format(exc.filename))
 entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
 return entry


I would imagine we would provide a helper function:

  def stat_result(st_size=0, st_atime=0, st_mtime=0, ...):
  return os.stat_result((st_size, st_atime, st_mtime, ...))

and then in onerror

  entry.lstat = stat_result()



Unless there's another simple way around this issue, I'm back to
loving the simplicity of option #1, which avoids this whole question.


Too simple is just as bad as too complex, and properly handling errors is rarely a simple task.  Either we provide a 
clean way to deal with errors in the API, or we force every user everywhere to come up with their own system.


Also, just because we provide it doesn't force people to use it, but if we 
don't provide it then people cannot use it.

To summarize the choice I think we are looking at:

  1) We provide a very basic tool that many will have to write wrappers
 around to get the desired behavior (choice 1)

  2) We provide a more advanced tool that, in many cases, can be used
 as-is, and is also fairly easy to extend to handle odd situations
(choice 2)

More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement 
choice 2?  Would I have to write my own CustomDirEntry object?


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 17:35, Ethan Furman et...@stoneleaf.us wrote:
 More specifically, if we go with choice 1 (no built-in error handling, no
 mutable DirEntry), how would I implement choice 2?  Would I have to write my
 own CustomDirEntry object?

Having built-in error handling is, I think, a key point. That's where
#1 really falls down.

But a mutable DirEntry and/or letting onerror manipulate the result is
a lot more than just having a hook for being notified of errors. That
seems to me to be a step too far, in the current context.
Specifically, the tree size example doesn't need it.

Do you have a compelling use case that needs a mutable DirEntry? It
feels like YAGNI to me.

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 11:04 AM, Paul Moore wrote:

On 9 July 2014 17:35, Ethan Furman et...@stoneleaf.us wrote:

More specifically, if we go with choice 1 (no built-in error handling, no
mutable DirEntry), how would I implement choice 2?  Would I have to write my
own CustomDirEntry object?


Having built-in error handling is, I think, a key point. That's where
#1 really falls down.

But a mutable DirEntry and/or letting onerror manipulate the result is
a lot more than just having a hook for being notified of errors. That
seems to me to be a step too far, in the current context.
Specifically, the tree size example doesn't need it.

Do you have a compelling use case that needs a mutable DirEntry? It
feels like YAGNI to me.


Not at this point.  As I indicated in my reply to your response, as long as we have the onerror machinery now we can 
tweak it later if real-world use shows it would be beneficial.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

This is just getting way too complex ... further thoughts below.

 This is an interesting idea, but it's just getting more and more
 complex, and I'm guessing that being able to change the attributes of
 DirEntry will make the C implementation more complex.

 Also, I'm not sure it's very workable. For log_err above, you'd
 actually have to do something like this, right?

 def log_err(exc, entry):
  logger.warn(Cannot stat {}.format(exc.filename))
  entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
  return entry


 I would imagine we would provide a helper function:

   def stat_result(st_size=0, st_atime=0, st_mtime=0, ...):
   return os.stat_result((st_size, st_atime, st_mtime, ...))

 and then in onerror

   entry.lstat = stat_result()

 Unless there's another simple way around this issue, I'm back to
 loving the simplicity of option #1, which avoids this whole question.


 Too simple is just as bad as too complex, and properly handling errors is
 rarely a simple task.  Either we provide a clean way to deal with errors in
 the API, or we force every user everywhere to come up with their own system.

 Also, just because we provide it doesn't force people to use it, but if we
 don't provide it then people cannot use it.

So here's the ways in which option #2 is now more complicated than option #1:

1) it has an additional info argument, the values of which have to
be documented ('os', 'type', 'lstat', and what each one means)
2) it has an additional onerror argument, the signature of which and
fairly complicated return value is non-obvious and has to be
documented
3) it requires user modification of the DirEntry object, which needs
documentation, and is potentially hard to implement
4) because the DirEntry object now allows modification, you need a
stat_result() helper function to help you build your own stat values

I'm afraid points 3 and 4 here add way too much complexity.

Remind me why all this is better than the PEP 471 approach again? It
handles all of these problems, is very direct, and uses built-in
Python constructs (method calls and try/except error handling).

And it's also simple to document -- much simpler than the above 4
things, which could be a couple of pages in the docs. Here's the doc
required for the PEP 471 approach:

Note about caching and error handling: The is_X() and lstat()
functions may perform an lstat() on first call if the OS didn't
already fetch this data when reading the directory. So if you need
fine-grained error handling, catch OSError exceptions around these
method calls. After the first call, the is_X() and lstat() functions
cache the value on the DirEntry.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 12:03 PM, Ben Hoyt wrote:


So here's the ways in which option #2 is now more complicated than option #1:

1) it has an additional info argument, the values of which have to
be documented ('os', 'type', 'lstat', and what each one means)
2) it has an additional onerror argument, the signature of which and
fairly complicated return value is non-obvious and has to be
documented
3) it requires user modification of the DirEntry object, which needs
documentation, and is potentially hard to implement
4) because the DirEntry object now allows modification, you need a
stat_result() helper function to help you build your own stat values

I'm afraid points 3 and 4 here add way too much complexity.


I'm okay with dropping 3 and 4, and making the return from onerror being simply True to yield the entry, and False/None 
to skip it.  That should make implementation much easier, and documentation not too strenuous either.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 1) it has an additional info argument, the values of which have to
 be documented ('os', 'type', 'lstat', and what each one means)
 2) it has an additional onerror argument, the signature of which and
 fairly complicated return value is non-obvious and has to be
 documented
 3) it requires user modification of the DirEntry object, which needs
 documentation, and is potentially hard to implement
 4) because the DirEntry object now allows modification, you need a
 stat_result() helper function to help you build your own stat values

 I'm afraid points 3 and 4 here add way too much complexity.


 I'm okay with dropping 3 and 4, and making the return from onerror being
 simply True to yield the entry, and False/None to skip it.  That should make
 implementation much easier, and documentation not too strenuous either.

That's definitely better in terms of complexity.

Other python-devers, please chime in with your thoughts or votes.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 21:59 GMT+02:00 Ben Hoyt benh...@gmail.com:
 Other python-devers, please chime in with your thoughts or votes.

Sorry, I didn't follow the whole discussion. IMO DirEntry must use
methods and you should not expose nor document which infos are already
provided by the OS or not. DirEntry should be a best-effort black-box
object providing an API similar to pathlib.Path. is_dir() may be fast?
fine, but don't say it in the documentation because Python must remain
portable and you should not write code specific to one specific
platform.

is_dir(), is_file(), is_symlink() and lstat() can fail as any other
Python function, no need to specialize them with custom error handler.
If you care, just use a very standard try/except.

I'm also against ensure_lstat=True or ideas like that fetching all
datas transparently in the generator. The behaviour would be too
different depending on the OS, and usually you don't need all
informations. And it raises errors in the generator, which is
something unusual, and difficult to handle (I don't like the onerror
callback).

Example where you may sometimes need is_dir(), but not always
---
for entry in os.scandir(path):
  if ignore_entry(entry.name):
 # this entry is not interesting, lstat_result is useless here
 continue
  if entry.is_dir():  # fetch required data if needed
 continue
  ...
---

I don't understand why you are all focused on handling os.stat() and
os.lstat() errors. See for example the os.walk() function which is an
old function (python 2.6!): it doesn't catch erros on isdir(), even if
it has an onerror parameter... It only handles errors on listdir().
IMO errors on os.stat() and os.lstat() are very rare under very
specific conditions. The most common case is that you can get the
status if you can list files.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 9 July 2014 21:24, Victor Stinner victor.stin...@gmail.com wrote:
 Example where you may sometimes need is_dir(), but not always
 ---
 for entry in os.scandir(path):
   if ignore_entry(entry.name):
  # this entry is not interesting, lstat_result is useless here
  continue
   if entry.is_dir():  # fetch required data if needed
  continue
   ...
 ---

That is an extremely good point, and articulates why I've always been
a bit uncomfortable with the whole ensure_stat idea.

 I don't understand why you are all focused on handling os.stat() and
 os.lstat() errors. See for example the os.walk() function which is an
 old function (python 2.6!): it doesn't catch erros on isdir(), even if
 it has an onerror parameter... It only handles errors on listdir().
 IMO errors on os.stat() and os.lstat() are very rare under very
 specific conditions. The most common case is that you can get the
 status if you can list files.

Personally, I'm only focused on it as a response to others feeling
it's important. I'm on Windows, where there are no extra stat calls,
so all *I* care about is having an API that deals with the use cases
others are concerned about without making it too hard for me to use it
on Windows where I don't have to worry about all this.

If POSIX users come to a consensus that error handling doesn't need
special treatment, I'm more than happy to go back to the PEP version.
(Much as previously happened with the race condition debate).

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 01:57 PM, Paul Moore wrote:

On 9 July 2014 21:24, Victor Stinner wrote:


Example where you may sometimes need is_dir(), but not always
---
for entry in os.scandir(path):
   if ignore_entry(entry.name):
  # this entry is not interesting, lstat_result is useless here
  continue
   if entry.is_dir():  # fetch required data if needed
  continue
   ...


That is an extremely good point, and articulates why I've always been
a bit uncomfortable with the whole ensure_stat idea.


On a system which did not supply is_dir automatically I would write that as:

  for entry in os.scandir(path):  # info defaults to 'os', which is basically 
None in this case
  if ignore_entry(entry.name):
  continue
  if os.path.isdir(entry.full_name):
  # do something interesting

Not hard to read or understand, no time wasted in unnecessary lstat calls.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 01:24 PM, Victor Stinner wrote:


Sorry, I didn't follow the whole discussion. IMO DirEntry must use
methods and you should not expose nor document which infos are already
provided by the OS or not. DirEntry should be a best-effort black-box
object providing an API similar to pathlib.Path. is_dir() may be fast?
fine, but don't say it in the documentation because Python must remain
portable and you should not write code specific to one specific
platform.


Okay, so using that logic we should head over to the os module and remove:

ctermid, getenv, getegid, geteuid, getgid, getgrouplist, getgroups, getpgid, getpgrp, getpriority, PRIO_PROCESS, 
PRIO_PGRP, PRIO_USER, getresuid, getresgid, getuid, initgroups, putenv, setegid, seteuid, setgid, setgroups, 
setpriority, setregid, setrusgid, setresuid, setreuid, getsid, setsid, setuid, unsetenv, fchmod, fchown, fdatasync, 
fpathconf, fstatvfs, ftruncate, lockf, F_LOCK, F_TLOCK, F_ULOCK, F_TEST, O_DSYNC, O_RSYNC, O_SYNC, O_NDELAY, O_NONBLOCK, 
O_NOCTTY, O_SHLOCK, O_EXLOCK, O_CLOEXEC, O_BINARY, O_NOINHERIT, O_SHORT_LIVED, O_TEMPORARY, O_RANDOM, O_SEQUENTIAL, 
O_TEXT, ...


Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or 
attributes that are not available on either Windows or Unix or some flavors of Unix.


Oh, and all those upper-case attributes?  Yup, documented.  And when we don't document it ourselves we often refer 
readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms 
-- particularly when calling into the OS.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 On a system which did not supply is_dir automatically I would write that as:

   for entry in os.scandir(path):  # info defaults to 'os', which is
 basically None in this case
   if ignore_entry(entry.name):
   continue
   if os.path.isdir(entry.full_name):
   # do something interesting

 Not hard to read or understand, no time wasted in unnecessary lstat calls.

No, but how do you know whether you're on a system which did not
supply is_dir automatically? The above is not cross-platform, or at
least, not efficient cross-platform, which defeats the whole point of
scandir -- the above is no better than listdir().

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

I really don't understand why you *want* a worse, much less cross-platform API?

 Okay, so using that logic we should head over to the os module and remove:

 ctermid, getenv, getegid...

 Okay, I'm tired of typing, but that list is not even half-way through the os
 page, and those are all methods or attributes that are not available on
 either Windows or Unix or some flavors of Unix.

True, is this really the precedent we want to *aim for*. listdir() is
cross-platform, and it's relatively easy to make scandir()
cross-platform, so why not?

 Oh, and all those upper-case attributes?  Yup, documented.  And when we
 don't document it ourselves we often refer readers to their system
 documentation because Python does not, in fact, return exactly the same
 results on all platforms -- particularly when calling into the OS.

But again, why a worse, less cross-platform API when a simple,
cross-platform one is a method call away?

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 22:44 GMT+02:00 Ethan Furman et...@stoneleaf.us:
 On 07/09/2014 01:24 PM, Victor Stinner wrote:
 Sorry, I didn't follow the whole discussion. IMO DirEntry must use
 methods and you should not expose nor document which infos are already
 provided by the OS or not. DirEntry should be a best-effort black-box
 object providing an API similar to pathlib.Path. is_dir() may be fast?
 fine, but don't say it in the documentation because Python must remain
 portable and you should not write code specific to one specific
 platform.


 Okay, so using that logic we should head over to the os module and remove: 
 (...)

My comment was specific to the PEP 471, design of the DirEntry class.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

On 07/09/2014 02:42 PM, Ben Hoyt wrote:

Okay, so using that [no platform specific] logic we should head over to the os
module and remove:

ctermid, getenv, getegid...

Okay, I'm tired of typing, but that list is not even half-way through the os
page, and those are all methods or attributes that are not available on
either Windows or Unix or some flavors of Unix.

True, is this really the precedent we want to *aim for*. listdir() is
cross-platform,

and listdir has serious performance issues, which is why you developed scandir.

Oh, and all those [snipped] upper-case attributes? Yup, documented. And when
we
don't document it ourselves we often refer readers to their system
documentation because Python does not, in fact, return exactly the same
results on all platforms -- particularly when calling into the OS.

But again, why a worse, less cross-platform API when a simple,
cross-platform one is a method call away?

For the same reason we don't use code that makes threaded behavior better, but
kills the single thread application.

If the programmer would rather have consistency on all platforms rather than performance on the one being used,
`info='lstat'` is the option to use.

I like the 'onerror' API better primarily because it gives a single point to deal with the errors. This has at least a
couple advantages:

- less duplication of code: in the tree_size example, the error
handling is duplicated twice

- readablity: with the error handling in a separate routine, one
does not have to jump around the try/except blocks looking for
what happens if there are no errors

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 02:38 PM, Victor Stinner wrote:

2014-07-09 22:44 GMT+02:00 Ethan Furman:

On 07/09/2014 01:24 PM, Victor Stinner wrote:


[...] Python must remain
portable and you should not write code specific to one specific
platform.



Okay, so using that logic we should head over to the os module and remove: (...)


My comment was specific to the PEP 471, design of the DirEntry class.


And my comment was to the point of there being methods/attributes/return values that /do/ vary by platform, and /are/ 
documented as such.  Even stat itself is not the same on Windows as posix.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 02:33 PM, Ben Hoyt wrote:


On a system which did not supply is_dir automatically I would write that as:

   for entry in os.scandir(path):
   if ignore_entry(entry.name):
   continue
   if os.path.isdir(entry.full_name):
   # do something interesting

Not hard to read or understand, no time wasted in unnecessary lstat calls.


No, but how do you know whether you're on a system which did not
supply is_dir automatically? The above is not cross-platform, or at
least, not efficient cross-platform, which defeats the whole point of
scandir -- the above is no better than listdir().


Hit a directory with 100,000 entries and you'll change your mind.  ;)

Okay, so the issue is you /want/ to write an efficient, cross-platform 
routine...

hrmmm.

thinking

Okay, marry the two ideas together:

  scandir(path, info=None, onerror=None)
  
  Return a generator that returns one directory entry at a time in a 
DirEntry object
  info:  None -- DirEntries will have whatever attributes the O/S provides
 'type'  -- DirEntries will already have at least the file/dir 
distinction
 'stat'  -- DirEntries will also already have stat information
  

  DirEntry.is_dir()
 Return True if this is a directory-type entry; may call os.lstat if the 
cache is empty.

  DirEntry.is_file()
 Return True if this is a file-type entry; may call os.lstat if the cache 
is empty.

  DirEntry.is_symlink()
 Return True if this is a symbolic link; may call os.lstat if the cache is 
empty.

  DirEntry.stat
 Return the stat info for this link; may call os.lstat if the cache is 
empty.


This way both paradigms are supported.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread MRAB


On 2014-07-09 23:50, Ethan Furman wrote:

On 07/09/2014 02:33 PM, Ben Hoyt wrote:


On a system which did not supply is_dir automatically I would write that as:

   for entry in os.scandir(path):
   if ignore_entry(entry.name):
   continue
   if os.path.isdir(entry.full_name):
   # do something interesting

Not hard to read or understand, no time wasted in unnecessary lstat calls.


No, but how do you know whether you're on a system which did not
supply is_dir automatically? The above is not cross-platform, or at
least, not efficient cross-platform, which defeats the whole point of
scandir -- the above is no better than listdir().


Hit a directory with 100,000 entries and you'll change your mind.  ;)

Okay, so the issue is you /want/ to write an efficient, cross-platform 
routine...

hrmmm.

thinking

Okay, marry the two ideas together:

scandir(path, info=None, onerror=None)

Return a generator that returns one directory entry at a time in a 
DirEntry object


Should that be that yields one directory entry at a time?


info:  None -- DirEntries will have whatever attributes the O/S 
provides
   'type'  -- DirEntries will already have at least the file/dir 
distinction
   'stat'  -- DirEntries will also already have stat information


DirEntry.is_dir()
   Return True if this is a directory-type entry; may call os.lstat if the 
cache is empty.

DirEntry.is_file()
   Return True if this is a file-type entry; may call os.lstat if the cache 
is empty.

DirEntry.is_symlink()
   Return True if this is a symbolic link; may call os.lstat if the cache 
is empty.

DirEntry.stat
   Return the stat info for this link; may call os.lstat if the cache is 
empty.


Why is is_dir, et al, functions, but stat not a function?



This way both paradigms are supported.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 04:22 PM, MRAB wrote:

On 2014-07-09 23:50, Ethan Furman wrote:


Okay, marry the two ideas together:

scandir(path, info=None, onerror=None)

Return a generator that returns one directory entry at a time in a 
DirEntry object


Should that be that yields one directory entry at a time?


Yes, thanks.


info:  None -- DirEntries will have whatever attributes the O/S 
provides
   'type'  -- DirEntries will already have at least the file/dir 
distinction
   'stat'  -- DirEntries will also already have stat information


DirEntry.is_dir()
   Return True if this is a directory-type entry; may call os.lstat if the 
cache is empty.

DirEntry.is_file()
   Return True if this is a file-type entry; may call os.lstat if the cache 
is empty.

DirEntry.is_symlink()
   Return True if this is a symbolic link; may call os.lstat if the cache 
is empty.

DirEntry.stat
   Return the stat info for this link; may call os.lstat if the cache is 
empty.


Why is is_dir, et al, functions, but stat not a function?


Good point.  Make stat a function as well.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 17:29 GMT+02:00 Ben Hoyt benh...@gmail.com:
 Would this not break the tree size script being discussed in the
 other thread, as it would follow links and include linked directories
 in the size of the tree?

The get_tree_size() function in the PEP would use: if not
entry.is_symlink() and entry.is_dir():.

Note: First I wrote if entry.is_dir() and not entry.is_symlink():,
but this syntax is slower on Linux because is_dir() has to call
lstat().

Adding an optional keyword to DirEntry.is_dir() would allow to write
if entry.is_dir(follow_symlink=False), but it looks like a micro
optimization and as I said, I prefer to stick to pathlib.Path API
(which was already heavily discussed in its PEP). Anyway, this case is
rare (I explain that below), we should not worry too much about it.

 Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or
 attributes) should mimic the link-following os.path.isdir() at all.
 You want the type of the entry, not the type of the source.

On UNIX, a symlink to a directory is expected to behave like a
directory. For example, in a file browser, you should enter in the
linked directory when you click on a symlink to a directory.

There are only a few cases where you want to handle symlinks
differently: archive (ex: tar), compute the size of a directory (ex:
du does not follow symlinks by default, du -L follows them), remove a
directory.

You should do a short poll in the Python stdlib and on the Internet to
check what is the most common check.

Examples of the Python stdlib:

- zipfile: listdir + os.path.isdir
- pkgutil: listdir + os.path.isdir
- unittest.loader: listdir + os.path.isdir and os.path.isfile
- http.server: listdir + os.path.isdir, it also uses os.path.islink: 
Append / for directories or @ for symbolic links 
- idlelib.GrepDialog: listdir + os.path.isdir
- compileall: listdir + os.path.isdir and os.path.isdir(fullname) and
not os.path.islink(fullname) = don't follow symlinks to directories
- shutil (copytree): listdir + os.path.isdir + os.path.islink
- shutil (rmtree): listdir + os.lstat() + stat.S_ISDIR(mode) = don't
follow symlinks to directories
- mailbox: listdir + os.path.isdir
- tabnanny: listdir + os.path.isdir
- os.walk: listdir + os.path.isdir + os.path.islink = don't follow
symlinks to directories by default, but the behaviour is configurable
... but symlinks to directories are added to the dirs list (not all
symlinks, only symlinks to directories)
- setup.py: listdir + os.path.isfile

In this list of 12 examples, only compileall, shutil.rmtree and
os.walk check if entries are symlinks. compileall starts by checking
if not os.path.isdir(fullname): which follows symlinks. os.walk()
starts by checking if os.path.isdir(name): which follows symlinks. I
consider that only one case on 12 (8.3%) doesn't follow symlinks.

If entry.is_dir() doesn't follow symlinks, the other 91.7% will need
to be modified to use if entry.is_dir() or (entry.is_link() and
os.path.is_dir(entry.full_name)): to keep the same behaviour :-(

 Otherwise, as Paul says, you are essentially forced to follow links,
 and os.walk(followlinks=False), which is the default, can't do the
 right thing.

os.walk() and get_tree_size() are good users of scandir(), but they
are recursive functions. It means that you may handle symlinks
differently, os.walk() gives the choice to follow or not symlinks for
example.

Recursive functions are rare. The most common case is to list files of
a single directory and then filter files depending on various filters
(is a file? is a directory? match the file name? ...). In such use
case, you don't care of symlinks (you want to follow them).

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 17:26 GMT+02:00 Paul Moore p.f.mo...@gmail.com:
 On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote:
 The PEP says that DirEntry should mimic pathlib.Path, so I think that
 DirEntry.is_dir() should work as os.path.isir(): if the entry is a
 symbolic link, you should follow the symlink to get the status of the
 linked file with os.stat().

 (...)
 As a Windows user with only a superficial understanding of how
 symlinks should behave, (...)

FYI Windows also supports symbolic links since Windows Vista. The
feature is unknown because it is restricted to the administrator
account. Try the mklink command in a terminal (cmd.exe) ;-)
http://en.wikipedia.org/wiki/NTFS_symbolic_link

... To be honest, I never created a symlink on Windows. But since it
is supported, you need to know it to write correctly your Windows
code.

(It's unrelated to LNK files.)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Nikolaus Rath

Ben Hoyt benh...@gmail.com writes:
 So here's the ways in which option #2 is now more complicated than option #1:

 1) it has an additional info argument, the values of which have to
 be documented ('os', 'type', 'lstat', and what each one means)
 2) it has an additional onerror argument, the signature of which and
 fairly complicated return value is non-obvious and has to be
 documented
 3) it requires user modification of the DirEntry object, which needs
 documentation, and is potentially hard to implement
 4) because the DirEntry object now allows modification, you need a
 stat_result() helper function to help you build your own stat values

 I'm afraid points 3 and 4 here add way too much complexity.

Points 3 and 4 are not required to go with option #2, option #2 merely
allows to implement points 3 and 4 at some point in the future if it
turns out to be desirable.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal


On 07/09/2014 05:15 PM, Victor Stinner wrote:

2014-07-09 17:29 GMT+02:00 Ben Hoyt benh...@gmail.com:

Would this not break the tree size script being discussed in the
other thread, as it would follow links and include linked directories
in the size of the tree?


The get_tree_size() function in the PEP would use: if not
entry.is_symlink() and entry.is_dir():.

Note: First I wrote if entry.is_dir() and not entry.is_symlink():,
but this syntax is slower on Linux because is_dir() has to call
lstat().


Wouldn't it only have to call lstat if the entry was, in fact, a link?



There are only a few cases where you want to handle symlinks
differently: archive (ex: tar), compute the size of a directory (ex:
du does not follow symlinks by default, du -L follows them), remove a
directory.


I agree with Victor here.  If the entry is a link I would want to know if it was a link to a directory or a link to a 
file.  If I care about not following sym links I can check is_symlink() (or whatever it's called).


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

Oh, since I'm proposing to add a new stat() method to DirEntry, we can
optimize it. stat() can reuse lstat() result if the file is not a
symlink. It simplifies is_dir().

New pseudo-code:
---
class DirEntry:
def __init__(self, path, name, lstat=None, d_type=None):
self.name = name
self.full_name = os.path.join(path, name)
# lstat is known on Windows
self._lstat = lstat
if lstat is not None and not stat.S_ISLNK(lstat.st_mode):
# On Windows, stat() only calls os.stat() for symlinks
self._stat = lstat
else:
self._stat = None
# d_type is known on UNIX
if d_type != DT_UNKNOWN:
self._d_type = d_type
else:
   # DT_UNKNOWN is not a very useful information :-p
   self._d_type = None

def stat(self):
if self._stat is None:
self._stat = os.stat(self.full_name)
return self._stat

def lstat(self):
if self._lstat is None:
self._lstat = os.lstat(self.full_name)
if self._stat is None and not stat.S_ISLNK(self._lstat.st_mode):
self._stat = lstat
return self._lstat

def is_dir(self):
if self._d_type is not None:
if self._d_type == DT_DIR:
return True
if self._d_type != DT_LNK:
return False
else:
lstat = self.lstat()
if stat.S_ISDIR(lstat.st_mode):
return True
stat = self.stat()   # if lstat() was already called, stat()
will only call os.stat() for symlink
return stat.S_ISDIR(stat.st_mode)
---

The extra caching rules are complex, that's why I suggest to not document them.

In short: is_dir() only needs an extra syscall for symlinks, for other
file types it does not need any syscall.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Tim Delaney

On 10 July 2014 10:23, Victor Stinner victor.stin...@gmail.com wrote:

 2014-07-09 17:26 GMT+02:00 Paul Moore p.f.mo...@gmail.com:
  On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote:
  The PEP says that DirEntry should mimic pathlib.Path, so I think that
  DirEntry.is_dir() should work as os.path.isir(): if the entry is a
  symbolic link, you should follow the symlink to get the status of the
  linked file with os.stat().
 
  (...)
  As a Windows user with only a superficial understanding of how
  symlinks should behave, (...)

 FYI Windows also supports symbolic links since Windows Vista. The
 feature is unknown because it is restricted to the administrator
 account. Try the mklink command in a terminal (cmd.exe) ;-)
 http://en.wikipedia.org/wiki/NTFS_symbolic_link

 ... To be honest, I never created a symlink on Windows. But since it
 is supported, you need to know it to write correctly your Windows
 code.


Personally, I create them all the time on Windows - mainly via  the Link
Shell Extension 
http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html. It's
the easiest way to ensure that my directory structures are as I want them
whilst not worrying about where the files really are e.g. code on SSD,
GB+-sized data files on rusty metal, symlinks makes it look like it's the
same directory structure. Same thing can be done with junctions if you're
only dealing with directories, but symlinks work with files as well.

I work cross-platform, and have a mild preference for option #2 with
similar semantics on all platforms.

Tim Delaney
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Akira Li

Ben Hoyt benh...@gmail.com writes:
...
 ``scandir()`` yields a ``DirEntry`` object for each file and directory
 in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
 pseudo-directories are skipped, and the entries are yielded in
 system-dependent order. Each ``DirEntry`` object has the following
 attributes and methods:

 * ``name``: the entry's filename, relative to the ``path`` argument
   (corresponds to the return values of ``os.listdir``)

 * ``full_name``: the entry's full path name -- the equivalent of
   ``os.path.join(path, entry.name)``

I suggest renaming .full_name - .path

.full_name might be misleading e.g., it implies that .full_name ==
abspath(.full_name) that might be false. The .path name has no such
associations.

The semantics of the the .path attribute is defined by these assertions::

for entry in os.scandir(topdir):
#NOTE: assume os.path.normpath(topdir) is not called to create .path
assert entry.path == os.path.join(topdir, entry.name)
assert entry.name == os.path.basename(entry.path)
assert entry.name == os.path.relpath(entry.path, start=topdir)
assert os.path.dirname(entry.path) == topdir
assert (entry.path != os.path.abspath(entry.path) or
os.path.isabs(topdir)) # it is absolute only if topdir is
assert (entry.path != os.path.realpath(entry.path) or
topdir == os.path.realpath(topdir)) # symlinks are not resolved
assert (entry.path != os.path.normcase(entry.path) or
topdir == os.path.normcase(topdir)) # no case-folding,
# unlike PureWindowsPath


...
 * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
   requires a system call on Windows, and usually doesn't on POSIX
   systems

I suggest documenting the implicit follow_symlinks parameter for .is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default as suggested by Victor Stinner *if .is_dir() does it on Windows*

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: both islink() and
isdir() can be true for the same path.


...
 Like the other functions in the ``os`` module, ``scandir()`` accepts
 either a bytes or str object for the ``path`` parameter, and returns
 the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
 same type as ``path``. However, it is *strongly recommended* to use
 the str type, as this ensures cross-platform support for Unicode
 filenames.

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))
+

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3 but the PEP doesn't mention it explicitly.

It has been discussed already e.g.,
https://mail.python.org/pipermail/python-dev/2014-July/135296.html

PEP 471 should explicitly reject the support for specifying a file
descriptor so that a code that uses os.scandir may assume that
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Reject explicitly in PEP 471 the support for dir_fd parameter
+

aka the support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.


...
 Notes on exception handling
 ---

 ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
 rather than attributes or properties, to make it clear that they may
 not be cheap operations, and they may do a system call. As a result,
 these methods may raise ``OSError``.

 For example, ``DirEntry.lstat()`` will always make a system call on
 POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
 ``stat()`` system call on such systems if ``readdir()`` returns a
 ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
 certain conditions or on certain file systems.

 For this reason, when a user requires fine-grained error handling,
 it's good to catch ``OSError`` around these method calls and then
 handle as appropriate.


I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).


...
os.scandir() should allow the explicit cleanup
++

::
with closing(os.scandir()) as entries:
for _ in entries:
break

entries.close() is called that frees the resources if necessary, to
*avoid relying on garbage-collection for managing file

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Nick Coghlan

On 9 Jul 2014 17:14, Ethan Furman et...@stoneleaf.us wrote:

 On 07/09/2014 02:42 PM, Ben Hoyt wrote:


 Okay, so using that [no platform specific] logic we should head over to
the os module and remove:


 ctermid, getenv, getegid...

 Okay, I'm tired of typing, but that list is not even half-way through
the os
 page, and those are all methods or attributes that are not available on
 either Windows or Unix or some flavors of Unix.


 True, is this really the precedent we want to *aim for*. listdir() is
 cross-platform,


 and listdir has serious performance issues, which is why you developed
scandir.

 Oh, and all those [snipped] upper-case attributes?  Yup, documented.
 And when we

 don't document it ourselves we often refer readers to their system
 documentation because Python does not, in fact, return exactly the same
 results on all platforms -- particularly when calling into the OS.


 But again, why a worse, less cross-platform API when a simple,
 cross-platform one is a method call away?


 For the same reason we don't use code that makes threaded behavior
better, but kills the single thread application.

 If the programmer would rather have consistency on all platforms rather
than performance on the one being used, `info='lstat'` is the option to use.

 I like the 'onerror' API better primarily because it gives a single point
to deal with the errors.  This has at least a couple advantages:

   - less duplication of code: in the tree_size example, the error
 handling is duplicated twice

   - readablity: with the error handling in a separate routine, one
 does not have to jump around the try/except blocks looking for
 what happens if there are no errors

The onerror approach can also deal with readdir failing, which the PEP
currently glosses over.

I'm somewhat inclined towards the current approach in the PEP, but I'd like
to see an explanation of two aspects:

1. How a scandir variant with an 'onerror' option could be implemented
given the version in the PEP

2. How the existing scandir module handles the 'onerror' parameter to its
directory walking function

Regards,
Nick.


 --
 ~Ethan~

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-08 Thread Victor Stinner

Hi,

2014-07-08 15:52 GMT+02:00 Ben Hoyt benh...@gmail.com:
 After some very good python-dev feedback on my first version of PEP
 471, I've updated the PEP to clarify a few things and added various
 Rejected ideas subsections. Here's a link to the new version (I've
 also copied the full text below):

Thanks, the new PEP looks better.

 * Removed the open issues section, as the three open issues have
 either been included (full_name) or rejected (windows_wildcard)

I remember a pending question on python-dev:

- Martin von Loewis asked if the scandir generator would have send()
and close() methods as any Python generator. I didn't see a reply on
the mailing (nor in the PEP).

 One known error in the PEP is that the Notes sections should be
 top-level sections, not be subheadings of Examples. If someone would
 like to give me (benhoyt) commit access to the peps repo, I can fix
 this and any other issues that come up.

Or just send me your new PEP ;-)

 Notes on caching
 

 The ``DirEntry`` objects are relatively dumb -- the ``name`` and
 ``full_name`` attributes are obviously always cached, and the ``is_X``
 and ``lstat`` methods cache their values (immediately on Windows via
 ``FindNextFile``, and on first use on POSIX systems via a ``stat``
 call) and never refetch from the system.

It is not clear to me which methods share the cache.

On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
is_symlink() call os.lstat().

If os.stat() says that the file is not a symlink, I guess that you can
use os.stat() result for lstat() and is_symlink() methods?

In the worst case, if the path is a symlink, would it be possible that
os.stat() and os.lstat() become inconsistent if the symlink is
modified between the two calls? If yes, I don't think that it's an
issue, it's just good to know it.

For symlinks, readdir() returns the status of the linked file or of the symlink?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-08 Thread Janzert


On 7/8/2014 9:52 AM, Ben Hoyt wrote:

DirEntry fields being static attribute-only objects
-

In `this July 2014 python-dev message
https://mail.python.org/pipermail/python-dev/2014-July/135303.html`_,
Paul Moore suggested a solution that was a thin wrapper round the OS
feature, where the ``DirEntry`` object had only static attributes:
``name``, ``full_name``, and ``is_X``, with the ``st_X`` attributes
only present on Windows. The idea was to use this simpler, lower-level
function as a building block for higher-level functions.

At first there was general agreement that simplifying in this way was
a good thing. However, there were two problems with this approach.
First, the assumption is the ``is_dir`` and similar attributes are
always present on POSIX, which isn't the case (if ``d_type`` is not
present or is ``DT_UNKNOWN``). Second, it's a much harder-to-use API
in practice, as even the ``is_dir`` attributes aren't always present
on POSIX, and would need to be tested with ``hasattr()`` and then
``os.stat()`` called if they weren't present.



Only exposing what the OS provides for free will make the API too 
difficult to use in the common case. But is there a nice way to expand 
the API that will allow the user who is trying to avoid extra expense 
know what information is already available?


Even if the initial version doesn't have a way to check what information 
is there for free, ensuring there is a clean way to add this in the 
future would be really nice.


Janzert

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 I remember a pending question on python-dev:

 - Martin von Loewis asked if the scandir generator would have send()
 and close() methods as any Python generator. I didn't see a reply on
 the mailing (nor in the PEP).

Good call. Looks like you're referring to this message:
https://mail.python.org/pipermail/python-dev/2014-July/135324.html

I'm not actually familiar with the purpose of .close() and
.send()/.throw() on generators. Do you typically call these functions
manually, or are they called automatically by the generator protocol?

 It is not clear to me which methods share the cache.

 On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
 is_symlink() call os.lstat().

 If os.stat() says that the file is not a symlink, I guess that you can
 use os.stat() result for lstat() and is_symlink() methods?

 In the worst case, if the path is a symlink, would it be possible that
 os.stat() and os.lstat() become inconsistent if the symlink is
 modified between the two calls? If yes, I don't think that it's an
 issue, it's just good to know it.

 For symlinks, readdir() returns the status of the linked file or of the 
 symlink?

I think you're misunderstanding is_dir() and is_file(), as these don't
actually call os.stat(). All DirEntry methods either call nothing or
os.lstat() to get the stat info on the entry itself (not the
destination of the symlink). In light of this, I don't think what
you're describing above is an issue.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 Only exposing what the OS provides for free will make the API too difficult
 to use in the common case. But is there a nice way to expand the API that
 will allow the user who is trying to avoid extra expense know what
 information is already available?

 Even if the initial version doesn't have a way to check what information is
 there for free, ensuring there is a clean way to add this in the future
 would be really nice.

We could easily add .had_type and .had_lstat properties (not sure
on the names), that would be true if the is_X information and lstat
information was fetched, respectively. Basically both would always be
True on Windows, but on POSIX only had_type would be True d_type is
present and != DT_UNKNOWN.

I don't feel this is actually necessary, but it's not hard to add.

Thoughts?

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

 I think you're misunderstanding is_dir() and is_file(), as these don't
 actually call os.stat(). All DirEntry methods either call nothing or
 os.lstat() to get the stat info on the entry itself (not the
 destination of the symlink).


 Oh. Extract of your PEP: is_dir(): like os.path.isdir(), but much cheaper.

 genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
 posixpath.islink() uses os.lstat().

 Is it a mistake in the PEP?

Ah, you're dead right -- this is basically a bug in the PEP, as
DirEntry.is_dir() is not like os.path.isdir() in that it is based on
the entry itself (like lstat), not following the link.

I'll improve the wording here and update the PEP.

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-08 Thread Ethan Furman


On 07/08/2014 12:34 PM, Ben Hoyt wrote:


Better to just have the attributes be None if they were not fetched.  None
is better than hasattr anyway, at least in the respect of not having to
catch exceptions to function properly.


The thing is, is_dir() and lstat() are not attributes (for a good
reason). Please read the relevant Rejected ideas sections and let us
know what you think. :-)


I did better than that -- I read the whole thing!  ;)

-1 on the PEP's implementation.

Just like an attribute does not imply a system call, having a method named 'is_dir' /does/ imply a system call, and not 
having one can be just as misleading.


If we have this:

size = 0
for entry in scandir('/some/path'):
size += entry.st_size

  - on Windows, this should Just Work (if I have the names correct ;)
  - on Posix, etc., this should fail noisily with either an AttributeError
('entry' has no 'st_size') or a TypeError (cannot add None)

and the solution is equally simple:

for entry in scandir('/some/path', stat=True):

  - if not Windows, perform a stat call at the same time

Now, of course, we might get errors.  I am not a big fan of wrapping everything in try/except, particularly when we 
already have a model to follow -- os.walk:


for entry in scandir('/some/path', stat=True, onerror=record_and_skip):

If we don't care if an error crashes the script, leave off onerror.

If we don't need st_size and friends, leave off stat=True.

If we get better performance on Windows instead of Linux, that's okay.

scandir is going into os because it may not behave the same on every platform.  Heck, even some non-os modules 
(multiprocessing comes to mind) do not behave the same on every platform.


I think caching the attributes for DirEntry is fine, but let's do it as a snapshot of that moment in time, not name now, 
and attributes in 30 minutes when we finally get to you because we had a lot of processing/files ahead of you (you being 
a DirEntry ;) .


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal