Ben Hoyt <benh...@gmail.com> writes:

>> Note: listdir() accepts an integer path (an open file descriptor that
>> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
>> *you can't use scandir() to replace listdir() in this case* (as I've
>> already mentioned in [1]). See the corresponding tests from [2].
>>
>> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
>> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>>
>> From os.listdir() docs [3]:
>>
>>> This function can also support specifying a file descriptor; the file
>>> descriptor must refer to a directory.
>>
>> [3] https://docs.python.org/3.4/library/os.html#os.listdir
>> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
>
> Fair point.
>
> Yes, I hadn't realized listdir supported dir_fd (must have been
> looking at 2.x docs), though you've pointed it out at [1] above. and I
> guess I wasn't thinking about implementation at the time.
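For reference, here is a minimal POSIX-only sketch of the fd form of os.listdir() discussed in the quote. Listing through an open directory descriptor should give the same names as listing by path. The temporary directory and the file names are illustrative assumptions, not part of the original discussion.

```python
import os
import tempfile

# POSIX-only sketch: os.listdir() also accepts an open fd that refers to
# a directory (the "specifying a file descriptor" form, i.e. path-fd).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):  # illustrative file names
        with open(os.path.join(tmp, name), "w"):
            pass
    by_path = sorted(os.listdir(tmp))
    fd = os.open(tmp, os.O_RDONLY)  # descriptor that refers to the directory
    try:
        by_fd = sorted(os.listdir(fd))  # same listing, via the descriptor
    finally:
        os.close(fd)

print(os.listdir in os.supports_fd, by_fd == by_path)
```

Note that the fd is still owned by the caller and must be closed explicitly.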
FYI, dir_fd is related but *different*: compare "specifying a file
descriptor" [1] vs. "paths relative to directory descriptors" [2].

"NOTE: os.supports_fd and os.supports_dir_fd are different sets." [3]:

  >>> import os
  >>> os.listdir in os.supports_fd
  True
  >>> os.listdir in os.supports_dir_fd
  False

[1] https://docs.python.org/3/library/os.html#path-fd
[2] https://docs.python.org/3/library/os.html#dir-fd
[3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html

To be clear: *listdir() does not support dir_fd*, though it can be
emulated using os.open(dir_fd=..).

You can safely ignore the rest of the e-mail until you want to implement
path-fd [1] support for os.scandir() in several months.

Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

  import contextlib
  import os

  with contextlib.ExitStack() as stack:
      dir_fd = os.open('/etc', os.O_RDONLY)
      stack.callback(os.close, dir_fd)
      fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd)  # dir-fd [2]
      stack.callback(os.close, fd)
      print("\n".join(os.listdir(fd)))  # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..) call.

See also os.fwalk(dir_fd=..) [4].

[4] https://docs.python.org/3/library/os.html#os.fwalk

> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD.

What is entry.path in this case? If the input directory is a file
descriptor (an integer), then os.path.join(directory, entry.name) won't
work.

  "PEP 471 should explicitly reject the support for specifying a file
  descriptor so that a code that uses os.scandir may assume that
  entry.path attribute is always present (no exceptions due to a
  failure to read /proc/self/fd/NNN or an error while calling
  fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
  http://stackoverflow.com/q/1188757 )."
  [5]

[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html

On the other hand, os.fwalk() [4], which supports both path-fd [1] and
dir-fd [2], could be implemented without the entry.path property if
os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way
to traverse a directory tree without symlink races, e.g. [6]:

  def get_tree_size(directory):
      """Return total size of files in directory and subdirs."""
      return sum(entry.stat(follow_symlinks=False).st_size  # lstat()
                 for root, dirs, files, rootfd in fwalk(directory)
                 for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples

where fwalk() is an exact copy of os.fwalk() except that it uses
_fwalk(), which is defined in terms of scandir():

  import os

  # adapt os._fwalk() to use scandir() instead of os.listdir()
  def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
      # Note: This uses O(depth of the directory tree) file descriptors:
      # if necessary, it can be adapted to only require O(1) FDs, see
      # http://bugs.python.org/issue13734
      entries = os.scandir(topfd)  # needs path-fd [1] support
      dirs, nondirs = [], []
      for entry in entries:
          #XXX call onerror on OSError on next() and return?
          # report symlinks to directories as directories (like os.walk)
          # but no recursion into symlinked subdirectories unless
          # follow_symlinks is true
          # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
          # raise on broken links)
          try:
              (dirs if entry.is_dir() else nondirs).append(entry)
          except FileNotFoundError:
              continue  # ignore disappeared files

      if topdown:
          yield toppath, dirs, nondirs, topfd

      for entry in dirs:
          try:
              orig_st = entry.stat(follow_symlinks=follow_symlinks)
              #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
              dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
          except OSError as err:
              if onerror is not None:
                  onerror(err)
              return
          try:
              if follow_symlinks or os.path.samestat(orig_st, os.stat(dirfd)):
                  dirpath = os.path.join(toppath, entry.name)  # entry.path
                  yield from _fwalk(dirfd, dirpath, topdown, onerror,
                                    follow_symlinks)
          finally:
              os.close(dirfd)  # or use with entry.opendir() as dirfd: ...

      if not topdown:
          yield toppath, dirs, nondirs, topfd

i.e., if os.scandir() supports specifying file descriptors [1], then it
is relatively straightforward to define os.fwalk() in terms of it. Would
scandir() provide the same performance benefits as for os.walk()?

entry.stat() can be implemented without entry.path when
entry._directory (or whatever other DirEntry attribute stores the first
parameter to os.scandir(fd)) is an open file descriptor that refers to a
directory:

  def stat(self, *, follow_symlinks=True):
      return os.stat(self.name,  #NOTE: ignore caching
                     follow_symlinks=follow_symlinks,
                     dir_fd=self._directory)

  def lstat(self):
      return self.stat(follow_symlinks=False)

--
Akira
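The stat()/lstat() sketch above can be exercised without any changes to os.scandir() by using a stand-in class. FdDirEntry and entries_from_fd() are hypothetical names (not part of the os module), and the entries are built from the path-fd form of os.listdir(). Like the sketch, the class keeps only a name plus the directory fd in _directory and skips caching. POSIX only, since it relies on dir_fd support in os.stat().

```python
import os
import tempfile


class FdDirEntry:
    """Hypothetical DirEntry stand-in: knows only a name and a directory fd."""

    def __init__(self, name, directory_fd):
        self.name = name
        self._directory = directory_fd  # open fd that refers to a directory

    def stat(self, *, follow_symlinks=True):
        # No entry.path needed: self.name is resolved relative to the fd.
        return os.stat(self.name, dir_fd=self._directory,
                       follow_symlinks=follow_symlinks)

    def lstat(self):
        return self.stat(follow_symlinks=False)


def entries_from_fd(fd):
    """Yield FdDirEntry objects for every name in the directory fd."""
    for name in os.listdir(fd):  # path-fd form of os.listdir()
        yield FdDirEntry(name, fd)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.bin"), "wb") as f:  # illustrative file
        f.write(b"x" * 10)
    dir_fd = os.open(tmp, os.O_RDONLY)
    try:
        # Consume the generator while the directory fd is still open.
        sizes = {e.name: e.lstat().st_size for e in entries_from_fd(dir_fd)}
    finally:
        os.close(dir_fd)

print(sizes)
```

The fd must outlive every entry that uses it, which is why the sizes are collected before os.close() is called.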