Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 6/29/2014 5:28 AM, Nick Coghlan wrote: There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object. +1 to this in particular, but this whole refresh of the semantics sounds better overall. Finally, for the case where someone does want to keep the DirEntry around, a .refresh() API could rerun lstat() and update all the data. And with that (initial data potentially always populated, or None, and an explicit refresh() API), the data could all be returned as properties, implying that they aren't fetching new data themselves, because they wouldn't be. Glenn ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On Sat, Jun 28, 2014 at 2:51 AM, Victor Stinner wrote: > 2014-06-26 13:04 GMT+02:00 Antoine Pitrou : >> For the same reason, I agree with Victor that we should ditch the >> threading-disabled builds. It's too much of a hassle for no actual, >> practical benefit. People who want a threadless unicodeless Python can >> install Python 1.5.2 for all I care. > > By the way, adding a buildbot for testing Python without thread > support is not enough. The buildbot is currently broken since more > than one month and nobody noticed :-p I've opened http://bugs.python.org/issue21755 to fix the test a couple of weeks ago. --Berker > > http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%203.x/ > > Ok, I noticed, but I consider that I spent too much time on this minor > use case. I prefer to leave such task to someone else :-) > > Victor > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/berker.peksag%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29.06.2014 19:04, Ethan Furman wrote: > On 06/29/2014 04:12 AM, Jonas Wielicki wrote: >> >> If the flag is set to False, all the fields in the DirEntry will be >> None, for consistency, even on Windows. > > -1 > > This consistency is unnecessary. I’m not sure -- similar to the windows_wildcard option this might be a temptation to write platform dependent code, although possibly by accident (i.e. not reading the docs carefully). > > -- > ~Ethan~ > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/j.wielicki%40sotecware.net > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Chris Angelico writes: > On Sat, Jun 28, 2014 at 11:05 PM, Akira Li <4kir4...@gmail.com> wrote: >> Have you considered adding support for paths relative to directory >> descriptors [1] via keyword only dir_fd=None parameter if it may lead to >> more efficient implementations on some platforms? >> >> [1]: https://docs.python.org/3.4/library/os.html#dir-fd > > Potentially more efficient and also potentially safer (see 'man > openat')... but an enhancement that can wait, if necessary. > Introducing the feature later creates unnecessary incompatibilities. Either it should be explicitly rejected in the PEP 471 and something-like `os.scandir(os.open(relative_path, dir_fd=fd))` recommended instead (assuming `os.scandir in os.supports_fd` like `os.listdir()`). At C level it could be implemented using fdopendir/openat or scandirat. Here's the function description using Argument Clinic DSL: /*[clinic input] os.scandir path : path_t(allow_fd=True, nullable=True) = '.' *path* can be specified as either str or bytes. On some platforms, *path* may also be specified as an open file descriptor; the file descriptor must refer to a directory. If this functionality is unavailable, using it raises NotImplementedError. * dir_fd : dir_fd = None If not None, it should be a file descriptor open to a directory, and *path* should be a relative string; path will then be relative to that directory. if *dir_fd* is unavailable, using it raises NotImplementedError. Yield a DirEntry object for each file and directory in *path*. Just like os.listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. {parameters} It's an error to use *dir_fd* when specifying *path* as an open file descriptor. [clinic start generated code]*/ And corresponding tests (from test_posix:PosixTester), to show the compatibility with os.listdir argument parsing in detail: def test_scandir_default(self): # When scandir is called without argument, # it's the same as scandir(os.curdir). self.assertIn(support.TESTFN, [e.name for e in posix.scandir()]) def _test_scandir(self, curdir): filenames = sorted(e.name for e in posix.scandir(curdir)) self.assertIn(support.TESTFN, filenames) #NOTE: assume listdir, scandir accept the same types on the platform self.assertEqual(sorted(posix.listdir(curdir)), filenames) def test_scandir(self): self._test_scandir(os.curdir) def test_scandir_none(self): # it's the same as scandir(os.curdir). self._test_scandir(None) def test_scandir_bytes(self): # When scandir is called with a bytes object, # the returned entries names are still of type str. # Call `os.fsencode(entry.name)` to get bytes self.assertIn('a', {'a'}) self.assertNotIn(b'a', {'a'}) self._test_scandir(b'.') @unittest.skipUnless(posix.scandir in os.supports_fd, "test needs fd support for posix.scandir()") def test_scandir_fd_minus_one(self): # it's the same as scandir(os.curdir). self._test_scandir(-1) def test_scandir_float(self): # invalid args self.assertRaises(TypeError, posix.scandir, -1.0) @unittest.skipUnless(posix.scandir in os.supports_fd, "test needs fd support for posix.scandir()") def test_scandir_fd(self): fd = posix.open(posix.getcwd(), posix.O_RDONLY) self.addCleanup(posix.close, fd) self._test_scandir(fd) self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) # call 2nd time to test rewind self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) @unittest.skipUnless(posix.scandir in os.supports_dir_fd, "test needs dir_fd support for os.scandir()") def test_scandir_dir_fd(self): relpath = 'relative_path' with support.temp_dir() as parent: fullpath = os.path.join(parent, relpath) with support.temp_dir(path=fullpath): support.create_empty_file(os.path.join(parent, 'a')) support.create_empty_file(os.path.join(fullpath, 'b')) fd = posix.open(parent, posix.O_RDONLY) self.addCleanup(posix.close, fd) self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) # check that fd is still useful self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) -- Akira ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev U
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 06/29/2014 04:12 AM, Jonas Wielicki wrote: If the flag is set to False, all the fields in the DirEntry will be None, for consistency, even on Windows. -1 This consistency is unnecessary. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 06/29/2014 05:28 AM, Nick Coghlan wrote: So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes. That would make the DirEntry attributes: is_dir: boolean, always populated is_file: boolean, always populated is_symlink boolean, always populated lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False (I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name) +1 -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29.06.2014 13:08, Nick Coghlan wrote: > On 29 June 2014 20:52, Steven D'Aprano wrote: >> Speaking of caching, is there a way to freshen the cached values? > > Switch to a full Path object instead of relying on the cached DirEntry data. > > This is what makes me wary of including lstat, even though Windows > offers it without the extra stat call. Caching behaviour is *really* > hard to make intuitive, especially when it *sometimes* returns data > that looks fresh (as it on first call on POSIX systems). This bugs me too. An idea I had was adding a keyword argument to scandir which specifies whether stat data should be added to the direntry or not. If the flag is set to True, This would implicitly call lstat on POSIX before returning the DirEntry, and use the available data on Windows. If the flag is set to False, all the fields in the DirEntry will be None, for consistency, even on Windows. This is not optimal in cases where the stat information is needed only for some of the DirEntry objects, but would also reduce the required logic in the DirEntry object. Thoughts? > > Regards, > Nick. > > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 21:45, Paul Moore wrote: > On 29 June 2014 12:08, Nick Coghlan wrote: >> This is what makes me wary of including lstat, even though Windows >> offers it without the extra stat call. Caching behaviour is *really* >> hard to make intuitive, especially when it *sometimes* returns data >> that looks fresh (as it on first call on POSIX systems). > > If it matters that much we *could* simply call it cached_lstat(). It's > ugly, but I really don't like the idea of throwing the information > away - after all, the fact that we currently throw data away is why > there's even a need for scandir. Let's not make the same mistake > again... Future-proofing is the reason DirEntry is a full fledged class in the first place, though. Effectively communicating the behavioural difference between DirEntry and pathlib.Path is the main thing that makes me nervous about adhering too closely to the Path API. To restate the problem and the alternative proposal, these are the DirEntry methods under discussion: is_dir(): like os.path.isdir(), but requires no system calls on at least POSIX and Windows is_file(): like os.path.isfile(), but requires no system calls on at least POSIX and Windows is_symlink(): like os.path.islink(), but requires no system calls on at least POSIX and Windows lstat(): like os.lstat(), but requires no system calls on Windows For the almost-certain-to-be-cached items, the suggestion is to make them properties (or just ordinary attributes): is_dir is_file is_symlink What do with lstat() is currently less clear, since POSIX directory scanning doesn't provide that level of detail by default. The PEP also doesn't currently state whether the is_dir(), is_file() and is_symlink() results would be updated if a call to lstat() produced different answers than the original directory scanning process, which further suggests to me that allowing the stat call to be delayed on POSIX systems is a potentially problematic and inherently confusing design. We would have two options: - update them, meaning calling lstat() may change those results from being a snapshot of the setting at the time the directory was scanned - leave them alone, meaning the DirEntry object and the DirEntry.lstat() result may give different answers Those both sound ugly to me. So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes. That would make the DirEntry attributes: is_dir: boolean, always populated is_file: boolean, always populated is_symlink boolean, always populated lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False (I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name) What this would allow: - by default, scanning is efficient everywhere, but lstat_result may be None on POSIX systems - if you always need the lstat result, setting "ensure_lstat" will trigger the extra system call implicitly - if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat attribute is None Most importantly, *regardless of platform*, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object. There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object. Regards, Nick. P.S. We'd be generating quite a few of these, so we can use __slots__ to keep the memory overhead to a minimum (that's just a general comment - it's really irrelevant to the methods-or-attributes question). -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 12:08, Nick Coghlan wrote: > This is what makes me wary of including lstat, even though Windows > offers it without the extra stat call. Caching behaviour is *really* > hard to make intuitive, especially when it *sometimes* returns data > that looks fresh (as it on first call on POSIX systems). If it matters that much we *could* simply call it cached_lstat(). It's ugly, but I really don't like the idea of throwing the information away - after all, the fact that we currently throw data away is why there's even a need for scandir. Let's not make the same mistake again... Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 29 June 2014 20:52, Steven D'Aprano wrote: > Speaking of caching, is there a way to freshen the cached values? Switch to a full Path object instead of relying on the cached DirEntry data. This is what makes me wary of including lstat, even though Windows offers it without the extra stat call. Caching behaviour is *really* hard to make intuitive, especially when it *sometimes* returns data that looks fresh (as it on first call on POSIX systems). Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Sat, Jun 28, 2014 at 03:55:00PM -0400, Ben Hoyt wrote: > Re is_dir etc being properties rather than methods: [...] > The problem with this is that properties "look free", they look just > like attribute access, so you wouldn't normally handle exceptions when > accessing them. But .lstat() and .is_dir() etc may do an OS call, so > if you're needing to be careful with error handling, you may want to > handle errors on them. Hence I think it's best practice to make them > functions(). I think this one could go either way. Methods look like they actually re-test the value each time you call it. I can easily see people not realising that the value is cached and writing code like this toy example: # Detect a file change. t = the_file.lstat().st_mtime while the_file.lstat().st_mtime == t: sleep(0.1) print("Changed!") I know that's not the best way to detect file changes, but I'm sure people will do something like that and not realise that the call to lstat is cached. Personally, I would prefer a property. If I forget to wrap a call in a try...except, it will fail hard and I will get an exception. But with a method call, the failure is silent and I keep getting the cached result. Speaking of caching, is there a way to freshen the cached values? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 28 Jun 2014, at 21:48, Ben Hoyt wrote: [...] Crazy idea: would it be possible to "convert" a DirEntry object to a pathlib.Path object without losing the cache? I guess that pathlib.Path expects a full stat_result object. The main problem is that pathlib.Path objects explicitly don't cache stat info (and Guido doesn't want them to, for good reason I think). There's a thread on python-dev about this earlier. I'll add it to a "Rejected ideas" section. However, it would be bad to have two implementations of the concept of "filename" with different attribute and method names. The best way to ensure compatible APIs would be if one class was derived from the other. [...] Servus, Walter ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com