On 19Oct2020 1846, Eryk Sun wrote:
On 10/19/20, Steve Dower <steve.do...@python.org> wrote:
On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
TLDR: In os.scandir directory entries, atime is always a copy of mtime
rather than the actual access time.

Correction - os.stat() updates the access time to _now_, while
os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file
data. That has me curious if it can be reproduced.

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
immediately when the file data is read or modified. With other
filesystems, it may not be updated until the kernel file object that
was used to access the file's data is closed.

I thought I got my self-correction fired off quickly enough to save you from writing this :)

For details, download the [MS-FSA] PDF [1] and look for all references
to the following sections:

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Thanks for the detailed reference.

Going back to my initial message, I can't stress enough that this
problem is at its worst when a file has multiple hardlinks. If a
particular link in a directory wasn't the last link used to access the
file, its duplicated metadata may have the wrong file size, access
time, modify time, and change time (the latter is not reported by
Python). As is, for the current implementation, I'd only rely on the
basic attributes such as whether it's a directory or reparse point
(symlink, mountpoint, etc) when using scandir() to quickly process a
directory. For reliable stat information, call os.stat().

I do think, however, that os.scandir() can be improved in Windows
without significant performance loss if it calls GetFileAttributesExW
to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
and st_atime. This API call is relatively fast because it doesn't
require opening the file via CreateFileW, which is one of the more
expensive operations in os.stat(). But I haven't tried modifying
scandir() to benchmark it.

Resolving the path is the most expensive part, even if the file is not opened (I've been working with the NTFS team on this area, and we've been benchmarking/analysing all of it). There are a few improvements coming across the board, but I'd much rather just emphasise that os.scandir() is as fast as we can manage using cached information (including as cached by the OS). Otherwise we prevent people from using the fastest available option when they can, if they don't need the additional information/accuracy.

Cheers,
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MMRMLWGEV2ZGIACXQTSEQC6TPWGL3UZ3/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to