On 10/26/20, Victor Stinner <vstin...@python.org> wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower <steve.do...@python.org> wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow down
> the function, which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm
concerned with the reliability of the file size (st_size) and
last-write time (st_mtime) in stat() results. Developers are used to
various filesystem policies on various platforms that limit when the
access time gets updated, if at all. FAT32 filesystems only have an
access date, and the driver in Windows fixes the access time at
midnight. Updating the access time in NTFS and ReFS can be completely
disabled at the system level; otherwise it's updated with a
granularity of one hour if it's only the access time that would be
updated.

The biggest concern for me is NTFS hardlinks, for which the st_size
and st_mtime in the directory entry are unreliable. When a file with
multiple hardlinks is modified, the filesystem only updates the
duplicated information in the directory entry of the opened link.
The entry in the directory doesn't include the link count, or even a
boolean value to indicate that a file has multiple hardlinks. So
unless you know that hardlinks aren't possible, os.stat() is required
in order to reliably determine st_size and st_mtime, to the extent
that reliably knowing st_mtime is possible.
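
As a minimal sketch of that fallback, assuming a caller that only
needs size and last-write time for regular files, the scan can take
the names and file types from os.scandir() but read st_size and
st_mtime from a separate os.stat() call. The helper name
reliable_file_info is hypothetical; it's not part of the os module.

```python
import os


def reliable_file_info(directory):
    """Yield (name, size, mtime, nlink) for regular files in directory.

    Hypothetical helper. Size and mtime come from os.stat() rather than
    DirEntry.stat(), because on Windows the latter may return the values
    duplicated in the directory entry.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip directories, symlinks and other non-regular entries.
            if not entry.is_file(follow_symlinks=False):
                continue
            # One extra system call per file; this is the cost of not
            # trusting the directory entry.
            st = os.stat(entry.path)
            yield entry.name, st.st_size, st.st_mtime, st.st_nlink
```

This gives up most of the speed advantage of os.scandir() for regular
files, so it only makes sense when hardlinks can't be ruled out.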

A general problem that affects even os.stat() is that a modification
may be noted only by setting a flag (FO_FILE_MODIFIED) in the kernel
file object of the particular open. Whether it's immediately noted in
the last-write time of the shared FCB (file control block) is up to
filesystem policy.

Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately
notes the modification time, so the st_mtime value from os.stat() is
current. In prior versions of NTFS, and with other Microsoft
filesystems such as FAT32, the last-write time is only noted when the
file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when
the open is closed.

This means that st_size may change without also changing st_mtime. I'm
using Windows 10 2004 currently, so I can't show an NTFS example, but
the following shows the behavior with FAT32:

    import os
    import time

    f = open('spam.txt', 'w')
    st1 = os.stat('spam.txt')
    time.sleep(10)
    f.write('spam')
    f.flush()  # flush Python's buffer to the OS; this is not os.fsync()
    st2 = os.stat('spam.txt')

The above write was noted only by setting the FO_FILE_MODIFIED flag on
the kernel file object. (The file object can be inspected with a local
kernel debugger.) The write time wasn't noted in the FCB, i.e.
st_mtime hasn't changed in st2:

    >>> st2.st_size - st1.st_size
    4
    >>> st2.st_mtime - st1.st_mtime
    0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is
called on the open:

    >>> os.fsync(f.fileno())
    >>> st3 = os.stat('spam.txt')
    >>> st3.st_mtime - st1.st_mtime
    10.0

Note also that, with NTFS, to the extent that the FCB metadata is
current, calling os.stat() on a link updates the duplicated
information in the directory entry. So calling os.stat() on an NTFS
file may update the entry that's returned by a subsequent os.scandir()
call.
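
One way to observe this, sketched under the assumption that comparing
st_size and st_mtime_ns is enough to spot a stale entry, is to compare
what os.scandir() reports with a fresh os.stat() of the same path. The
helper name stale_entries is hypothetical.

```python
import os


def stale_entries(directory):
    """Return names whose scandir() data differs from a fresh os.stat().

    Hypothetical helper. On Windows, DirEntry.stat() reports the values
    stored in the directory entry; on POSIX it makes its own stat call,
    so the two results normally agree there.
    """
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            cached = entry.stat(follow_symlinks=False)
            fresh = os.stat(entry.path)
            if ((cached.st_size, cached.st_mtime_ns)
                    != (fresh.st_size, fresh.st_mtime_ns)):
                stale.append(entry.name)
    return stale
```

Given the behavior described above, the os.stat() calls made by the
first run may themselves refresh the directory entries, in which case
a second run over the same directory would be expected to return
fewer names or none.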
_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/