[Python-Dev] Re: os.scandir bug in Windows?

2020-10-28 Thread Eryk Sun
On 10/28/20, Stephen J. Turnbull  wrote:
>
> Note: you can "fix" directory updates by mounting the filesystem r/o.

Mounting the filesystem as readonly is the extreme case. Popular Unix
systems support a "noatime" mount option that disables updating file
access times, and Linux's "relatime" option limits updates to when one
of the other timestamps is newer or the access time is over a day old.
In Windows, NTFS and ReFS support a system setting (but not per-volume)
to disable updating access times -- "NtfsDisableLastAccessUpdate" and
"RefsDisableLastAccessUpdate".
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E5AWEB3U5ZCQBWABOKAGL6CADRHBLEEP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-28 Thread Stephen J. Turnbull
Greg Ewing writes:

 > Also it's kind of weird that just looking at data on the
 > disk can change something about it.

The "something about it" *did* change.  The world is a dynamic entity,
it does change.  What you think is weird is that the metadata change
is recorded.

Note: you can "fix" directory updates by mounting the filesystem r/o.

 > Sometimes it's an advantage to *not* have quantum computing!

I think effective encryption is a bigger one, myself. ;-)
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SZPTA6RMLDY22REF2T3KQ7435JZ7LFET/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Gregory P. Smith
On Mon, Oct 26, 2020, 4:06 PM Chris Angelico  wrote:

> On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing 
> wrote:
> >
> > On 27/10/20 8:24 am, Victor Stinner wrote:
> > > I would
> > > rather want to kill the whole concept of "access" time in operating
> > > systems (or just configure the OS to not update it anymore). I guess
> > > that it's really hard to make it efficient and accurate at the same
> > > time...
> >
> > Also it's kind of weird that just looking at data on the
> > disk can change something about it. Sometimes it's an
> > advantage to *not* have quantum computing!
> >
>
> And yet, it's of incredible value to be able to ask "now, where was
> that file... the one that I was looking at last week, called something
> about calendars, and it had a cat picture in it". Being able to answer
> that kinda depends on recording accesses one way or another, so the
> weirdnesses are bound to happen.
>

scandir is never going to answer that. Neither is a simple blind "access"
time stored in filesystem metadata.

> ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YW5NMIE2SC3RQWDMJX2DVIS3FRHNPEQM/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Chris Angelico
On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing  wrote:
>
> On 27/10/20 8:24 am, Victor Stinner wrote:
> > I would
> > rather want to kill the whole concept of "access" time in operating
> > systems (or just configure the OS to not update it anymore). I guess
> > that it's really hard to make it efficient and accurate at the same
> > time...
>
> Also it's kind of weird that just looking at data on the
> disk can change something about it. Sometimes it's an
> advantage to *not* have quantum computing!
>

And yet, it's of incredible value to be able to ask "now, where was
that file... the one that I was looking at last week, called something
about calendars, and it had a cat picture in it". Being able to answer
that kinda depends on recording accesses one way or another, so the
weirdnesses are bound to happen.

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZMNVRGZ7ZEC5EAKLUOX64R4WKHOLPF4O/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Greg Ewing

On 27/10/20 8:24 am, Victor Stinner wrote:
> I would
> rather want to kill the whole concept of "access" time in operating
> systems (or just configure the OS to not update it anymore). I guess
> that it's really hard to make it efficient and accurate at the same
> time...


Also it's kind of weird that just looking at data on the
disk can change something about it. Sometimes it's an
advantage to *not* have quantum computing!

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GPWWYOB3EQKDLELTYTE4IWGQ726BCPSY/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Eryk Sun
On 10/26/20, Victor Stinner  wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow-down
> the function which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm
concerned with the reliability of the file size (st_size) and
last-write time (st_mtime) in stat() results. Developers are used to
various filesystem policies on various platforms that limit when the
access time gets updated, if at all. FAT32 filesystems only have an
access date, and the driver in Windows fixes the access time at
midnight. Updating the access time in NTFS and ReFS can be completely
disabled at the system level; otherwise it's updated with a
granularity of one hour if it's only the access time that would be
updated.

The biggest concern for me is NTFS hardlinks, for which the st_size
and st_mtime in the directory entry are unreliable. When a file with
multiple hardlinks is modified, the filesystem only updates the
duplicated information in the directory entry of the opened link.
Because the entry in the directory doesn't include the link count or
even a boolean value to indicate that a file has multiple hardlinks,
if you don't know whether or not there's a possibility of hardlinks,
then os.stat() is required in order to reliably determine st_size and
st_mtime, to the extent that reliably knowing st_mtime is possible.
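
One way to check whether the directory-entry data has drifted from the file's own metadata is to compare DirEntry.stat() against a fresh os.stat() call. The following is only a minimal sketch: the helper name stale_entries is made up for illustration, and it compares just st_size and st_mtime.

```python
import os

def stale_entries(directory):
    """Return names whose DirEntry data disagrees with a fresh os.stat()."""
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # On Windows this comes from the directory listing itself.
            cached = entry.stat(follow_symlinks=False)
            # This queries the file directly.
            fresh = os.stat(entry.path, follow_symlinks=False)
            if (cached.st_size, cached.st_mtime) != (fresh.st_size, fresh.st_mtime):
                stale.append(entry.name)
    return stale
```

On platforms where DirEntry.stat() performs a real stat call, this should normally return an empty list. Because os.stat() may itself refresh the directory entry on NTFS, a second run right afterwards may no longer show the mismatch.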

A general problem that affects even os.stat() is that a modified file
may only be noted by setting a flag (FO_FILE_MODIFIED) in the kernel
file object of the particular open. Whether it's immediately noted in
the last-write time of the shared FCB (file control block) is up to
filesystem policy.

Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately
notes the modification time, so the st_mtime value from os.stat() is
current. In prior versions of NTFS, and with other Microsoft
filesystems such as FAT32, the last-write time is only noted when the
file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when
the open is closed.

This means that st_size may change without also changing st_mtime. I'm
using Windows 10 2004 currently, so I can't show an NTFS example, but
the following shows the behavior with FAT32:

import os
import time

f = open('spam.txt', 'w')
st1 = os.stat('spam.txt')
time.sleep(10)
f.write('spam')
f.flush()  # hands Python's buffered data to the OS; not an fsync
st2 = os.stat('spam.txt')

The above write was noted only by setting the FO_FILE_MODIFIED flag on
the kernel file object. (The file object can be inspected with a local
kernel debugger.) The write time wasn't noted in the FCB, i.e.
st_mtime hasn't changed in st2:

>>> st2.st_size - st1.st_size
4
>>> st2.st_mtime - st1.st_mtime
0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is
called on the open:

>>> os.fsync(f.fileno())
>>> st3 = os.stat('spam.txt')
>>> st3.st_mtime - st1.st_mtime
10.0
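
For anyone who wants to repeat this on their own filesystem, the same steps can be wrapped in a small script. This is only a sketch: the function name mtime_lag_demo and the shortened default delay are my own choices, and the deltas you get depend on the filesystem and Windows version as described above.

```python
import os
import tempfile
import time

def mtime_lag_demo(directory, delay=1.0):
    """Write to an open file and report how st_size and st_mtime respond."""
    path = os.path.join(directory, "spam.txt")
    with open(path, "w") as f:
        before = os.stat(path)
        time.sleep(delay)
        f.write("spam")
        f.flush()                 # hand the data to the OS
        after_write = os.stat(path)
        os.fsync(f.fileno())      # FlushFileBuffers on Windows
        after_fsync = os.stat(path)
    return {
        "size_delta": after_write.st_size - before.st_size,
        "mtime_delta_after_write": after_write.st_mtime - before.st_mtime,
        "mtime_delta_after_fsync": after_fsync.st_mtime - before.st_mtime,
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(mtime_lag_demo(tmp))
```

A filesystem that notes the write time immediately should report roughly the same value for both mtime deltas; one that defers it should report 0.0 after the write and a nonzero value only after the fsync.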

Note also that, with NTFS, to the extent that the FCB metadata is
current, calling os.stat() on a link updates the duplicated
information in the directory entry. So calling os.stat() on an NTFS
file may update the entry that's returned by a subsequent os.scandir()
call.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Victor Stinner
On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything.

I agree that this surprising behavior can be documented. Attempting to
provide an accurate access time in os.scandir() is likely to slow down
the function, which would defeat its whole purpose.

--

By the way, who relies on the access time? I don't understand why the
creation and modification times are not enough for all usages. I would
rather kill the whole concept of "access" time in operating
systems (or just configure the OS to not update it anymore). I guess
that it's really hard to make it efficient and accurate at the same
time...

Linux has a "relatime" mount option (Fedora enables it by default):
"With this option enabled, atime data is written to the disk only if
the file has been modified since the atime data was last updated
(mtime), or if the file was last accessed more than a certain amount
of time ago (by default, one day)." Minor enhancement over always
updating atime.
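
The quoted rule can be written down as a small predicate. This is a simplified model of the description above, not the kernel's actual logic (it ignores ctime, for instance); the function name and the max_age parameter are made up for illustration.

```python
DAY = 24 * 60 * 60  # the default threshold mentioned in the quote, in seconds

def relatime_should_update(atime, mtime, now, max_age=DAY):
    """Return True if the quoted relatime rule would write a new atime."""
    # The file was modified since the recorded access time.
    modified_since_access = mtime >= atime
    # The recorded access time is older than the threshold.
    access_is_old = (now - atime) >= max_age
    return modified_since_access or access_is_old
```

With this model, a file that is read many times within one day without being modified gets at most one atime write, which is where the saving over always updating atime comes from.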

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VKL5VXI6R4BNN36RX2FJ5G4YEHS372UV/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-23 Thread Random832
On Fri, Oct 23, 2020, at 02:14, Random832 wrote:
> What correction, exactly, do you mean? The post I saw with the word 
> "Correction" on it is the one that *makes* the claim people are taking 
> issue with.

Okay, sorry, I see the other correction post now...

My issue, I guess, was the same as Eryk Sun's: it wasn't clear which parts of
the previous post you were correcting and which (if any) you stood by, since
they were about the behavior of different parts of the system, so it didn't
register as a correction to that part when I originally read it.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/U4MZFDDMM4L52DKA6NBB7MKRJJ7QWEOB/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-23 Thread Random832
On Tue, Oct 20, 2020, at 07:42, Steve Dower wrote:
> On 20Oct2020 0520, Rob Cliffe wrote:
> > On 19/10/2020 12:42, Steve Dower wrote:
> >> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> >>> TLDR: In os.scandir directory entries, atime is always a copy of 
> >>> mtime rather than the actual access time.
> >>
> >> Correction - os.stat() updates the access time to _now_, while 
> >> os.scandir() returns the last access time without updating it.
> >>
> >> Eryk replied with a deeper explanation of the cause, but fundamentally 
> >> this is what you are seeing.
> >>
> >> Feel free to file a bug, but we'll likely only add a vague note to the 
> >> docs about how Windows works here rather than changing anything. If 
> >> anything, we should probably fix os.stat() to avoid updating the 
> >> access time so that both functions behave the same, but that might be 
> >> too complicated.
> >>
> >> Cheers,
> >> Steve
> > Sorry - what you say does not match the behaviour I observe, which is that
> 
> Yes, I posted a correction already (immediately after sending the first 
> email).

ok, see, the correction you posted doesn't address the part of your claim that 
people are taking issue with, which is that *calling os.stat() causes the atime 
to be set to the time of the call to os.stat()*. This is not the same thing as 
[correctly] saying that "calling os.stat() may return a more up-to-date atime, 
the time of the last read, write, or other operation", and the phrasing 
"updates the access time to _now_" certainly *seemed* unambiguous.

And at this point it's not clear to me whether you understand that people are 
reading your claim this way.

What correction, exactly, do you mean? The post I saw with the word 
"Correction" on it is the one that *makes* the claim people are taking issue 
with.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/O63FMQYOHASHZ33CWBYQMD3H3XYGT5QC/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-20 Thread Eryk Sun
On 10/19/20, Greg Ewing  wrote:
> On 20/10/20 4:52 am, Gregory P. Smith wrote:
>> Those of us with a traditional posix filesystem background may raise
>> eyeballs at this duplication, seeing a directory as a place that merely
>> maps names to inodes
>
> This is probably a holdover from MS-DOS, where there was no separate
> inode-like structure -- it was all in the directory entry.

DOS implemented a find-first/find-next API (int 21h 4E/4F) that
provided a file's name, attributes, size, and last write time/date. I
think it's clear that the design was influenced by the
readily-available contents of a FAT dirent. The Win32 API extended
this to FindFirstFile/FindNextFile, with added support for the long
filename, create and access times, and, in NT 5+, the reparse tag for
a reparse point.

NTFS had to support this metadata in the directory index, else
FindFirstFile/FindNextFile would be too expensive if the filesystem
had to fetch the metadata from the MFT for every matching file in a
listing. It tries to keep the duplicated metadata in sync -- such as
when a file is open, closed, manually extended in size, when the cache
is flushed, or when metadata is explicitly set (e.g.
SetFileInformationByHandle: FileBasicInfo). But for performance it
doesn't update the duplicated data every time a file is read from or
written to. And, in particular, if it's just the access time that
changed, it updates the duplicated access time with a one-hour
granularity. (There's also a registry value, as I mentioned
previously, that disables updating access times completely -- in both
the MFT record and the directory index.)

That said, if a file has multiple hardlinks the current NTFS
implementation for updating duplicated data is totally unreliable. It
only updates the accessed link. All other links go stale. We don't
have any reasonable way to special case this situation because the
directory entry doesn't include the number of links a file has. It has
to be opened and queried directly, but then one might as well do a
full stat() for every file.

I recommend relying on only the high-level is_dir(), is_file(), and
is_symlink() methods of os.scandir() items, to quickly process a
directory. inode() is reliable -- as much as is possible in Windows --
because the implementation gets the full stat info, but check to
ensure it's not 0. It's based on the file ID, which Windows
filesystems aren't required to support (or reliably support; it's not
stable in FAT). NTFS and ReFS support reliable 64-bit file IDs, and
opening by file ID.
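
As a rough illustration of that recommendation, a directory can be processed using only the DirEntry methods named above. The helper name classify and the "other" bucket are made up for this sketch.

```python
import os

def classify(directory):
    """Map each entry name to (kind, file_id) using only DirEntry methods."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"
            # Treat 0 as "no usable file ID", per the caveat above.
            file_id = entry.inode() or None
            result[entry.name] = (kind, file_id)
    return result
```

Anything that needs sizes or timestamps would then call os.stat() on just the entries of interest.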
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JKK47AWKUOWPPBEAIRGIFRMW6FCPZILG/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-20 Thread Steve Dower

On 20Oct2020 0520, Rob Cliffe wrote:
> On 19/10/2020 12:42, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of
>>> mtime rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>>
>> Eryk replied with a deeper explanation of the cause, but fundamentally
>> this is what you are seeing.
>>
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything. If
>> anything, we should probably fix os.stat() to avoid updating the
>> access time so that both functions behave the same, but that might be
>> too complicated.
>>
>> Cheers,
>> Steve
> Sorry - what you say does not match the behaviour I observe, which is that

Yes, I posted a correction already (immediately after sending the first
email).

What you are seeing is what Windows decided was the best approach. If
you want to avoid that, os.stat() will get the latest available
information. But I don't want to penalise people who don't need it by
slowing down their scandir calls unnecessarily.

A documentation patch to make this difference between os.stat() and
DirEntry even clearer would be fine.

Cheers,
Steve
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/NAR7LTW2XMBKAPKLVBQQFVK6EA4ZWQZP/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Rob Cliffe via Python-Dev



On 19/10/2020 12:42, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of
>> mtime rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.
>
> Eryk replied with a deeper explanation of the cause, but fundamentally
> this is what you are seeing.
>
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything. If
> anything, we should probably fix os.stat() to avoid updating the
> access time so that both functions behave the same, but that might be
> too complicated.
>
> Cheers,
> Steve

Sorry - what you say does not match the behaviour I observe, which is that
    (1) Neither os.stat, nor reading os.scandir directory entries,
        updates any of the times on disk.
    (2) os.stat().st_atime returns the "correct" time the file was last
        accessed.
    (3) os.scandir always returns st_atime equal to st_mtime.

Modified demo program:

# osscandirtest.py
import os
import time

print(f'[1] {time.time()=}')
with open('Test', 'w') as f: f.write('Anything\n')

time.sleep(20)

print(f'[2] {time.time()=}')
with open('Test', 'r') as f: f.readline() # Read the file

time.sleep(10)

print(f'[3] {time.time()=}')
# First comparison: os.stat() versus the scandir directory entry
print(os.stat('Test'))
for entry in os.scandir('.'):
    if entry.name == 'Test':
        stat = entry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
              f'{stat.st_atime=}')

# Second comparison: repeat to show that neither call changed anything
print(os.stat('Test'))
for entry in os.scandir('.'):
    if entry.name == 'Test':
        stat = entry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
              f'{stat.st_atime=}')

print(f'[4] {time.time()=}')

Sample output:

[1] time.time()=1603166161.12121
[2] time.time()=1603166181.1306772
[3] time.time()=1603166191.1426473
os.stat_result(st_mode=33206, st_ino=9851624184951253, st_dev=2230120362, st_nlink=1, st_uid=0, st_gid=0, st_size=10, st_atime=1603166181, st_mtime=1603166161, st_ctime=1603166161)
scandir DirEntry stat.st_ctime=1603166161.12121 stat.st_mtime=1603166161.12121 stat.st_atime=1603166161.12121
os.stat_result(st_mode=33206, st_ino=9851624184951253, st_dev=2230120362, st_nlink=1, st_uid=0, st_gid=0, st_size=10, st_atime=1603166181, st_mtime=1603166161, st_ctime=1603166161)
scandir DirEntry stat.st_ctime=1603166161.12121 stat.st_mtime=1603166161.12121 stat.st_atime=1603166161.12121
[4] time.time()=1603166191.1426473

You will observe that
    (1) The results from the two os.stat calls are the same, as are the
        results from the two scandir calls.
    (2) The os.stat st_atime (1603166181) *IS* the time that the file
        was read with the
            with open('Test', 'r') as f: f.readline() # Read the file
        line of code, as it matches the
            [2] time.time()=1603166181.1306772
        line of output (apart from discarded fractions of a second) and
        is 20 seconds (*not* 30 seconds) after the file creation time,
        as expected.
    (3) The os.scandir atime is a copy of mtime (and in this case, of
        ctime as well).


So it really does seem that the only thing "wrong" is that os.scandir 
returns atime as a copy of mtime, rather than the correct value.
And since os.stat returns the "right" answer and os.scandir doesn't, it 
really seems that this is a bug, or at least a deficiency, in os.scandir.


Demo run on Windows 10 Home version 1903 OS build 18362.1139
Python version 3.8.3 (32-bit).
Best wishes
Rob Cliffe
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/MGICSKCSTSKS36XUP6IZTXZOSGBPMQYY/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Greg Ewing

On 20/10/20 4:52 am, Gregory P. Smith wrote:
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes


This is probably a holdover from MS-DOS, where there was no separate
inode-like structure -- it was all in the directory entry.

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QJVZ2EXFKCMZ4YHERFI2FXJTWWPFCFSA/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
>
> Resolving the path is the most expensive part, even if the file is not
> opened (I've been working with the NTFS team on this area, and we've
> been benchmarking/analysing all of it).

If you say it's been extensively benchmarked and there's no direct way
around the speed bottleneck, then I take your word for it. To clarify
what I had in mind, I was hoping that because NTFS implements the fast
I/O function FastIoQueryOpen [1] (via NtfsNetworkOpenCreate, as given
by its FastIoDispatch table) that IRP_MJ_CREATE would be bypassed and
that the filesystem would not incur a significant cost to parse the
remaining path. I figured that most of the work would be in the
ObOpenObjectByName and IopParseDevice executive calls that lead up
to querying the filesystem.

Anyway, it's unfortunate that the Windows API doesn't support NT
handle-relative names, except in the registry API. If we could call
NTAPI NtQueryAttributesFile [2] directly, then the ObjectAttributes
argument could be relative to a directory handle set in the
RootDirectory field. That would eliminate the vast majority of the
path-resolution cost. A handle-relative open or query goes straight to
the filesystem device, which goes straight to the directory that
contains the file.

To eliminate the cost of opening the directory handle, scandir() could
be rewritten to use CreateFileW and GetFileInformationByHandleEx:
FileIdBothDirectoryInfo [3] instead of FindFirstFileW / FindNextFileW.
Just cache the directory handle in place of caching the find handle.
scandir() would gain fd support in Windows. Opening a directory via
os.open requires the flag _O_OBTAIN_DIR (0x2000), defined in fcntl.h.
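
For comparison, here is roughly what a handle-based listing looks like on platforms where os.scandir() already accepts a file descriptor (Python 3.7+, checked via os.supports_fd). It is only a sketch of the calling pattern, not of the proposed Windows implementation; the helper name list_names is made up, and it falls back to the path where fd support is missing.

```python
import os

def list_names(directory):
    """List entry names, via a directory file descriptor where supported."""
    if os.scandir in os.supports_fd:
        fd = os.open(directory, os.O_RDONLY)
        try:
            # scandir() works on its own duplicate of fd, so close ours below.
            with os.scandir(fd) as entries:
                return sorted(entry.name for entry in entries)
        finally:
            os.close(fd)
    # Fallback for platforms without fd support in scandir().
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)
```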

FileIdBothDirectoryInfo provides the file ID, so the implementation
would support the inode() method without calling stat(). It would
still directly support is_dir() and is_file() based on the file
attributes, and is_symlink() based on the file attributes and the
EaSize field. The Windows Protocols document that the latter contains
the reparse tag for a reparse point. The field is reused because a
reparse point can't have extended attributes.

All that said, I don't prefer to call NtQueryAttributesFile or any
other NTAPI function in Windows Python. I'd rather do the best
possible with just the Windows API. I wish there were a new
GetFileAttributesExExW function that supported handle-relative names.
Even better would be a new function that calls
NtQueryInformationByName -- something like GetFileInformationByName --
for FileStatInfo (and FileCaseSensitiveInfo as well, which is becoming
more of an issue), also with support for handle-relative names.

[1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_fast_io_dispatch
[2] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile
[3] 
https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GODUIB5WKVZLX4BVPEM2NS37JFHUXIID/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> metadata, while os.scandir() always looks at the directory entry metadata.
>
> My suggested approach still applies, other than the bit where we might
> fix os.stat(). The best we can do is regress os.scandir() to have
> similarly poor performance, but the best *you* can do is use os.stat()
> for accurate timings when files might be being modified while your
> program is running, and don't do it when you just need names/kinds (and
> I'm okay adding that note to the docs).

If this is the correction to which you're referring in the previous
message, I assumed you stood by the claim that os.stat() may update
st_atime. That shouldn't be the case, so there shouldn't be anything
that needs to be fixed there, unless I'm missing what you think needs
to be fixed. If it's actually a problem, then I'd really, really like
a test case that reproduces it. If it was just a misinterpreted test
case or mis-remembered fact, then that's good news for me. ;-)

Regarding updating the access time in the directory entry, in my
previous reply I explained that NTFS should update it with a one-hour
granularity. With FAT, it's daily.

Regarding the view that this is only about "accurate timings when
files might be being modified while your program is running", in my
previous messages I stressed that the directory entry for a hard link
may have the wrong size, change time, write time, and access time if
it wasn't the last link used to update the file. That has nothing to
do with the file being modified while the program is running. It's a
stale directory entry. If you call os.stat() on the stale link, NTFS
will update it with the correct metadata.
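
A small experiment along these lines might look as follows. It is a sketch under my own naming (hardlink_sizes, first.txt, second.txt): it modifies a file through one hard link and then reports the size seen through scandir() and through os.stat() for both names.

```python
import os
import tempfile

def hardlink_sizes(directory):
    """Modify a file via one hard link; report sizes seen for both links."""
    first = os.path.join(directory, "first.txt")
    second = os.path.join(directory, "second.txt")
    with open(first, "w") as f:
        f.write("spam")
    os.link(first, second)        # second name for the same file
    with open(first, "a") as f:   # modify through the first link only
        f.write("eggs")
    # Sizes as reported by the directory listing.
    with os.scandir(directory) as entries:
        scandir_size = {e.name: e.stat().st_size for e in entries}
    # Sizes as reported by querying each link directly.
    stat_size = {
        name: os.stat(os.path.join(directory, name)).st_size
        for name in ("first.txt", "second.txt")
    }
    return scandir_size, stat_size

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hardlink_sizes(tmp))
```

os.stat() should report 8 bytes for both names. Per the description above, on NTFS the scandir() size for second.txt may still be the old one, at least until something stats that link.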
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SUGIZ6OAXOD37USVBWAW7JRSUDBSMG7Q/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Steve Dower

On 19Oct2020 1846, Eryk Sun wrote:
> On 10/19/20, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> os.stat() shouldn't affect st_atime because it doesn't access the file
> data. That has me curious if it can be reproduced.
>
> With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
> immediately when the file data is read or modified. With other
> filesystems, it may not be updated until the kernel file object that
> was used to access the file's data is closed.

I thought I got my self-correction fired off quickly enough to save you
from writing this :)

> For details, download the [MS-FSA] PDF [1] and look for all references
> to the following sections:
>
> [1]
> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Thanks for the detailed reference.

> Going back to my initial message, I can't stress enough that this
> problem is at its worst when a file has multiple hardlinks. If a
> particular link in a directory wasn't the last link used to access the
> file, its duplicated metadata may have the wrong file size, access
> time, modify time, and change time (the latter is not reported by
> Python). As is, for the current implementation, I'd only rely on the
> basic attributes such as whether it's a directory or reparse point
> (symlink, mountpoint, etc) when using scandir() to quickly process a
> directory. For reliable stat information, call os.stat().
>
> I do think, however, that os.scandir() can be improved in Windows
> without significant performance loss if it calls GetFileAttributesExW
> to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
> and st_atime. This API call is relatively fast because it doesn't
> require opening the file via CreateFileW, which is one of the more
> expensive operations in os.stat(). But I haven't tried modifying
> scandir() to benchmark it.

Resolving the path is the most expensive part, even if the file is not
opened (I've been working with the NTFS team on this area, and we've
been benchmarking/analysing all of it). There are a few improvements
coming across the board, but I'd much rather just emphasise that
os.scandir() is as fast as we can manage using cached information
(including as cached by the OS). Otherwise we prevent people from using
the fastest available option when they can, if they don't need the
additional information/accuracy.


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MMRMLWGEV2ZGIACXQTSEQC6TPWGL3UZ3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file
data. That has me curious if it can be reproduced.

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
immediately when the file data is read or modified. With other
filesystems, it may not be updated until the kernel file object that
was used to access the file's data is closed.

Note that updating the access time in NTFS can be disabled by the
"NtfsDisableLastAccessUpdate" value in
"HKLM\System\CurrentControlSet\Control\FileSystem". The default value
in Windows 10 should be 0x8002, which means the value is system
managed and updating the access time is enabled.
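
To check this setting from Python, something like the following sketch could be used. It assumes that the low two bits of the registry value carry the meaning reported by "fsutil behavior query disablelastaccess" (0 = user managed, enabled; 1 = user managed, disabled; 2 = system managed, enabled; 3 = system managed, disabled); that mapping and the helper names are assumptions for illustration, not something taken from this thread.

```python
import sys

KEY_PATH = r"System\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "NtfsDisableLastAccessUpdate"


def decode_last_access_setting(value):
    """Return (managed_by, updates_enabled) from the low two bits.

    Assumption: bit 0 set means access-time updates are disabled and
    bit 1 set means the setting is system managed.
    """
    mode = value & 0x3
    managed_by = "system" if mode & 0x2 else "user"
    updates_enabled = not (mode & 0x1)
    return managed_by, updates_enabled


def read_last_access_setting():
    """Read the value on Windows; return None elsewhere or if it is missing."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        return None
    return decode_last_access_setting(value)


if __name__ == "__main__":
    print(read_last_access_setting())
```

Only the low bits are inspected, so the decoding does not depend on how the high "managed" bits are written.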

If it's only the access time that changes, the directory entry may be
updated with a significant granularity such as hourly or daily. For
NTFS, it's hourly. To confirm this, wait an hour from the current
access time in the directory entry; open the file; read some data; and
close the file. The access time in the directory entry should be
updated.

For details, download the [MS-FSA] PDF [1] and look for all references
to the following sections:

* 2.1.4.17 Algorithm for Noting That a File Has Been Modified
* 2.1.4.19 Algorithm for Noting That a File Has Been Accessed
* 2.1.4.18 Algorithm for Updating Duplicated Information

Also check the tables in Appendix A, which provide the update
granularity of file time stamps (presumably for directory entries) for
common Windows filesystems.

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Going back to my initial message, I can't stress enough that this
problem is at its worst when a file has multiple hardlinks. If a
particular link in a directory wasn't the last link used to access the
file, its duplicated metadata may have the wrong file size, access
time, modify time, and change time (the latter is not reported by
Python). As is, for the current implementation, I'd only rely on the
basic attributes such as whether it's a directory or reparse point
(symlink, mountpoint, etc) when using scandir() to quickly process a
directory. For reliable stat information, call os.stat().
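
As a rough sketch of that advice (the function name is made up for illustration), scandir() can be used only to decide which entries are regular files, with a separate os.stat() call for the sizes and timestamps:

```python
import os


def list_files_with_fresh_times(directory):
    """Yield (name, size, mtime, atime) for the regular files in directory.

    The entry type comes from os.scandir(); on Windows that is answered
    from the directory entry data. Size and times come from os.stat(),
    which queries the file itself rather than the directory entry.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            st = os.stat(entry.path)
            yield entry.name, st.st_size, st.st_mtime, st.st_atime


if __name__ == "__main__":
    for row in list_files_with_fresh_times("."):
        print(row)
```

This costs one extra system call per file, so it gives up part of the speed advantage of scandir() in exchange for accurate values.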

I do think, however, that os.scandir() can be improved in Windows
without significant performance loss if it calls GetFileAttributesExW
to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
and st_atime. This API call is relatively fast because it doesn't
require opening the file via CreateFileW, which is one of the more
expensive operations in os.stat(). But I haven't tried modifying
scandir() to benchmark it.
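
For anyone who wants to experiment with this from Python before touching the C code, here is a minimal ctypes sketch. It is not how scandir() is implemented; the quick_stat() and filetime_to_unix() helpers are hypothetical names, and the Windows call is only reached on win32.

```python
import ctypes
import sys

# Number of 100 ns ticks between 1601-01-01 and 1970-01-01.
EPOCH_DIFF_100NS = 116444736000000000


def filetime_to_unix(filetime):
    """Convert a 64-bit FILETIME (100 ns ticks since 1601) to Unix seconds."""
    return (filetime - EPOCH_DIFF_100NS) / 10_000_000


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32)]


def _to_seconds(ft):
    return filetime_to_unix((ft.dwHighDateTime << 32) | ft.dwLowDateTime)


def quick_stat(path):
    """Query attributes, size and times by name, without opening the file."""
    if sys.platform != "win32":
        raise NotImplementedError("GetFileAttributesExW is Windows-only")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetFileAttributesExW
    func.argtypes = (ctypes.c_wchar_p, ctypes.c_int,
                     ctypes.POINTER(WIN32_FILE_ATTRIBUTE_DATA))
    func.restype = ctypes.c_int
    data = WIN32_FILE_ATTRIBUTE_DATA()
    if not func(path, 0, ctypes.byref(data)):  # 0 = GetFileExInfoStandard
        raise ctypes.WinError(ctypes.get_last_error())
    return {
        "attributes": data.dwFileAttributes,
        "size": (data.nFileSizeHigh << 32) | data.nFileSizeLow,
        "ctime": _to_seconds(data.ftCreationTime),
        "atime": _to_seconds(data.ftLastAccessTime),
        "mtime": _to_seconds(data.ftLastWriteTime),
    }
```

Timing quick_stat() against os.stat() over a large directory would give a first impression of the cost, though the ctypes overhead will distort the numbers compared with a C implementation.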

Ultimately, I'm waiting for Windows 10 to provide a WinAPI function
that calls the relatively new NTAPI function NtQueryInformationByName
[2] (by name, not by handle!) to get the FileStatInformation, as well
as for this information to be made available by handle via
GetFileInformationByHandleEx. Compared to GetFileAttributesExW, the
FileStatInformation additionally provides the file ID (if implemented
by the filesystem), change time, reparse tag, number of links, and the
effective access of the security context of the caller (i.e. process
or thread access token). The latter is something that we've never
implemented with os.stat(). It's not the same as POSIX
owner-group-other permissions. It would need a new attribute such as
st_effective_access. It could be used to provide a real implementation
of os.access() in Windows.

[2]
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntqueryinformationbyname
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NPP6GKAEI7SOVA45WTJ222YVEALTF6WO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Steve Dower

On 19Oct2020 1652, Gregory P. Smith wrote:
> I'm sure this is covered in MSDN.  Linking to that if it has it in a
> concise explanation would make sense from a note in our docs.

Probably unlikely :) I'm pretty sure this started "perfect" and was then
wound back to improve performance. But it's almost certainly an option
somewhere, which means you can't rely on it being either true or false.
You just have to be explicit for certain pieces of information.

> If I'm understanding Steve correctly this is due to Windows/NTFS storing
> the access time potentially redundantly in two different places. One
> within the directory entry itself and one with the file's own metadata.
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes with the inode structure (equiv: file entry
> metadata) being the sole source of truth.  Which ones get updated when
> and by what actions is up to the OS.
>
> So yes, just document the "quirk" as an intended OS behavior.  This is
> one reason scandir() can return additional information on windows vs
> what it can return on posix.  The entire point of scandir() is to return
> as much as possible from the directory without triggering reads of the
> inodes/file-entry-metadata. :)

Yeah, I'd document it as a quirk of scandir. There's also a race where
if you scandir(), then someone touches the file, then you look at the
cached stat you get the wrong information too (on any platform). Making
it clearer that it's for non-time-sensitive queries is most accurate,
though we could also give an example of "access times may not be up to
date depending on OS-level caching" without committing us to being
responsible for OS decisions.
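
The race with the cached stat result can be shown on any platform with a small sketch like this one (the function name and file name are made up for illustration). DirEntry.stat() caches its result on the entry object, so a later change to the file is only visible through a fresh os.stat() call:

```python
import os
import tempfile


def cached_vs_fresh_mtime():
    """Return (before, cached, fresh) modification times for a test file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("data\n")

        with os.scandir(tmp) as entries:
            entry = next(e for e in entries if e.name == "example.txt")

        before = entry.stat().st_mtime  # result is cached on the entry

        # Simulate someone else touching the file after the scan.
        os.utime(path, (before + 100, before + 100))

        cached = entry.stat().st_mtime  # still the cached value
        fresh = os.stat(path).st_mtime  # asks the OS again
        return before, cached, fresh


if __name__ == "__main__":
    print(cached_vs_fresh_mtime())
```

The cached value stays equal to the first one, while the fresh os.stat() value reflects the later change.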


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EBWUDEQEPRWJN36FLUUJQWP5EWLPWRPD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Mats Wichmann
On 10/19/20 9:52 AM, Gregory P. Smith wrote:
> 
> 
> On Mon, Oct 19, 2020 at 6:28 AM Ivan Pozdeev via Python-Dev
> <python-dev@python.org> wrote:
> 
> 
> On 19.10.2020 14:47, Steve Dower wrote:
> > On 19Oct2020 1242, Steve Dower wrote:
> >> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> >>> TLDR: In os.scandir directory entries, atime is always a copy of
> mtime rather than the actual access time.
> >>
> >> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.
> >
> > Let me correct myself first :)
> >
> > *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file
> entry
> > metadata, while os.scandir() always looks at the directory entry
> metadata.
> 
> Is this behavior documented somewhere?
> 
> Such weirdness is certainly something that needs to be documented but
> I really don't like describing such quirks that are out of our control
> and may be subject to change in Python documentation. So we should
> only consider doing so if there are no other options.
> 
> 
> I'm sure this is covered in MSDN.  Linking to that if it has it in a
> concise explanation would make sense from a note in our docs.
> 
> If I'm understanding Steve correctly this is due to Windows/NTFS storing
> the access time potentially redundantly in two different places. One
> within the directory entry itself and one with the file's own metadata. 
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes with the inode structure (equiv: file entry
> metadata) being the sole source of truth.  Which ones get updated when
> and by what actions is up to the OS.
> 
> So yes, just document the "quirk" as an intended OS behavior.  This is
> one reason scandir() can return additional information on windows vs
> what it can return on posix.  The entire point of scandir() is to return
> as much as possible from the directory without triggering reads of the
> inodes/file-entry-metadata. :)
> 
> -gps

Depending on atimes isn't a consistently reliable mechanism anyway,
since filesystems on Linux et al. are allowed to be mounted so that
access times are not updated independently.
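
On Linux, a quick way to see which policy applies is to look at the mount options. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options, ...); the helper names are hypothetical, and it only distinguishes the "noatime" and "relatime" options.

```python
def parse_mounts(text):
    """Map each mount point to its set of options (/proc/mounts format)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            result[fields[1]] = set(fields[3].split(","))
    return result


def atime_mode(options):
    """Return 'noatime', 'relatime', or 'unspecified' for a set of options."""
    for mode in ("noatime", "relatime"):
        if mode in options:
            return mode
    return "unspecified"


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = parse_mounts(f.read())
    except OSError:
        mounts = {}  # not Linux, or /proc is unavailable
    for mount_point, options in sorted(mounts.items()):
        print(mount_point, atime_mode(options))
```

A program that depends on access times could use a check like this to warn when the directory it works in sits on a "noatime" mount.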

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QXNHYK6NDECISIOZVO4BCW2O6UXRZJGO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Gregory P. Smith
On Mon, Oct 19, 2020 at 6:28 AM Ivan Pozdeev via Python-Dev <
python-dev@python.org> wrote:

>
> On 19.10.2020 14:47, Steve Dower wrote:
> > On 19Oct2020 1242, Steve Dower wrote:
> >> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> >>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.
> >>
> >> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.
> >
> > Let me correct myself first :)
> >
> > *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> > metadata, while os.scandir() always looks at the directory entry
> metadata.
>
> Is this behavior documented somewhere?
>
> Such weirdness is certainly something that needs to be documented but I
> really don't like describing such quirks that are out of our control
> and may be subject to change in Python documentation. So we should only
> consider doing so if there are no other options.
>

I'm sure this is covered in MSDN.  Linking to that if it has it in a
concise explanation would make sense from a note in our docs.

If I'm understanding Steve correctly this is due to Windows/NTFS storing
the access time potentially redundantly in two different places. One within
the directory entry itself and one with the file's own metadata.  Those of
us with a traditional posix filesystem background may raise eyeballs at
this duplication, seeing a directory as a place that merely maps names to
inodes with the inode structure (equiv: file entry metadata) being the sole
source of truth.  Which ones get updated when and by what actions is up to
the OS.

So yes, just document the "quirk" as an intended OS behavior.  This is one
reason scandir() can return additional information on windows vs what it
can return on posix.  The entire point of scandir() is to return as much as
possible from the directory without triggering reads of the
inodes/file-entry-metadata. :)

-gps


>
> >
> > My suggested approach still applies, other than the bit where we might
> fix os.stat(). The best we can do is regress os.scandir() to have
> > similarly poor performance, but the best *you* can do is use os.stat()
> for accurate timings when files might be being modified while your
> > program is running, and don't do it when you just need names/kinds (and
> I'm okay adding that note to the docs).
> >
> > Cheers,
> > Steve
> >
> > [1]: With some fallback to directory entries in exceptional cases that
> don't apply here.
> > --
> > Regards,
> > Ivan
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IZ6KSRTJLORCB33OMVUPFYQYLMBM26EJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Random832
On Mon, Oct 19, 2020, at 07:42, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> > TLDR: In os.scandir directory entries, atime is always a copy of mtime 
> > rather than the actual access time.
> 
> Correction - os.stat() updates the access time to _now_, while 
> os.scandir() returns the last access time without updating it.

This is surprising - do we know why this happens?

Also, it doesn't seem true on my system with python 3.8.5 [and, yes, I checked 
that last access update is enabled for my test and updates normally when 
reading the file's contents].
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GX3KD4UQKJONCLOZY743WXNGENXL7YG2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Ivan Pozdeev via Python-Dev



On 19.10.2020 14:47, Steve Dower wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> metadata, while os.scandir() always looks at the directory entry metadata.

Is this behavior documented somewhere?

Such weirdness is certainly something that needs to be documented, but I
really don't like describing such quirks that are out of our control and
may be subject to change in Python documentation. So we should only
consider doing so if there are no other options.

> My suggested approach still applies, other than the bit where we might
> fix os.stat(). The best we can do is regress os.scandir() to have
> similarly poor performance, but the best *you* can do is use os.stat()
> for accurate timings when files might be being modified while your
> program is running, and don't do it when you just need names/kinds (and
> I'm okay adding that note to the docs).
>
> Cheers,
> Steve
>
> [1]: With some fallback to directory entries in exceptional cases that
> don't apply here.
--
Regards,
Ivan

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VFXDBURSZ4QKA6EQBZLU6K4FKMGZPSF5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Steve Dower

On 19Oct2020 1242, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.


Let me correct myself first :)

*Windows* has decided not to update file access time metadata *in 
directory entries* on reads. os.stat() always[1] looks at the file entry 
metadata, while os.scandir() always looks at the directory entry metadata.


My suggested approach still applies, other than the bit where we might 
fix os.stat(). The best we can do is regress os.scandir() to have 
similarly poor performance, but the best *you* can do is use os.stat() 
for accurate timings when files might be being modified while your 
program is running, and don't do it when you just need names/kinds (and 
I'm okay adding that note to the docs).


Cheers,
Steve

[1]: With some fallback to directory entries in exceptional cases that 
don't apply here.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QHHJFYEDBANW7EC3JOUFE7BQRT5ILL4O/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Steve Dower

On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.


Correction - os.stat() updates the access time to _now_, while 
os.scandir() returns the last access time without updating it.


Eryk replied with a deeper explanation of the cause, but fundamentally 
this is what you are seeing.


Feel free to file a bug, but we'll likely only add a vague note to the 
docs about how Windows works here rather than changing anything. If 
anything, we should probably fix os.stat() to avoid updating the access 
time so that both functions behave the same, but that might be too 
complicated.


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NGMVB7GWDBCPYHL4IND2LBZ2QPXLWRAX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-18 Thread Eric V. Smith

On 10/18/2020 12:25 PM, Rob Cliffe via Python-Dev wrote:
How do I do that, please?  I can't see an obvious create option on 
that web page.  Do I need to log in?


Yes, you need to log in before you can open an issue. You might need to 
create an account first if you don't have one: it's called "Register" on 
bpo. After you've logged in, there's a Create New button.


Eric



Thanks
Rob Cliffe

On 18/10/2020 05:31, Gregory P. Smith wrote:
Could you please file this as an issue on bugs.python.org?


Thanks!
-Greg


On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev
<python-dev@python.org> wrote:



TLDR: In os.scandir directory entries, atime is always a copy of
mtime
rather than the actual access time.

Demo program: Windows 10, Python 3.8.3:

# osscandirtest.py
import time, os
with open('Test', 'w') as f: f.write('Anything\n') # Write to a file
time.sleep(10)
with open('Test', 'r') as f: f.readline() # Read the file
print(os.stat('Test'))
for DirEntry in os.scandir('.'):
    if DirEntry.name == 'Test':
        stat = DirEntry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
              f'{stat.st_atime=}')

Sample output:

os.stat_result(st_mode=33206, st_ino=8162774324687317,
st_dev=2230120362, st_nlink=1, st_uid=0,
st_gid=0, st_size=10, st_atime=1600631381, st_mtime=1600631371,
st_ctime=1600631262)
scandir DirEntry stat.st_ctime=1600631262.951019
stat.st_mtime=1600631371.7062848 stat.st_atime=1600631371.7062848

For os.stat, atime is 10 seconds more than mtime, as would be
expected.
But for os.scandir, atime is a copy of mtime.
ISTM that this is a bug, and in fact recently it stopped me from
using
os.scandir in a program where I needed the access timestamp. No big
deal, but ...
If it is a feature for some reason, presumably it should be
documented.

Best wishes
Rob Cliffe
___
Python-Dev mailing list -- python-dev@python.org

To unsubscribe send an email to python-dev-le...@python.org

https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at

https://mail.python.org/archives/list/python-dev@python.org/message/RIKQAXZVUAQBLECFMNN2PUOH322B2BYD/
Code of Conduct: http://python.org/psf/codeofconduct/




___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YEELT5B3UWQOV2WPMJ4OTFWCMIQMO63X/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-18 Thread Rob Cliffe via Python-Dev
How do I do that, please?  I can't see an obvious create option on that 
web page.  Do I need to log in?

Thanks
Rob Cliffe

On 18/10/2020 05:31, Gregory P. Smith wrote:
Could you please file this as an issue on bugs.python.org?


Thanks!
-Greg


On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev
<python-dev@python.org> wrote:



TLDR: In os.scandir directory entries, atime is always a copy of
mtime
rather than the actual access time.

Demo program: Windows 10, Python 3.8.3:

# osscandirtest.py
import time, os
with open('Test', 'w') as f: f.write('Anything\n') # Write to a file
time.sleep(10)
with open('Test', 'r') as f: f.readline() # Read the file
print(os.stat('Test'))
for DirEntry in os.scandir('.'):
    if DirEntry.name == 'Test':
        stat = DirEntry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
              f'{stat.st_atime=}')

Sample output:

os.stat_result(st_mode=33206, st_ino=8162774324687317,
st_dev=2230120362, st_nlink=1, st_uid=0,
st_gid=0, st_size=10, st_atime=1600631381, st_mtime=1600631371,
st_ctime=1600631262)
scandir DirEntry stat.st_ctime=1600631262.951019
stat.st_mtime=1600631371.7062848 stat.st_atime=1600631371.7062848

For os.stat, atime is 10 seconds more than mtime, as would be
expected.
But for os.scandir, atime is a copy of mtime.
ISTM that this is a bug, and in fact recently it stopped me from
using
os.scandir in a program where I needed the access timestamp. No big
deal, but ...
If it is a feature for some reason, presumably it should be
documented.

Best wishes
Rob Cliffe



___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/377JYZMK3MITKPCCGWQ43R5FPZPO2ADA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-18 Thread Eryk Sun
On 10/15/20, Rob Cliffe via Python-Dev  wrote:
>
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.

There are inconsistencies in various scenarios between the
stat info from the directory entry and the stat info from the File
Control Block (FCB) -- the filesystem's in-memory record that's common
to all opens for a file/directory.

The worst case is for an NTFS file with multiple hardlinks, for which
the directory entry information is from the last time the file was
opened using a particular hardlink. The accurate NTFS file information
is in the file's Master File Table (MFT) record, which gets accessed
to populate the FCB and update the particular link when a file is
opened.

If you're looking for file times and file size, the only reliable
information comes from directly opening the file and querying the info
via GetFileInformationByHandle (called by os.stat),
GetFileInformationByHandleEx (FileBasicInfo, FileStandardInfo),
GetFileTime, and GetFileSizeEx.
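
To see the hardlink case from Python, a sketch along these lines could be run in a scratch directory (the function and file names are made up for illustration). It modifies a file through one link, then compares the sizes reported by scandir() for both links with the size obtained from an open handle via os.fstat():

```python
import os
import tempfile


def link_sizes_after_append():
    """Return (sizes from scandir by link name, size from an open handle)."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "second.txt")

        with open(first, "w") as f:
            f.write("abc")
        os.link(first, second)  # second hard link to the same file

        with open(first, "a") as f:  # modify through the first link only
            f.write("defgh")

        # Sizes as reported for the directory entries.
        with os.scandir(tmp) as entries:
            from_dir = {e.name: e.stat().st_size for e in entries}

        # Size queried from an open handle to the file itself.
        with open(second, "rb") as f:
            from_handle = os.fstat(f.fileno()).st_size

        return from_dir, from_handle


if __name__ == "__main__":
    print(link_sizes_after_append())
```

On POSIX systems both views are expected to agree. On NTFS the entry for the link that was not used for the write is the one that may lag behind, while the value obtained from the handle should be accurate.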
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IJIFZHPEEMVPD2LN6H3MY4KGRKNQ4TBQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-18 Thread Tal Einat
Interesting! Indeed, please create an issue and post a link here.

From a quick look at the code, I can't see any obvious bugs here, the info
seems to be coming directly from FindNextFileW. This will likely require
some more digging.


On Sun, Oct 18, 2020 at 7:37 AM Gregory P. Smith  wrote:

> Could you please file this as an issue on bugs.python.org?
>
> Thanks!
> -Greg
>
>
> On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev <
> python-dev@python.org> wrote:
>
>>
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>>
>> Demo program: Windows 10, Python 3.8.3:
>>
>> # osscandirtest.py
>> import time, os
>> with open('Test', 'w') as f: f.write('Anything\n') # Write to a file
>> time.sleep(10)
>> with open('Test', 'r') as f: f.readline() # Read the file
>> print(os.stat('Test'))
>> for DirEntry in os.scandir('.'):
>>     if DirEntry.name == 'Test':
>>         stat = DirEntry.stat()
>>         print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
>>               f'{stat.st_atime=}')
>>
>> Sample output:
>>
>> os.stat_result(st_mode=33206, st_ino=8162774324687317,
>> st_dev=2230120362, st_nlink=1, st_uid=0,
>> st_gid=0, st_size=10, st_atime=1600631381, st_mtime=1600631371,
>> st_ctime=1600631262)
>> scandir DirEntry stat.st_ctime=1600631262.951019
>> stat.st_mtime=1600631371.7062848 stat.st_atime=1600631371.7062848
>>
>> For os.stat, atime is 10 seconds more than mtime, as would be expected.
>> But for os.scandir, atime is a copy of mtime.
>> ISTM that this is a bug, and in fact recently it stopped me from using
>> os.scandir in a program where I needed the access timestamp. No big
>> deal, but ...
>> If it is a feature for some reason, presumably it should be documented.
>>
>> Best wishes
>> Rob Cliffe
>>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WAVJFASQWS7RDMZEHI4AHQMJ74COQO7O/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-17 Thread Gregory P. Smith
Could you please file this as an issue on bugs.python.org?

Thanks!
-Greg


On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev <
python-dev@python.org> wrote:

>
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.
>
> Demo program: Windows 10, Python 3.8.3:
>
> # osscandirtest.py
> import time, os
> with open('Test', 'w') as f: f.write('Anything\n') # Write to a file
> time.sleep(10)
> with open('Test', 'r') as f: f.readline() # Read the file
> print(os.stat('Test'))
> for DirEntry in os.scandir('.'):
>     if DirEntry.name == 'Test':
>         stat = DirEntry.stat()
>         print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} '
>               f'{stat.st_atime=}')
>
> Sample output:
>
> os.stat_result(st_mode=33206, st_ino=8162774324687317,
> st_dev=2230120362, st_nlink=1, st_uid=0,
> st_gid=0, st_size=10, st_atime=1600631381, st_mtime=1600631371,
> st_ctime=1600631262)
> scandir DirEntry stat.st_ctime=1600631262.951019
> stat.st_mtime=1600631371.7062848 stat.st_atime=1600631371.7062848
>
> For os.stat, atime is 10 seconds more than mtime, as would be expected.
> But for os.scandir, atime is a copy of mtime.
> ISTM that this is a bug, and in fact recently it stopped me from using
> os.scandir in a program where I needed the access timestamp. No big
> deal, but ...
> If it is a feature for some reason, presumably it should be documented.
>
> Best wishes
> Rob Cliffe
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/INJBNXRKOBYFGFJ7CLHNJKVQQKU6X6NM/
Code of Conduct: http://python.org/psf/codeofconduct/