[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-07-01 Thread Eryk Sun


Eryk Sun  added the comment:

> As far as I know os.stat() resets d.stat() maybe should be added 
> some option to d.stat() to force update(). d.stat(nt_force_update=True).

It depends on the filesystem. NTFS will update the directory entry as soon as 
the link is accessed by CreateFileW. But that's relatively expensive, and 
actually one of the more expensive steps in an os.stat call.

> I am not sure if os.path.getmtime() can reset d.stat().

genericpath.getmtime calls os.stat:

https://github.com/python/cpython/blob/d0981e61a5869c48e0a70a512967558391272a93/Lib/genericpath.py#L53

lexists, exists, getctime, getatime, getmtime, getsize, isdir, and isfile could 
be modified to call WinAPI GetFileAttributesExW [1], which is implemented via 
NtQueryFullAttributesFile [2], an optimized system call to get a file's 
network-open information. This can be significantly faster than the sequence of 
system calls that are required by os.stat. Note that this does not update the 
NTFS directory entry for the accessed link, unlike CreateFileW, but it does 
return updated information.

The GetFileAttributesExW result would be used if the call succeeds and the file 
isn't a reparse point. Otherwise fall back on os.stat (win32_xstat_impl). If 
passed an fd, try GetFileInformationByHandleEx to get the FileBasicInfo and 
FileStandardInfo, or use a single system call via NTAPI NtQueryInformationFile: 
FileNetworkOpenInformation, which is the same info that GetFileAttributesExW 
returns.

This could be implemented in C as nt._basic_stat(filename, 
follow_symlinks=True), where follow_symlinks means the expanded set of Windows 
name-surrogate reparse points. The C implementation would fall back on 
win32_xstat_impl. Note that a basic stat would not guarantee to return the 
following fields: st_ino, st_dev, and st_nlink. 

Alternatively, it could be implemented as a keyword-only basic=True option for 
os.stat, which would be ignored by POSIX. This way the high-level functions 
could continue to have a common implementation in genericpath.py.
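A rough Python sketch of the fast path described above, for illustration only. It uses ctypes to call GetFileAttributesExW and falls back on os.stat when the call fails or the file is a reparse point. basic_stat, BasicStat and filetime_to_unix are hypothetical names, not CPython APIs. The sketch returns only size and mtime, ignores follow_symlinks, and skips the fd / NtQueryInformationFile path. On non-Windows platforms it simply calls os.stat.

```python
import ctypes
import os
from typing import NamedTuple

# Hypothetical sketch of the proposed "basic stat"; not part of CPython.
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
GET_FILE_EX_INFO_STANDARD = 0           # GetFileExInfoStandard
EPOCH_AS_FILETIME = 116444736000000000  # 1970-01-01 in 100 ns ticks since 1601


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32)]


class BasicStat(NamedTuple):
    st_size: int
    st_mtime: float
    fast_path: bool  # True if GetFileAttributesExW supplied the data


def filetime_to_unix(ft: FILETIME) -> float:
    """Convert a FILETIME (100 ns ticks since 1601) to a Unix timestamp."""
    ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
    return (ticks - EPOCH_AS_FILETIME) / 10_000_000


def basic_stat(path) -> BasicStat:
    """Size and mtime via GetFileAttributesExW, falling back on os.stat."""
    if os.name == "nt":
        data = WIN32_FILE_ATTRIBUTE_DATA()
        ok = ctypes.windll.kernel32.GetFileAttributesExW(
            os.fsdecode(path), GET_FILE_EX_INFO_STANDARD, ctypes.byref(data))
        # Use the fast result only if the call succeeded and the file is
        # not a reparse point (symlink, junction, ...).
        if ok and not data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT:
            size = (data.nFileSizeHigh << 32) | data.nFileSizeLow
            return BasicStat(size, filetime_to_unix(data.ftLastWriteTime), True)
    st = os.stat(path)  # full stat; on Windows this opens the file
    return BasicStat(st.st_size, st.st_mtime, False)
```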

[1] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileattributesexw
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-07-01 Thread Cezary Wagner


Cezary Wagner  added the comment:

As far as I know, os.stat() resets d.stat(). Maybe an option should be added to 
d.stat() to force an update, e.g. d.stat(nt_force_update=True).

I am not sure if os.path.getmtime() can reset d.stat().

os.stat() is 2x slower than os.path.getmtime(), and os.path.getmtime() is 16x 
slower than d.stat(). The MAJOR PROBLEM is the PERFORMANCE of os.stat(): for a 
directory with 1000 files it takes many seconds to read all the stats. I think 
something is wrong here, since Windows Explorer does it very quickly.
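To compare the three approaches on a given directory, a minimal timing sketch (time_stat_methods is a hypothetical helper name). The DirEntry.stat timing also includes the directory enumeration itself, and results depend heavily on the filesystem, caching and any antivirus filter driver, so treat the numbers as rough.

```python
import os
import time


def time_stat_methods(directory: str) -> dict:
    """Seconds spent reading st_mtime for every entry, per method."""
    timings = {}

    start = time.perf_counter()
    with os.scandir(directory) as it:
        for entry in it:
            entry.stat().st_mtime  # on Windows: data from the directory listing
    timings["DirEntry.stat"] = time.perf_counter() - start

    # Build the path list outside the timed regions.
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]

    start = time.perf_counter()
    for path in paths:
        os.path.getmtime(path)  # genericpath.getmtime calls os.stat
    timings["os.path.getmtime"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.stat(path).st_mtime  # full stat; on Windows this opens the file
    timings["os.stat"] = time.perf_counter() - start

    return timings
```

For example, print(time_stat_methods(".")) prints the three timings for the current directory.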

So I cannot use os.stat() ALONE, and this complicates the code: I have to call 
os.stat() after d.stat() only for files that d.stat() reports as OLDER THAN some 
threshold, because if I call os.stat() for every file, most of the program's 
time is spent in these calls.
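The workaround described here (trust the cheap d.stat() value unless it makes the file look older than a cutoff, then confirm with os.stat()) might be sketched as follows. files_older_than is a hypothetical helper, and the early `continue` assumes that a stale directory-entry mtime can only lag behind the real one, i.e. nothing moved the timestamp backwards with SetFileTime or os.utime.

```python
import os


def files_older_than(directory: str, cutoff: float) -> list:
    """Names of regular files whose current mtime is older than `cutoff`."""
    old = []
    with os.scandir(directory) as it:
        for entry in it:
            if not entry.is_file():
                continue
            # Cheap check: on Windows this mtime comes from the directory
            # listing and may be stale for a file that is open for writing.
            if entry.stat().st_mtime >= cutoff:
                continue  # assumes a stale mtime can only lag behind
            # Expensive check: os.stat() returns up-to-date information.
            if os.stat(entry).st_mtime < cutoff:
                old.append(entry.name)
    return sorted(old)
```

Only the files that look old pay the os.stat() cost, which is usually a small fraction when scanning for stale logs.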

Do you know which code performs this reset of d.stat()?

If no optimization is possible, then the DOCUMENTATION needs an update, because 
it is really hard to understand why this does not work under Windows. Some 
REMARKS would help me and others.

I still believe that some optimization is possible on Windows.

Maybe reading the stats could be forced with os.scandir(force_scan_stat=True), 
so that all directory entries have cached stats before d.stat() is called. I 
think it could be faster, since there would be fewer calls from Python and 
probably a better Windows API for it; the same goes for Linux.

I will study the C code later to see whether this is possible, or write a 
snippet.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-26 Thread Steve Dower


Steve Dower  added the comment:

Those are all good ideas, but using os.stat(d) instead of d.stat() is shorter, 
more reliable, more compatible, and already works.

There's no middle ground where DirEntry can be faster, because it's already 
using that middle ground. All the discussion between Eryk and myself was 
figuring out whether we can use the DirEntry/FindFileData information to tell 
whether the file needs an explicit stat() or not, and we can't.

Most of the performance impact of stat() is just in opening the file (which 
scandir() does not do). As soon as we have to directly access the file, we may 
as well get all the information from it. We're already getting all the "cheap" 
information we can.
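Since os.DirEntry implements os.PathLike, the suggested os.stat(d) can be given the entry directly. A minimal sketch (current_mtimes is a hypothetical name) that keeps scandir() for enumeration but always takes timestamps from a real stat call, paying the per-file open cost discussed above:

```python
import os


def current_mtimes(directory: str) -> dict:
    """Map file names to st_mtime from os.stat(), not DirEntry's cache."""
    with os.scandir(directory) as it:
        # os.DirEntry is path-like, so os.stat() accepts it directly.
        return {entry.name: os.stat(entry).st_mtime
                for entry in it if entry.is_file()}
```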

--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 3.10, Python 3.9 -Python 3.8




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-26 Thread Cezary Wagner


Cezary Wagner  added the comment:

I think we can assume that NTFS is the priority, since it is the most widely 
used filesystem.

I cannot discuss FAT32 or FAT, since this is not my area of expertise (nor, for 
now, is NTFS). Still, I think the system must track open files to avoid 
conflicts, so this can be tracked somehow, but how?

A possible solution is an extra function or argument for Windows that marks the 
cache dirty between calls.

This is a very rough proposal, and I need to think about whether it is good. 
Even the names are ugly; I need to think more about them. My intuition tells me 
it could be a good direction.

dir_entry.stat(nt_force_cache_refresh=True) - useful for specific entries.
os.scandir(nt_force_cache_refresh=True) - for all entries, although a refresh is 
sometimes not needed for all of them.

My thinking is:
dir_entry.stat(nt_force_cache_refresh=True) should be faster than calling 
os.stat(dir_entry.path) instead of dir_entry.stat(), which does not work for 
open files.
os.scandir(nt_force_cache_refresh=True) should be faster than 
dir_entry.stat(nt_force_cache_refresh=True), and dir_entry.stat() would then 
work for open files. If such an extra argument must be added at all, it also 
makes it simpler to understand that Windows is different.

nt_force_cache_refresh could store a flag in dir_entry telling .stat() not to 
use the cache.

Even better would be not needing nt_force_cache_refresh for open files at all: 
maybe you can find a way to detect files that are open in an external 
application. I think the Windows API allows checking whether a file is open; as 
far as I remember, the Sysinternals tools can do this, so I think there is some 
API for it.

See this tool: https://docs.microsoft.com/en-us/sysinternals/downloads/handle. 
Maybe its source code is available, or you can learn from it.

Maybe you can use this API to check whether a file is open before calling 
dir_entry.stat().

I do not want to force any solution; I just want to share some rough ideas.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-26 Thread Steve Dower


Steve Dower  added the comment:

> We're faced with the choice between either always calling the real lstat, or 
> just documenting that files with hard links will have stale information if 
> the file was updated using another link.

That's an easy choice: we document it.

The os module comes with the assumption that platform-specific behaviour may 
vary, so this is really just a helpful note about a known variation on Windows. 
It is not a warning, just a note.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Eryk Sun


Eryk Sun  added the comment:

> What it also means is that the "file still in use by another app"
> scenario will probably have to manually use os.stat(). We can't 
> detect it, and it's the same race condition as calling os.stat() 
> shortly before the update flushes anyway.

FAT filesystems require an fsync (FlushFileBuffers) or close on the in-use file 
in order to update the last-write time in both the directory entry and the file 
control block (i.e. FCB, which is shared by all opens). It seems the developers 
take the meaning of "last write" literally in terms of the last time that 
cached data was flushed to disk. Because the last-write time in the FCB is 
updated separately from the file size in the FCB, even an [l]stat on an in-use 
FAT file may see st_size change while st_mtime remains constant, as I showed in 
the previous post. No matter whether we query the directory or the FCB, the 
reported last-write time of a FAT file might be wrong from the standpoint of 
reasonable expectations.

An fsync call is also useful with NTFS, but it only updates the directory entry 
of the opened link. It doesn't update other links to the file. On the other 
hand, with an NTFS file, calling os.[l]stat or os.fstat is sufficient to get 
updated stat information, regardless of the link that's accessed.

> What this probably means is if we can detect a link from the FFD struct
> (which I think we can?) then we can cache the attributes we trust and
> send .stat() through the real call.

It would be nice if we could detect the link count without an additional system 
call. But it's not in the duplicated information in the directory entry and 
wouldn't be reliable if it were. The link count is available via 
GetFileInformationByHandleEx: FileStandardInfo, but if you're calling 
CreateFileW to open the file, you may as well get the full stat result while 
you're at it.

We're faced with the choice between either always calling the real lstat, or 
just documenting that files with hard links will have stale information if the 
file was updated using another link.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Steve Dower


Steve Dower  added the comment:

Okay, so it sounds like there's a class of files where we can't rely on the 
FindFileData having the right values. But we get enough information to be able 
to just suppress the caching behaviour for those, right?

Basically, my criteria for fixing this in the runtime are that we should not add 
any new system calls during iteration, and that we cannot switch to always 
bypassing the cache for DirEntry.stat().

What this probably means is if we can detect a link from the FFD struct (which 
I think we can?) then we can cache the attributes we trust and send .stat() 
through the real call.

What it also means is that the "file still in use by another app" scenario will 
probably have to manually use os.stat(). We can't detect it, and it's the same 
race condition as calling os.stat() shortly before the update flushes anyway.

I won't accept having to make a second set of system calls on every file just 
in case one of them is being modified by another application. That's not the 
normal case, and the point of scandir is to improve performance in the normal 
enumeration cases.

Updating the documentation to mention/emphasise that some DirEntry.stat() 
fields may not update immediately, and so using os.stat() for current data is 
required, may be helpful. Though I think that's already implied by the line 
that says "Call os.stat() to fetch up-to-date information."

So if someone wants to improve the docs, or has a way to recognise links (with 
unreliable data in the directory listing) and not pre-fill the stat object, 
feel free to submit a PR. Otherwise, unfortunately, we're pretty much bound by 
Windows's own optimisations here.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Eryk Sun


Eryk Sun  added the comment:

> Does it make the most sense for us to make .flush() also do an 
> implicit .fsync() (when it's actually a file object)?

Standard I/O in the Windows C runtime supports a "c" commit mode that causes 
fflush to call _commit() on the underlying fd [1]. Perhaps Python should 
support a similar "c" or "s" mode that makes a flush implicitly call fsync / 
_commit. 
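Python's open() has no such mode today. A hypothetical user-level approximation subclasses io.TextIOWrapper so that every flush() (including print(..., flush=True) and the implicit flush on close) is followed by os.fsync(). CommitOnFlush and open_commit are illustrative names, not existing APIs, and this only helps when you control the writer.

```python
import io
import os


class CommitOnFlush(io.TextIOWrapper):
    """Text file whose flush() also calls os.fsync(), like the CRT "c" mode.

    Hypothetical helper: Python's open() has no commit mode.
    """

    def flush(self):
        super().flush()          # push buffered text down to the OS
        os.fsync(self.fileno())  # FlushFileBuffers / _commit on Windows


def open_commit(path, append=False, encoding="utf-8"):
    """Open `path` for text writing with commit-on-flush semantics."""
    raw = open(path, "ab" if append else "wb")
    return CommitOnFlush(raw, encoding=encoding)
```

Usage: with open_commit(path) as f: print('spam', file=f, flush=True).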

But you may not be in control of flushing the file if it's being written to by 
a third-party library or application. Calling os.[l]stat works around the 
problem, but only with NTFS. It doesn't help with FAT32 / exFAT.

FAT filesystems update the last-write time when the file object is flushed or 
closed. It depends on the FO_FILE_MODIFIED flag in the file object or the 
CCB_FLAG_USER_SET_LAST_WRITE (from SetFileTime) in the file object's context 
control block (CCB). But opening, and even flushing, a file doesn't synchronize 
the context of other opens. Thus one can call os.stat (not even a scandir 
problem) repeatedly on a file and observe st_size changing while st_mtime 
remains constant:

>>> filepath = 'C:/Mount/TestFat32/test/spam.txt'
>>> f = open(filepath, 'w')
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(0, 1593116028.0)

>>> print('spam', file=f, flush=True)
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(6, 1593116028.0)

The last-write time gets updated by closing or flushing the kernel file object 
that was used to write to the file. 

>>> os.fsync(f.fileno())
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(6, 1593116044.0)

Another problem is stale entries for NTFS hard links, which can lead to getting 
a completely incorrect stat result via os.scandir -- wrong timestamps, wrong 
file size, and wrong file attributes.

An NTFS file's MFT record contains its timestamps, size, and attributes in a 
$STANDARD_INFORMATION attribute. This reliable information is what os.[l]stat 
and os.fstat query. But it gets duplicated in per-link $FILE_NAME attributes 
that directories index. The duplicated info for a link gets synchronized to the 
standard info when the link is accessed, but other links to the file do not get 
updated, and their values may be completely wrong. For example (using the scan 
function from my previous post):

>>> filepath1 = 'C:/Mount/TestNtfs/test/spam1.txt'
>>> filepath2 = 'C:/Mount/TestNtfs/test/spam2.txt'
>>> f = open(filepath1, 'w')
>>> os.link(filepath1, filepath2)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)

>>> print('spam', file=f, flush=True)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)

>>> os.fsync(f.fileno())
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)

>>> f.close()
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)

As shown, flushing or closing the file object for the "spam1.txt" link is not 
reflected in the entry for the "spam2.txt" link. The directory entry for the 
link is only updated when the link is accessed:

>>> f = open(filepath2)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(6, 1593116062.2080283)

---

[1] Linking commode.obj should enable commit-mode by default. But it's broken 
because __acrt_stdio_parse_mode is buggy. It initializes _stdio_mode to the 
global _commode value, but then it clobbers it when setting the required "r", 
"w", or "a" open mode.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Cezary Wagner


Cezary Wagner  added the comment:

I read the comments about f.flush() and os.fsync(), but they may be unrelated to 
the problem: the external application can be written in C# or any other 
language.

Under Windows (not Linux), the modification date stays stale in this sequence:
os.scandir()
dir_entry.stat() # where dir_entry.path == 'test.txt'
dir_entry.stat().st_mtime # e.g. 1
os.scandir()
dir_entry.stat() # where dir_entry.path == 'test.txt'
dir_entry.stat().st_mtime # STALE, still e.g. 1

Under Windows (not Linux), the modification date is refreshed in this sequence:
os.scandir()
dir_entry.stat() # where dir_entry.path == 'test.txt'
dir_entry.stat().st_mtime # e.g. 1
os.stat('test.txt') # this call refreshes something, so the next scan is not stale
os.scandir()
dir_entry.stat() # where dir_entry.path == 'test.txt'
dir_entry.stat().st_mtime # CHANGED, e.g. 2




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Cezary Wagner


Cezary Wagner  added the comment:

Use case: detecting changes in open files is very important (log scanning, 
synchronization, ...).

First of all, I think a good unit test is needed to detect this problem. It is a 
rare edge case, probably missed because it is hard to imagine that this would 
not work while a file is open; I would have missed it too.

It should work like this.

A first program writes a file under Windows, and a second program (the unit 
test) runs os.scandir(); if a repeated os.scandir() detects the changes, it is 
O.K. (same as on Linux).

To make it simpler, it can be a unit test in a single program.

1. Open a test file in the test directory.
2. Run os.scandir() in the test directory.
3. Write to the test file (f.write() with and without flush, ...; what is 
sufficient to test is still to be defined).
4. Run os.scandir() in the test directory again; if the change is detected, it 
is O.K.
5. f.close()
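A minimal unittest sketch of the five steps above, run in one process. scan_stat, scandir_sees_update and the test name are hypothetical, and only the f.write() plus f.flush() variant of step 3 is covered. The test checks both st_size and st_mtime, and sleeps before writing so the change exceeds the filesystem's timestamp granularity (2 seconds on FAT). Going by the reports in this thread, it would be expected to pass on Linux and may fail on Windows.

```python
import os
import tempfile
import time
import unittest


def scan_stat(directory: str, name: str) -> os.stat_result:
    """Return entry.stat() for `name` from a fresh os.scandir() pass."""
    with os.scandir(directory) as it:
        for entry in it:
            if entry.name == name:
                return entry.stat()
    raise FileNotFoundError(name)


def scandir_sees_update(directory: str, delay: float = 2.1) -> bool:
    """Steps 1-5: does a repeated os.scandir() see a write to an open file?"""
    path = os.path.join(directory, "test.txt")
    with open(path, "w") as f:                     # 1. open the test file
        before = scan_stat(directory, "test.txt")  # 2. first os.scandir()
        time.sleep(delay)                          # outlast mtime granularity
        f.write("spam\n")                          # 3. write and flush
        f.flush()
        after = scan_stat(directory, "test.txt")   # 4. second os.scandir()
    # 5. leaving the `with` block closes the file
    return (after.st_size > before.st_size
            and after.st_mtime > before.st_mtime)


class ScandirOpenFileTest(unittest.TestCase):
    def test_repeated_scandir_sees_write(self):
        with tempfile.TemporaryDirectory() as d:
            self.assertTrue(scandir_sees_update(d))
```

Run it with python -m unittest followed by the module name.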

I do not know the Windows API well, but I think we could detect whether the 
directory changed between scans, or detect whether a file is open (a rare 
situation: in 90% of cases all files will be closed).

So if all files are closed, the current os.scandir() may be fine (I do not 
understand the implementation well enough to evaluate it correctly), and when 
one or more files are open, another implementation is needed that detects 
modifications.

If you think I missed something, please comment. Comments are welcome.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-25 Thread Cezary Wagner


Cezary Wagner  added the comment:

I did some tests on Linux and everything works: changes are detected and 
os.scandir() works, but not on Windows. Probably there is no unit test that 
checks whether os.scandir() works on files open for writing.

f.flush() does not matter, since the file can be changed by an external 
Python/Java/C#/C++/... application; anyone can write logs on Windows. I will 
explain this in the next comment. I wrote this code only to show the problem.

Result from Linux with DO_STAT = False, so only repeated os.scandir() calls are 
made. Modifications are detected correctly on Linux but not on Windows.

[wagnecaz@nsdptrms01 ~]$ python3 s03_dir_entry.py
dir_entry.stat() /home/wagnecaz/test.txt 1593085189.1000397 since last change 0.00111671508789
2020-06-25 13:39:49.101368 1593085189.101368
dir_entry.stat() /home/wagnecaz/test.txt 1593085189.1000397 since last change 1.0028572082519531
2020-06-25 13:39:50.103111 1593085190.103111
dir_entry.stat() /home/wagnecaz/test.txt 1593085190.1020408 since last change 1.0026073455810547
2020-06-25 13:39:51.104881 1593085191.104881
dir_entry.stat() /home/wagnecaz/test.txt 1593085191.104042 since last change 1.0023958683013916
2020-06-25 13:39:52.106793 1593085192.106793
dir_entry.stat() /home/wagnecaz/test.txt 1593085192.106043 since last change 1.0023260116577148
2020-06-25 13:39:53.108582 1593085193.108582
dir_entry.stat() /home/wagnecaz/test.txt 1593085193.1080444 since last change 1.0021436214447021
2020-06-25 13:39:54.110500 1593085194.1105
dir_entry.stat() /home/wagnecaz/test.txt 1593085194.1100454 since last change 1.0013866424560547
2020-06-25 13:39:55.111684 1593085195.111684
dir_entry.stat() /home/wagnecaz/test.txt 1593085195.1110466 since last change 1.0022354125976562
2020-06-25 13:39:56.113542 1593085196.113542
dir_entry.stat() /home/wagnecaz/test.txt 1593085196.1130476 since last change 1.0021603107452393
2020-06-25 13:39:57.115450 1593085197.11545
dir_entry.stat() /home/wagnecaz/test.txt 1593085197.1140487 since last change 1.003014326095581
2020-06-25 13:39:58.117287 1593085198.117287
dir_entry.stat() /home/wagnecaz/test.txt 1593085198.11605 since last change 1.002938985824585
2020-06-25 13:39:59.119224 1593085199.119224
dir_entry.stat() /home/wagnecaz/test.txt 1593085199.118051 since last change 1.0027978420257568
2020-06-25 13:40:00.121166 1593085200.121166

A change is made every second. Linux detects it, but on Windows the value stays 
stale.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Steve Dower


Steve Dower  added the comment:

Does it make the most sense for us to make .flush() also do an implicit 
.fsync() (when it's actually a file object)?




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Eryk Sun


Eryk Sun  added the comment:

In FSBO [1] section 6 "Time Stamps", note that the LastWriteTime value gets 
updated when an IRP_MJ_FLUSH_BUFFERS is processed. In the Windows API, this is 
a FlushFileBuffers [2] call. In the C runtime, it's a _commit [3] call, which 
is an os.fsync [4] call in Python. Calling the latter will update the directory 
entry for the file. 

For an example implementation in the FAT32 filesystem, see 
FatCommonFlushBuffers [5]. Note in the UserFileOpen case that it flushes any 
cached data via FatFlushFile and then updates the directory entry from the file 
control block (FCB) via FatUpdateDirentFromFcb, and finally it  flushes the 
parent directory control blocks (DCBs) -- and possibly also the volume.

Example with os.fsync:

import os
import time
import datetime

# Set to False to compare against the behavior without os.fsync().
UPDATE_DIR = True

FILEPATH = 'C:/Temp/test/spam.txt'

def scan(filepath):
    """Return the DirEntry for filepath from a fresh os.scandir() pass."""
    dir_path, filename = os.path.split(filepath)
    with os.scandir(dir_path) as iter_dir:
        for entry in iter_dir:
            if entry.name == filename:
                return entry

with open(FILEPATH, 'w') as f:
    while True:
        print('spam', file=f, flush=True)
        if UPDATE_DIR:
            # FlushFileBuffers: also updates the file's directory entry.
            os.fsync(f.fileno())
        entry = scan(FILEPATH)
        stat_result = entry.stat()
        now = datetime.datetime.now()
        print(f'st_mtime: {stat_result.st_mtime:0.3f}, '
              f'delta_t: {now.timestamp() - stat_result.st_mtime:0.3f}')
        time.sleep(1.0)


[1] https://go.microsoft.com/fwlink/?LinkId=140636
[2] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-flushfilebuffers
[3] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/commit?view=vs-2019
[4] https://docs.python.org/3/library/os.html#os.fsync
[5] https://github.com/microsoft/Windows-driver-samples/blob/9afd93066dfd9db12f66099cf9ec44b6fd734b2d/filesys/fastfat/flush.c#L145

--
nosy: +eryksun




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Steve Dower


Steve Dower  added the comment:

I'm going to have to spend more time to analyse this (later), but it seems like 
Windows decides not to update the directory's data structures (containing the 
st_mtime retrieved by scandir) as long as the file is still open.

I suspect the answer for your scenario is that you'll just have to use 
os.stat() to get the information from the file's entry, rather than the 
directory's entry. It's unlikely there's anything we can do at Python's level 
without sacrificing all the performance gains of scandir() for all other 
scenarios.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Cezary Wagner


Cezary Wagner  added the comment:

One more hint.

Even in a newly started process, os.scandir() gives an invalid modification date 
for a file open for writing, until an external tool (like Explorer, touch, etc.) 
accesses it.

So (the log is open for writing, and a write happens between steps 1 and 2):
1. Run program with os.scandir() -> dir_entry.stat().st_mtime = t1.
2. Run program with os.scandir() -> dir_entry.stat().st_mtime = t1.
The modification date is stale.

Another scenario (the log is open for writing, and a write happens between 
steps 1 and 3):
1. Run program with os.scandir() -> dir_entry.stat().st_mtime = t1.
2. Run touch on dir_entry.path.
3. Run program with os.scandir() -> dir_entry.stat().st_mtime = t2.
The modification is detected.




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Cezary Wagner


Cezary Wagner  added the comment:

Extra file for tests, with:

DO_STAT = False

No changes are seen, although the file was written every second. If os.stat() 
is run between the os.scandir() calls, everything works.

C:\root\Python38\python.exe C:/Users/Cezary.Wagner/PycharmProjects/dptr-monitoring-v2/sandbox/python/s13_dir_entry/s03_dir_entry.py
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 0.0009987354278564453
2020-06-24 18:57:52.911980 1593017872.91198
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 1.0078418254852295
2020-06-24 18:57:53.918823 1593017873.918823
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 2.0103507041931152
2020-06-24 18:57:54.921332 1593017874.921332
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 3.023340940475464
2020-06-24 18:57:55.934322 1593017875.934322
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 4.036783933639526
2020-06-24 18:57:56.947765 1593017876.947765
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 5.049667835235596
2020-06-24 18:57:57.960649 1593017877.960649
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 6.063947916030884
2020-06-24 18:57:58.974929 1593017878.974929
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 7.0797247886657715
2020-06-24 18:57:59.990706 1593017879.990706
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 8.091670751571655
2020-06-24 18:58:01.002652 1593017881.002652
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 9.1053147315979
2020-06-24 18:58:02.016296 1593017882.016296
dir_entry.stat() T:\\test.txt 1593017872.9109812 since last change 10.120086908340454
2020-06-24 18:58:03.031068 1593017883.031068

--
Added file: https://bugs.python.org/file49260/s03_dir_entry.py




[issue41106] os.scandir() Windows bug dir_entry.stat() not works on file during writing.

2020-06-24 Thread Cezary Wagner


New submission from Cezary Wagner :

I have a problem detecting changes to a log file while it is being written under 
Windows (on a normal filesystem and on a Windows share). Probably a bad order of 
Windows API calls, but I have no idea.

A test program is attached so you can reproduce it. Try os.scandir() with and 
without os.stat().

The source code responsible is probably this (I do not understand the CPython 
code): https://github.com/python/cpython/blob/master/Modules/posixmodule.c.

Here is a full description; many tests were done.

# os.scandir() Windows bug dir_entry.stat() not works on file during writing.
# Such files are, for example, application logs.
# There is no problem with os.stat().

# Calling os.stat() before os.scandir() -> dir_entry.stat() is a workaround.
# Opening the file in another program while it is being written "fixes" dir_entry.stat().
# Getting the properties of the file while it is being written "fixes" dir_entry.stat().

# Note that I call os.scandir() separately each time, so dir_entry.stat() is not cached.

# Steps to reproduce the missing modification update:
# 1. Close all Explorer windows and other applications using PATH (they affect the result).
# 2. Set PATH to the test folder (a local directory or a Windows share).
# 3. Run the program with DO_STAT = False.
#
# Alternative steps (an external app forces a valid modification date):
# 4. Running 'touch' or 'echo' on the file should "fix" the problem ('echo' will
#    throw an error, which does not matter).
#
# Alternative scenario (os.stat() forces a valid modification date, but is very slow):
# 3. Run the program with DO_STAT = True. No problems.
#
# Actual result:
# The modification date from dir_entry.stat() is stale (it does not change after
# a modification) unless os.stat() or another Windows application reads the file.
#
# Expected result:
# The modification date from dir_entry.stat() is updated across separate
# os.scandir() calls, or cached within the same os.scandir() call.
#
# Note that os.scandir() must be called before dir_entry.stat() to avoid
# caching, as described in the documentation.
# This is done, but it does not work on files that are being written.
#
# Ask if you have any questions, since this bug is very hard to find.

--
components: Windows
files: s03_dir_entry.py
messages: 372264
nosy: Cezary.Wagner, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.scandir() Windows bug dir_entry.stat() not works on file during writing.
type: crash
versions: Python 3.8
Added file: https://bugs.python.org/file49259/s03_dir_entry.py
