On 10/19/20, Steve Dower <steve.do...@python.org> wrote:
>
> Resolving the path is the most expensive part, even if the file is not
> opened (I've been working with the NTFS team on this area, and we've
> been benchmarking/analysing all of it).

If you say it's been extensively benchmarked and there's no direct way
around the speed bottleneck, then I take your word for it. To clarify
what I had in mind, I was hoping that because NTFS implements the fast
I/O function FastIoQueryOpen [1] (via NtfsNetworkOpenCreate, as given
by its FastIoDispatch table) that IRP_MJ_CREATE would be bypassed and
that the filesystem would not incur a significant cost to parse the
remaining path. I figured that most of the work would be in the
ObOpenObjectByName and IopParseDevice executive calls that lead up
to querying the filesystem.

Anyway, it's unfortunate that the Windows API doesn't support NT
handle-relative names, except in the registry API. If we could call
NTAPI NtQueryAttributesFile [2] directly, then the ObjectAttributes
argument could be relative to a directory handle set in the
RootDirectory field. That would eliminate the vast majority of the
path-resolution cost. A handle-relative open or query goes straight to
the filesystem device, which goes straight to the directory that
contains the file.
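
As a rough illustration of the handle-relative query described above,
here is a minimal ctypes sketch. It is not a proposal for CPython. The
structure layouts follow the documented UNICODE_STRING, OBJECT_ATTRIBUTES
and FILE_BASIC_INFORMATION definitions; the helper names
make_object_attributes and query_attributes_relative are hypothetical,
and dir_handle is assumed to be an already open directory handle.

```python
import ctypes
import sys

OBJ_CASE_INSENSITIVE = 0x40


class UNICODE_STRING(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),         # bytes, no terminator
                ("MaximumLength", ctypes.c_uint16),  # bytes available
                ("Buffer", ctypes.c_void_p)]


class OBJECT_ATTRIBUTES(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint32),
                ("RootDirectory", ctypes.c_void_p),
                ("ObjectName", ctypes.POINTER(UNICODE_STRING)),
                ("Attributes", ctypes.c_uint32),
                ("SecurityDescriptor", ctypes.c_void_p),
                ("SecurityQualityOfService", ctypes.c_void_p)]


class FILE_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [("CreationTime", ctypes.c_int64),
                ("LastAccessTime", ctypes.c_int64),
                ("LastWriteTime", ctypes.c_int64),
                ("ChangeTime", ctypes.c_int64),
                ("FileAttributes", ctypes.c_uint32)]


def make_object_attributes(dir_handle, name):
    """Build OBJECT_ATTRIBUTES naming `name` relative to `dir_handle`.

    The relative name must not start with a backslash. The second return
    value holds the buffers that must stay alive during the call.
    """
    buf = ctypes.create_unicode_buffer(name)
    nbytes = len(name.encode("utf-16-le"))
    ustr = UNICODE_STRING(nbytes, nbytes, ctypes.addressof(buf))
    oa = OBJECT_ATTRIBUTES()
    oa.Length = ctypes.sizeof(OBJECT_ATTRIBUTES)
    oa.RootDirectory = dir_handle
    oa.ObjectName = ctypes.pointer(ustr)
    oa.Attributes = OBJ_CASE_INSENSITIVE
    return oa, (ustr, buf)


def query_attributes_relative(dir_handle, name):
    """Return the file attributes of `name` inside the open directory."""
    if sys.platform != "win32":
        raise OSError("NtQueryAttributesFile is only available on Windows")
    ntdll = ctypes.WinDLL("ntdll")
    func = ntdll.NtQueryAttributesFile
    func.argtypes = (ctypes.POINTER(OBJECT_ATTRIBUTES),
                     ctypes.POINTER(FILE_BASIC_INFORMATION))
    func.restype = ctypes.c_int32  # NTSTATUS; negative values are errors
    oa, keepalive = make_object_attributes(dir_handle, name)
    info = FILE_BASIC_INFORMATION()
    status = func(ctypes.byref(oa), ctypes.byref(info))
    if status < 0:
        raise OSError(f"NTSTATUS {status & 0xFFFFFFFF:#010x}")
    return info.FileAttributes
```

Only the single path component in ObjectName is handed to the
filesystem here; the directory part was resolved once, when the handle
was opened.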

To eliminate the cost of opening the directory handle, scandir() could
be rewritten to use CreateFileW and GetFileInformationByHandleEx:
FileIdBothDirectoryInfo [3] instead of FindFirstFileW / FindNextFileW.
Just cache the directory handle in place of caching the find handle.
scandir() would gain fd support in Windows. Opening a directory via
os.open requires the flag _O_OBTAIN_DIR (0x2000), defined in fcntl.h.
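
For example, opening a directory as a file descriptor might look like
the sketch below. The helper name open_directory is hypothetical, and
the value 0x2000 for _O_OBTAIN_DIR is taken from the paragraph above
since the os module does not export it. On other platforms the sketch
falls back to O_DIRECTORY where available.

```python
import os
import sys

_O_OBTAIN_DIR = 0x2000  # from fcntl.h in the Windows CRT, per the text above


def open_directory(path):
    """Open `path` as a directory and return a file descriptor."""
    if sys.platform == "win32":
        flags = os.O_RDONLY | _O_OBTAIN_DIR
    else:
        # Assumption for non-Windows systems: use O_DIRECTORY if it exists.
        flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    return os.open(path, flags)


fd = open_directory(".")
try:
    print(os.fstat(fd).st_mode)
finally:
    os.close(fd)
```

On Windows, msvcrt.get_osfhandle(fd) returns the underlying handle,
which is what GetFileInformationByHandleEx would need.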

FileIdBothDirectoryInfo provides the file ID, so the implementation
would support the inode() method without calling stat(). It would
still directly support is_dir() and is_file() based on the file
attributes, and is_symlink() based on the file attributes and the
EaSize field. The Windows Protocols documentation states that the latter contains
the reparse tag for a reparse point. The field is reused because a
reparse point can't have extended attributes.
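
A minimal sketch of that classification, assuming the FileAttributes
and EaSize values have already been read from a FILE_ID_BOTH_DIR_INFO
record. The function name classify_entry is hypothetical, and the
results correspond to the non-following checks (follow_symlinks=False);
following a symlink would still need a stat() call.

```python
FILE_ATTRIBUTE_DIRECTORY = 0x0010
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction, not a symlink


def classify_entry(file_attributes, ea_size):
    """Return (is_dir, is_file, is_symlink) without following links."""
    is_reparse = bool(file_attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    # EaSize holds the reparse tag only when the entry is a reparse point;
    # otherwise it is the size of the extended attributes.
    reparse_tag = ea_size if is_reparse else 0
    is_symlink = reparse_tag == IO_REPARSE_TAG_SYMLINK
    has_dir_attr = bool(file_attributes & FILE_ATTRIBUTE_DIRECTORY)
    is_dir = has_dir_attr and not is_symlink
    is_file = not has_dir_attr and not is_symlink
    return is_dir, is_file, is_symlink


# A directory symlink has both the directory and reparse-point attributes.
print(classify_entry(0x0410, IO_REPARSE_TAG_SYMLINK))
```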

All that said, I'd prefer not to call NtQueryAttributesFile or any
other NTAPI function in Windows Python. I'd rather do the best
possible with just the Windows API. I wish there were a new
GetFileAttributesExExW function that supported handle-relative names.
Even better would be a new function that calls
NtQueryInformationByName -- something like GetFileInformationByName --
for FileStatInfo (and FileCaseSensitiveInfo as well, which is becoming
more of an issue), also with support for handle-relative names.

[1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_fast_io_dispatch
[2] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile
[3] 
https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GODUIB5WKVZLX4BVPEM2NS37JFHUXIID/
Code of Conduct: http://python.org/psf/codeofconduct/
