On 10/19/20, Steve Dower <steve.do...@python.org> wrote:
>
> Resolving the path is the most expensive part, even if the file is not
> opened (I've been working with the NTFS team on this area, and we've
> been benchmarking/analysing all of it).

If you say it's been extensively benchmarked and there's no direct way
around the speed bottleneck, then I take your word for it.

To clarify what I had in mind, I was hoping that, because NTFS
implements the fast I/O function FastIoQueryOpen [1] (via
NtfsNetworkOpenCreate, as given by its FastIoDispatch table),
IRP_MJ_CREATE would be bypassed and the filesystem would not incur a
significant cost to parse the remaining path. I figured that most of the
work would be in the ObOpenObjectByName and IopParseDevice executive
calls that lead up to querying the filesystem.

Anyway, it's unfortunate that the Windows API doesn't support NT
handle-relative names, except in the registry API. If we could call
NTAPI NtQueryAttributesFile [2] directly, then the ObjectAttributes
argument could be relative to a directory handle set in the
RootDirectory field. That would eliminate the vast majority of the
path-resolution cost. A handle-relative open or query goes straight to
the filesystem device, which goes straight to the directory that
contains the file.

To eliminate the cost of opening the directory handle, scandir() could
be rewritten to use CreateFileW and GetFileInformationByHandleEx:
FileIdBothDirectoryInfo [3] instead of FindFirstFileW / FindNextFileW.
Just cache the directory handle in place of caching the find handle.
scandir() would gain fd support on Windows. Opening a directory via
os.open requires the flag _O_OBTAIN_DIR (0x2000), defined in fcntl.h.

FileIdBothDirectoryInfo provides the file ID, so the implementation
would support the inode() method without calling stat(). It would still
directly support is_dir() and is_file() based on the file attributes,
and is_symlink() based on the file attributes and the EaSize field. The
Windows Protocols specifications document that, for a reparse point, the
EaSize field contains the reparse tag. The field can be reused this way
because a reparse point can't have extended attributes.

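For comparison, this is roughly what the fd-relative model already looks
like on POSIX, where os.scandir() accepts a directory fd and os.stat()
accepts dir_fd. It is only a sketch of the usage pattern I'd like to see
on Windows, not Windows code: it assumes a platform that reports both
capabilities via os.supports_fd and os.supports_dir_fd, and
list_relative is a made-up name for illustration.

```python
import os
import tempfile


def list_relative(dir_path):
    """Return {name: (is_dir, size)} using one directory fd for all lookups.

    Hypothetical helper: the directory is opened once, and every
    per-entry stat() is resolved relative to that fd, not a full path.
    """
    if os.scandir not in os.supports_fd or os.stat not in os.supports_dir_fd:
        raise NotImplementedError("fd-relative scandir/stat not available")

    result = {}
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        # Enumerate through the open fd instead of re-resolving dir_path.
        with os.scandir(dir_fd) as entries:
            for entry in entries:
                # Name is looked up relative to dir_fd, so only the last
                # path component has to be parsed.
                st = os.stat(entry.name, dir_fd=dir_fd, follow_symlinks=False)
                is_dir = entry.is_dir(follow_symlinks=False)
                result[entry.name] = (is_dir, st.st_size)
    finally:
        os.close(dir_fd)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.txt"), "w") as f:
            f.write("hello")
        for name, (is_dir, size) in sorted(list_relative(tmp).items()):
            print(name, "dir" if is_dir else "file", size)
```

The explicit stat() call is there only to show the dir_fd-relative
lookup; in the scheme described above, the directory enumeration itself
would already supply most of this information.
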
All that said, I'd prefer not to call NtQueryAttributesFile or any other
NTAPI function in Windows Python. I'd rather do the best possible with
just the Windows API. I wish there were a new GetFileAttributesExExW
function that supported handle-relative names. Even better would be a
new function that calls NtQueryInformationByName -- something like
GetFileInformationByName -- for FileStatInfo (and FileCaseSensitiveInfo
as well, which is becoming more of an issue), also with support for
handle-relative names.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_fast_io_dispatch
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile
[3] https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GODUIB5WKVZLX4BVPEM2NS37JFHUXIID/
Code of Conduct: http://python.org/psf/codeofconduct/