[issue40654] shutil.copyfile mutates symlink for absolute path

Eryk Sun Sat, 16 May 2020 23:26:28 -0700


Eryk Sun <eryk...@gmail.com> added the comment:

Copying a symlink verbatim requires copying both the print name and the
substitute name in the reparse buffer [1]. For a file, CopyFileExW with the
COPY_FILE_COPY_SYMLINK flag implements this by enabling the symlink privilege for
the thread and copying the reparse point via FSCTL_GET_REPARSE_POINT and
FSCTL_SET_REPARSE_POINT. For a directory, CreateDirectoryExW is implemented
similarly when lpTemplateDirectory is a symlink or mount point. (For a
"\\?\Volume{GUID}\" volume mountpoint as opposed to a bind mountpoint,
CreateDirectoryExW punts to SetVolumeMountPointW, which also updates the system
mountpoint manager.)
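
To make the two names concrete, here is a minimal sketch that pulls the
substitute name and print name out of a raw reparse buffer, following the
REPARSE_DATA_BUFFER layout in [1]. The function names are hypothetical, and
build_symlink_buffer exists only to fabricate a demo buffer; a real buffer
would come from FSCTL_GET_REPARSE_POINT.

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C


def parse_reparse_names(buf):
    """Return (substitute_name, print_name) from a raw reparse buffer."""
    # Common header: ULONG ReparseTag, USHORT ReparseDataLength, USHORT Reserved.
    tag, _data_length, _reserved = struct.unpack_from("<LHH", buf, 0)
    # Both symlink and mount point buffers continue with four USHORT fields.
    sub_off, sub_len, print_off, print_len = struct.unpack_from("<HHHH", buf, 8)
    if tag == IO_REPARSE_TAG_SYMLINK:
        path_start = 8 + 8 + 4  # a symlink has an extra ULONG Flags field
    elif tag == IO_REPARSE_TAG_MOUNT_POINT:
        path_start = 8 + 8
    else:
        raise ValueError("not a symlink or mount point: 0x%08X" % tag)

    def name(offset, length):
        # Offsets and lengths are in bytes, relative to PathBuffer.
        start = path_start + offset
        return bytes(buf[start:start + length]).decode("utf-16-le")

    return name(sub_off, sub_len), name(print_off, print_len)


def build_symlink_buffer(substitute, print_name, flags=0):
    """Hypothetical demo helper: build a symlink reparse buffer in memory."""
    sub = substitute.encode("utf-16-le")
    prn = print_name.encode("utf-16-le")
    body = struct.pack("<HHHHL", 0, len(sub), len(sub), len(prn), flags)
    body += sub + prn
    return struct.pack("<LHH", IO_REPARSE_TAG_SYMLINK, len(body), 0) + body


buf = build_symlink_buffer("\\??\\C:\\target", "C:\\target")
substitute_name, print_name = parse_reparse_names(buf)
```

A verbatim copy has to carry both strings over unchanged, which is why
copying the whole buffer is simpler than rebuilding it from one name.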

If you can only have one or the other, the substitute name is more reliable
according to the wording in [MS-FSCC] [2].

symlinks:

A symbolic link has a substitute name and a print name associated
with it. The substitute name is a pathname (section 2.1.5)
identifying the target of the symbolic link. The print name SHOULD
be an informative pathname, suitable for display to a user, that
also identifies the target of the symbolic link. Either pathname
can contain dot directory names as specified in section 2.1.5.1.

mount points (junctions):

A mount point has a substitute name and a print name associated
with it. The substitute name is a pathname (section 2.1.5)
identifying the target of the mount point. The print name SHOULD
be an informative pathname (section 2.1.5), suitable for display
to a user, that also identifies the target of the mount point.
Neither of these pathnames can contain dot directory names.

The operative weasel word is "should", instead of a reliable "must" (RFC 2119).

An example of the power of "should" is that PowerShell doesn't even set a print
name when it creates a mount point via `New-Item -ItemType Junction`. I don't
agree that nt.readlink should read junctions, but it does, so the potentially
missing print name is a problem. If it were just symlinks created by
CreateSymbolicLinkW, the print name is reliable because we know that it sets
the print name to whatever was passed as lpTargetFileName. I suppose
nt.readlink could fall back on using the substitute name if there's no print
name.
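
The fallback suggested above could look roughly like this. It is only an
illustration, not the nt.readlink implementation: the function name is
hypothetical, and rewriting the NT "\??\" prefix to "\\?\" is an assumption
about how the substitute name would be presented to the caller.

```python
def readlink_result(print_name, substitute_name):
    """Prefer the print name; fall back on the substitute name if it is empty."""
    if print_name:
        return print_name
    # Assumption: an absolute substitute name is an NT path that begins with
    # "\??\", which is shown to Win32 callers as the "\\?\" extended prefix.
    if substitute_name.startswith("\\??\\"):
        return "\\\\?\\" + substitute_name[4:]
    # A relative symlink target has no prefix and is returned unchanged.
    return substitute_name
```

With a junction that has no print name, such as one created by PowerShell,
the caller would still get a usable target path.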

Also, if nt.readlink is used to manually resolve a broken path (e.g.
ntpath._readlink_deep), and the process doesn't have long paths enabled, then
the "\\?\" extended path from the substitute name is more reliable. (But one
could also call _getfullpathname on the print name and convert the result to
extended form if it's not already an extended path.)
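
As a rough sketch of that conversion, assuming the input is already a fully
qualified path (for example the result of _getfullpathname); the helper name
is hypothetical and device paths are deliberately left alone:

```python
def to_extended(full_path):
    """Convert a fully qualified Windows path to "\\\\?\\" extended form."""
    if full_path.startswith("\\\\?\\"):
        return full_path  # already an extended path
    if full_path.startswith("\\\\.\\"):
        return full_path  # device path; not handled in this sketch
    if full_path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + full_path[2:]
    # Drive-letter path: C:\... becomes \\?\C:\...
    return "\\\\?\\" + full_path
```

The result is not normalized any further, so the input should not contain
"." or ".." components.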

If you search around, you'll find some projects using the print name and some
using the substitute name to implement POSIX readlink, but using the print name
is more popular.

Do you want 3.8 to revert to using the print name, at least for symlinks?
(ntpath._readlink_deep would need to be modified to support long target paths.)
Or would you rather that shutil used a more reliable way to copy symlinks
verbatim on Windows? For example, use CopyFileExW for a file and
CreateDirectoryExW for a directory.
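
For the file case, a minimal ctypes sketch of the CopyFileExW approach might
look like the following. The function name is hypothetical, this is not what
shutil does today, and it only covers file symlinks; a directory symlink
would need CreateDirectoryExW instead.

```python
import ctypes
import sys

COPY_FILE_COPY_SYMLINK = 0x00000800  # documented CopyFileExW flag value


def copy_symlink_verbatim(src, dst):
    """Copy the file symlink at src to dst without following it."""
    if sys.platform != "win32":
        raise NotImplementedError("CopyFileExW is only available on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CopyFileExW.argtypes = (
        wintypes.LPCWSTR,               # lpExistingFileName
        wintypes.LPCWSTR,               # lpNewFileName
        ctypes.c_void_p,                # lpProgressRoutine (unused)
        ctypes.c_void_p,                # lpData (unused)
        ctypes.POINTER(wintypes.BOOL),  # pbCancel (unused)
        wintypes.DWORD,                 # dwCopyFlags
    )
    kernel32.CopyFileExW.restype = wintypes.BOOL
    if not kernel32.CopyFileExW(src, dst, None, None, None,
                                COPY_FILE_COPY_SYMLINK):
        raise ctypes.WinError(ctypes.get_last_error())
```

Because the system copies the reparse point itself, both the print name and
the substitute name are preserved without Python having to read either one.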

[1]:
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_reparse_data_buffer
[2]:
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/b41f1cbf-10df-4a47-98d4-1c52a833d913

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40654>
_______________________________________