Eryk Sun <[email protected]> added the comment:
Here are two additional differences between mount points and symlinks:
(1) A mount point in a remote path is always evaluated on the server and
restricted to devices that are local to the server. So if we handle a mount
point as if it's a POSIX symlink that works with readlink(), then what are we
to do with the server's drive "Z:"? Genuine symlinks are evaluated on the
client, so readlink() always makes sense. (Though if we resolve a symlink
manually, then we're bypassing the system's R2L, i.e. remote-to-local, symlink
policy.)
(2) A mount point has its own security that's checked in addition to the
security on the target directory when it's reparsed. In contrast, security set
on a symlink is not checked when the link is reparsed, which is why icacls.exe
implicitly resolves a symlink when setting and viewing security unless the /L
option is used.
> - if it's a directory junction, call os.stat instead and return that
>   (???)
I wanted lstat in Windows to traverse mount points by default, as it does in
Unix (though I gave up on this), because a mount point behaves like a hard name
grafted into a path. This is important for relative symlinks that use ".."
components to traverse above their parent directory. The result is different
from a directory symlink that targets the same path.
A counter-argument (in favor of winlinks) is that a mount point is still
ultimately a name-surrogate reparse point, so, unlike a hard link, its
existence doesn't prevent the directory from being deleted. It's left in place
as a dangling link if the target is deleted or the device is removed from the
system. Trying to follow it fails with ERROR_PATH_NOT_FOUND or
ERROR_FILE_NOT_FOUND.
Also, handling a mount point as a directory by default would require an
additional parameter because in some cases we need to be able to open a
junction instead of traversing it, such as to implement shutil.rmtree to behave
like CMD's `rmdir /s`.
Another place where identifying a mount point is required, unfortunately, is
realpath(). Ideally we would be able to handle mount points as just
directories. The problem is that NT allows a mount point to target a symlink,
something that's not allowed in Unix. Traversing the mount point is effectively
the same as traversing the symlink. So we have to read the mount-point target,
and if it's a symlink, we have to read and evaluate it. (Consequently it seems
that getting the real path for a remote path is an intractable problem when
mount points are involved. We can only get the final path.)
---
Even without the addition of a new parameter, we may still want to limit the
definition of 'link' in Windows lstat to name-surrogate reparse points, i.e.
winlinks. Reparse points that aren't name surrogates don't behave like links.
They behave like the file itself, and reparsing may automatically replace the
reparse point with the real file. Some of them are even directories that have
the directory bit (28) set in the tag value, which means they're allowed to
contain other files. (Without the directory tag bit, setting a reparse point on
a non-empty directory should fail.)
The counter-argument to changing lstat to only open winlinks is that changing
the meaning of 'link' in lstat is too disruptive to existing software that may
depend on the old behavior, i.e. opening any reparse point. I think the use
cases for opening non-links are rare enough that it's not beyond the pale to
change this behavior in 3.8 or 3.9.
> Right, but is that because they deliberately want the junction
> to be treated like a file? Or because they want it to be treated
> like the directory is really right there?
For copytree it makes sense to traverse a mount point as a directory. We can't
reliably copy a mount point. In Unix, even when a volume mount or bind mount
can be detected, there's no standard way to clone it to a new mount point, and
even if there were, that would require super-user access. In Windows, we could
wrap CreateDirectoryExW, which can copy a mount point, but it requires
administrator access to copy a volume mount point (i.e.
"\\?\Volume{...}\"), for which it calls SetVolumeMountPointW in order to
update the mount-point manager in the kernel.
We also have a limited ability to create mount points via
_winapi.CreateJunction, but it's buggy in corner cases and incomplete. It
suffices for the reason it was added -- testing the ability to delete a
junction via os.remove().
> os.rmdir() already does special things to behave like a junction
> rather than the real directory,
This is similar in spirit to Unix, except Unix refuses to delete a mount point.
For example, if we have a Unix bind mount to a non-empty directory, rmdir()
fails with EBUSY. On the other hand, rmdir() on the real directory fails with
ENOTEMPTY. If Unix handled the mount point as if it's just the mounted
directory, I'd expect the error to be the same.
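The distinction between the two errors can be observed from Python. The
following is only a minimal sketch: classify_rmdir_failure is a hypothetical
helper name, not something in the standard library, and the EBUSY branch
assumes a Unix mount point, which cannot be exercised without mounting
something.

```python
import errno
import os


def classify_rmdir_failure(path):
    """Try os.rmdir(path) and report why it failed, if it did.

    Hypothetical helper for illustration. Returns None when the
    directory was removed, "busy" for EBUSY (what Unix reports for a
    mount point), and "not-empty" for an ordinary non-empty directory.
    """
    try:
        os.rmdir(path)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return "busy"
        # Some systems report EEXIST instead of ENOTEMPTY.
        if e.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return "not-empty"
        raise
    return None
```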
It's not particularly special in Windows unless it's a volume mount point. Then
RemoveDirectoryW tries to call DeleteVolumeMountPointW. This could be a case
where it would fail to remove a mount point, just like Unix. But the internal
DeleteVolumeMountPointW call is allowed to fail if the caller doesn't have
access to update the mount-point manager, in which case it removes the junction
anyway.
The consequence of failing to update the mount-point manager is that
GetFinalPathNameByHandleW calls will subsequently return a non-existing path
for a volume that was mounted only in the deleted folder (i.e. the volume isn't
also assigned a drive letter). Thus we can't assume the result from
GetFinalPathNameByHandleW exists. This just pertains to volume mount points,
which are special to the mount-point manager because it uses them to translate
a native device path into a canonical DOS path. Bind mount points have no
special significance to the mount-point manager.
> the islink/readlink/symlink process is going to be problematic on
> Windows since most users can't create symlinks.
Then copying the symlink fails, which I think is better than silently
transforming the behavior from a mount point to a symlink. Defensive code can
fall back on physically copying the target file or directory.
The latter is the default behavior for copytree. It's only an issue if code
calls copytree(src, dst, symlinks=True).
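A defensive fallback of this kind might look like the sketch below.
copy_link_or_target is a hypothetical helper, not part of shutil, and it
assumes that any OSError other than "destination exists" means the link could
not be recreated (for example, because the user lacks the symlink privilege).

```python
import os
import shutil


def copy_link_or_target(src, dst):
    """Recreate the symlink src at dst, else copy what src points to.

    Hypothetical helper for illustration. Returns "link" if a symlink
    was created at dst, or "copy" if the target was physically copied.
    """
    try:
        os.symlink(os.readlink(src), dst,
                   target_is_directory=os.path.isdir(src))
        return "link"
    except FileExistsError:
        # Don't silently overwrite an existing destination.
        raise
    except OSError:
        # Either src is not a link or the link could not be created,
        # e.g. the symlink privilege is missing in Windows.
        pass
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)
    return "copy"
```

The return value lets the caller log or warn when the copy no longer behaves
like a link.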
However, it's always a concern with shutil.move(), which attempts to move a
file via os.rename. This fails for a cross-volume rename. Then if islink() is
true, it falls back on os.symlink(os.readlink(src), real_dst) and
os.unlink(src).
(On my own systems, I grant the symlink privilege to the Authenticated Users
group, which allows symlink creation by standard users and administrators --
elevated or not. But in general, a fear of symlinks is warranted, even in Unix.)
> I'm proposing to fix the inconsistency by fixing the flags. Your
> proposal is to fix the inconsistency by generating a new error in
> unlink()? (Just clarifying.)
unlink() did not remove junctions prior to 3.5 (see issue 18314).
Instead of rolling back the change, or conflating the meaning of S_IFLNK, a
counter-proposal is to harmonize unlink with the proposed change to lstat, i.e.
to allow removing all name-surrogate directories. A name-surrogate directory
cannot have children in the directory itself, so allowing it for os.unlink is
in the spirit of the function, even if doing so is inconsistent with the
literal specification.
This is documented in ntifs.h:
    D [bit 28] is the directory bit. When set to 1, indicates that any
    directory with this reparse tag can have children. Has no special
    meaning when used on a non-directory file. Not compatible with the
    name surrogate bit [bit 29].
Regarding the directory bit, the registered tags with this bit are
IO_REPARSE_TAG_CLOUD*, IO_REPARSE_TAG_WCI_1, and IO_REPARSE_TAG_PROJFS (for
projected file systems).
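The two tag bits can be tested with simple masks. This is a sketch only: the
helper names are hypothetical, and the tag values are written out as literals
from memory of the Windows headers rather than taken from this thread, so
treat them as assumptions to verify against ntifs.h.

```python
# Bit masks for the reparse tag value.
NAME_SURROGATE_BIT = 0x2000_0000  # bit 29
DIRECTORY_BIT = 0x1000_0000       # bit 28

# Assumed tag values (verify against the Windows headers).
IO_REPARSE_TAG_MOUNT_POINT = 0xA000_0003
IO_REPARSE_TAG_SYMLINK = 0xA000_000C
IO_REPARSE_TAG_PROJFS = 0x9000_001C


def is_name_surrogate(tag):
    """True if the tag marks a name-surrogate reparse point (a winlink)."""
    return bool(tag & NAME_SURROGATE_BIT)


def has_directory_bit(tag):
    """True if directories with this tag are allowed to have children."""
    return bool(tag & DIRECTORY_BIT)
```

With these values, junctions and symlinks are name surrogates without the
directory bit, while the projected file system tag has the directory bit and
is not a name surrogate, consistent with the two bits being incompatible.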
> Currently Windows shutil.rmtree traverses into junctions and deletes
> everything, though it then succeeds to delete the junction.
That's like Unix mount-point behavior, except Windows allows a volume mount
point to be deleted (not just a bind mount point), despite negative
consequences to API functions such as GetFinalPathNameByHandleW if the user
isn't allowed to update the system database of volume mount points.
An issue here, and with all code that walks a tree (especially destructively),
is the link behavior of mount points. Bind mount points have the same problem
in both Unix and Windows. For example, shutil.rmtree will fail to remove a
mount point that targets a directory that it already removed. It's a different
OSError in Unix vs Windows (EBUSY vs ENOENT or ERROR_PATH_NOT_FOUND), but an
error all the same. That in itself is not an argument to handle a junction as a
symlink, because it's still a mount point that behaves as such, even if someone
is using it as a symlink. However, it is an argument for special handling of
winlinks, which would allow the Windows implementation to behave better than
Unix, IMO, in addition to helping Windows users that are forced to use mount
points instead of symlinks.
> With my change, rmtree() directly on a junction now raises (could be
> fixed?) but rmtree on a directory containing a junction will remove
> the junction without touching the target directory. So I think we're
> both happy about this one.
Changing rmtree to work on a target directory that claims to be a symlink would
require special casing Windows in shutil.rmtree. But in general this is a
problem that affects all code that looks for symlinks, not just code in the
standard library.
If the meaning of S_IFLNK remains the same, then existing code has the option
of being upgraded to delete directory winlinks without traversing them, but
nothing is forced on them. In this case, for example, we could wrap the
os.scandir call:
    if not _WINDOWS:
        _rmtree_unsafe_scandir = os.scandir
    else:
        import contextlib

        def _rmtree_unsafe_scandir(path):
            try:
                st = os.lstat(path)
                attr, tag = st.st_file_attributes, st.st_reparse_tag
            except OSError:
                attr = tag = 0
            if (attr & stat.FILE_ATTRIBUTE_DIRECTORY
                  and attr & stat.FILE_ATTRIBUTE_REPARSE_POINT
                  and tag & 0x2000_0000): # IsReparseTagNameSurrogate
                return contextlib.nullcontext([])
            else:
                return os.scandir(path)
For a directory winlink, the above _rmtree_unsafe_scandir function returns a
context manager that yields an empty list, so _rmtree_unsafe skips to
os.rmdir(path). This reproduces the behavior of CMD's `rmdir /s`, which will
not traverse any name-surrogate reparse point (it checks the tag for the
name-surrogate bit) even if the reparse point is the target directory.
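The reason contextlib.nullcontext([]) works as a stand-in for os.scandir is
that both can be used in a with statement and iterated. A minimal,
platform-independent sketch of that mechanism follows; list_entries and
skip_children are hypothetical names used only for illustration and are not
part of shutil.

```python
import contextlib
import os


def list_entries(scandir_like, path):
    """Consume a scandir-style callable the way a tree walker would."""
    with scandir_like(path) as it:
        return [getattr(entry, "name", entry) for entry in it]


def skip_children(path):
    """Pretend the directory is empty so the caller won't traverse it."""
    return contextlib.nullcontext([])
```

Calling list_entries(skip_children, path) returns an empty list for any path,
whereas list_entries(os.scandir, path) returns the names actually present.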
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue37834>
_______________________________________