Eryk Sun <eryk...@gmail.com> added the comment:

> Junctions are sometimes used as links (e.g. mklink /j) and sometimes 
> as volume mount points (e.g. mountvol.exe).

That people sometimes use junctions as if they're symlinks doesn't mean that we 
should pretend it's true. The reparse tag is IO_REPARSE_TAG_MOUNT_POINT, and 
they behave like mountpoints (volume mounts and bind mounts). See Junctions vs. 
Symlinks, below.

It's potentially problematic to conflate junctions with symlinks. For example, 
a user who opts for a junction instead of a symlink may lack the symlink 
privilege. Code that copies a junction as if it's a symlink (e.g. move when 
os.rename fails, copyfile with follow_symlinks=False, or copytree with 
symlinks=True) will then fail, unless we add magical fallback code in 
os.symlink() to create junctions when the link target is a directory. Even if 
creating a symlink succeeds, a symlink behaves differently from a junction, 
which could lead to problems later on.

That said, always traversing directory mountpoints as if they're plain 
directories, as Unix does, is not the norm in Windows. In some contexts they're 
handled much like symlinks, in particular for a recursive delete. CMD's 
`rmdir /s`, PowerShell's `remove-item -recurse -force`, and Explorer's folder 
deletion all remove junctions without traversing them, regardless of whether 
the target is a regular DOS path or a volume GUID name. For example, if we step 
through the disassembled code of `rmdir /s` in CMD (i.e. cmd!RmDirSlashS), we 
observe that it checks the name-surrogate bit in the reparse tag to decide 
whether to call RemoveDirectoryW on a reparse-point directory instead of 
traversing it.

I would prefer to copy this behavior. It's safer, since standard users can 
create junctions to DOS paths and volume GUID names in Windows, unlike POSIX in 
which only the super user has the power to create mountpoints. While Windows 
mountvol.exe requires administrator access in order to update the mountpoint 
manager, CMD's `mklink /j` doesn't require elevated access, and neither does 
PowerShell's `new-item -itemtype junction`, even if the target is a volume GUID 
name.

Maybe for Windows we can have a name-surrogate category based on the reparse 
tag's name-surrogate bit (i.e. bit 29, "the file or directory represents 
another named entity in the system"), as identified by the WINAPI macro 
IsReparseTagNameSurrogate (winnt.h). The surrogate type would be a superset of 
the symlink type and would be allowed to be a directory. Nothing would change 
with regard to symlinks proper, however. It would remain the case that only 
IO_REPARSE_TAG_SYMLINK reparse points would be classified as symlinks by 
stat(), islink(), readlink(), etc. In POSIX systems, the only surrogate file 
type would be the symlink type, which is never a directory.
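
As a minimal sketch of the bit test described above, the following mirrors 
what the IsReparseTagNameSurrogate macro checks. The tag values are the ones 
defined in winnt.h; the function and constant names are illustrative only and 
are not part of any existing Python API.

```python
# Reparse tag values as defined in winnt.h.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction / volume mount point
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # symbolic link
IO_REPARSE_TAG_APPEXECLINK = 0x8000001B  # a tag without the surrogate bit

# Bit 29 of the tag: "the file or directory represents another named
# entity in the system".
NAME_SURROGATE_BIT = 0x20000000


def is_reparse_tag_name_surrogate(tag):
    """Illustrative counterpart of the IsReparseTagNameSurrogate macro."""
    return bool(tag & NAME_SURROGATE_BIT)
```

Both the junction tag and the symlink tag have bit 29 set, which is why a 
surrogate category would be a superset of the symlink type.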

A keyword-only option surrogates_as_links=False could be added to stat() and 
lstat(). In POSIX, surrogates_as_links would be ignored. Given both 
follow_symlinks=False and surrogates_as_links=True, stat() would be able to 
return the reparse tag for any name-surrogate reparse point. The tag value 
could be added to _Py_stat_struct as st_reparse_tag, and the stat result tuple 
would be similarly extended. This field would be non-zero when querying any 
name-surrogate reparse point that's not followed. 

os.lstat(path, surrogates_as_links=True) could be the basis for 
os.path.issurrogate(). Or maybe we could add a more targeted function that 
calls either CreateFileW and GetFileInformationByHandleEx (with 
FileAttributeTagInfo) or FindFirstFileW. The scandir DirEntry result could 
implement an is_surrogate() method based on the reparse tag that's returned by 
FindFirstFileW.
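
A rough sketch of what such a helper might look like follows. It does not use 
the proposed surrogates_as_links option, which does not exist; instead it 
assumes the st_reparse_tag field that os.lstat() results carry on Windows in 
Python 3.8 and later, and falls back to the symlink test elsewhere. The name 
issurrogate is hypothetical.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x20000000  # bit 29 of a reparse tag


def issurrogate(path):
    """Hypothetical helper: True if path itself is a name surrogate."""
    try:
        st = os.lstat(path)  # never follows the final path component
    except (OSError, ValueError):
        return False
    if stat.S_ISLNK(st.st_mode):
        return True  # symlinks are surrogates on every platform
    # st_reparse_tag is only present on Windows (Python 3.8+); treat a
    # missing field as "no reparse tag".
    tag = getattr(st, "st_reparse_tag", 0)
    return bool(tag & NAME_SURROGATE_BIT)
```

On POSIX this reduces to an islink() check, which matches the idea that the 
symlink type is the only surrogate type there.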

For _rmtree_unsafe, we could simply insert a test at the start to avoid 
listing surrogate directories. For example:

    if os.path.issurrogate(path):
        entries = []
    else:
        with os.scandir(path) as scandir_it:
            entries = list(scandir_it)
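
To show where that test would sit in a complete traversal, here is a 
simplified stand-in for a recursive delete. It is not shutil's actual 
_rmtree_unsafe: remove_tree and its is_surrogate parameter are hypothetical, 
and the default predicate is a narrower approximation that only recognizes 
symlinks, plus junctions where os.path.isjunction exists (Python 3.12+).

```python
import os


def _default_is_surrogate(path):
    # Stand-in predicate: symlinks everywhere, plus junctions where
    # os.path.isjunction is available.
    isjunction = getattr(os.path, "isjunction", lambda p: False)
    return os.path.islink(path) or isjunction(path)


def remove_tree(path, is_surrogate=_default_is_surrogate):
    """Delete path recursively without descending into surrogates."""
    if is_surrogate(path):
        entries = []  # remove the link itself, never its target's contents
    else:
        with os.scandir(path) as scandir_it:
            entries = list(scandir_it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            remove_tree(entry.path, is_surrogate)
        else:
            os.unlink(entry.path)
    try:
        os.rmdir(path)
    except NotADirectoryError:
        os.unlink(path)  # e.g. a POSIX symlink passed as the top-level path
```

Passing the predicate as a parameter keeps the traversal independent of how 
surrogates end up being detected.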

We could also add an allow_directory_surrogates=False keyword-only option to 
os.remove, which would be ignored in POSIX just as the symlink() 
target_is_directory option is ignored in POSIX. By default calling os.remove on 
a non-symlink directory would fail, as one expects it should. 

Adding an option to remove a directory via os.remove isn't strictly consistent 
with POSIX, but os.remove was already modified in issue 18314 to always remove 
all junctions, so the behavior is already inconsistent. We'd be clearly 
specifying and documenting how it works, and hopefully the new requirement to 
pass the keyword option wouldn't be too disruptive for programs that have 
relied on the undocumented behavior.

---
Junctions vs. Symlinks

Junctions and symlinks have different constraints and behavior. Junctions can 
only target local devices, and when accessed remotely by a client they're 
evaluated remotely on the server (e.g. if a client accesses a junction to 
"C:\Temp" on a server, the target is the system drive on the server). 

Symlinks are always evaluated on the client side, i.e. the redirector sends the 
reparse request over the wire to the client. The evaluation of local and remote 
symlinks is set by policies on the client system. A local symlink may be 
allowed to target either a local device or a remote device. A remote symlink 
may be allowed to target either a remote device or a local device on the client 
(e.g. a symlink to "C:\Temp" on the server targets the system drive on the 
client). The policies that govern this are SymlinkLocalToLocalEvaluation 
(default enabled), SymlinkLocalToRemoteEvaluation (default enabled), 
SymlinkRemoteToLocalEvaluation (default disabled), and 
SymlinkRemoteToRemoteEvaluation (default disabled). You might see these 
abbreviated as L2L, L2R, R2L, and R2R. 

Junction targets must be fully qualified, but symlinks can target relative 
paths. How a relative symlink resolves when it's reached through a junction, 
as opposed to through a directory symlink, demonstrates that junctions are 
intentionally designed to behave as mountpoints. 

For example, given "C:\test1\test2\foo_link" is a link to "..\foo", if we have 
a directory symlink "C:\symlink" that targets "C:\test1\test2", then 
"C:\symlink\foo_link" refers to "C:\test1\foo". In contrast, relative symlinks 
traverse a junction as a namespace grafting. So if we have a junction 
"C:\junction" that targets "C:\test1\test2" (the same target as the symlink), 
then "C:\junction\foo_link" refers to "C:\foo". 
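
The two outcomes can be reproduced with a small path-arithmetic model. This 
only illustrates the behavior described here, not how the I/O manager 
implements reparsing; resolve_through and its parameters are made up for the 
example.

```python
import ntpath


def resolve_through(opened_path, mount, mount_target, rel_target, kind):
    """Model where a relative symlink lands when reached via `mount`.

    opened_path  -- path of the relative symlink as opened by the caller
    mount        -- the directory symlink or junction in that path
    mount_target -- what `mount` points to
    rel_target   -- the relative symlink's own target, e.g. r"..\\foo"
    kind         -- "symlink" or "junction"
    """
    if kind == "symlink":
        # A directory symlink is replaced by its target before the
        # relative link is evaluated.
        opened_path = mount_target + opened_path[len(mount):]
    elif kind != "junction":
        raise ValueError("kind must be 'symlink' or 'junction'")
    # A junction is a namespace graft, so the relative link is evaluated
    # against the path as opened.
    parent = ntpath.dirname(opened_path)
    return ntpath.normpath(ntpath.join(parent, rel_target))


via_symlink = resolve_through(
    r"C:\symlink\foo_link", r"C:\symlink", r"C:\test1\test2",
    r"..\foo", "symlink")
via_junction = resolve_through(
    r"C:\junction\foo_link", r"C:\junction", r"C:\test1\test2",
    r"..\foo", "junction")
```

Here via_symlink comes out as "C:\test1\foo" and via_junction as "C:\foo", 
the same two results as in the example above.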

If we set up a similar scenario in Linux using either a kernel bind mount or 
FUSE bindfs mount, we'll observe the same behavior. The bind mount is a name 
grafting in the virtual filesystem, whereas a symlink simply resolves to the 
target path.

---
Mountpoints

It seems to me that handling all junctions as mountpoints is more consistent 
with how we handle DOS and UNC drives as mountpoints even when they're not 
volume mountpoints. For example, we can map a directory such as 
"C:\Users\Public" to drive "P:" or share it as "\\Server\Public". These are 
similar to Unix bind mounts, but in the case of DOS and UNC drives the 
namespace grafting is internal to the system, either as junctions in the system 
object namespace (e.g. "\Sessions\0\DosDevices\<Logon ID>\P:" -> 
"\Device\HarddiskVolume2\Users\Public") or as mappings in the UNC 
provider-share namespace (e.g. SMB shares, WebDAV shares, VirtualBox folder 
shares, and so on, all grafted under "\Device\Mup"). What's different about 
junction mountpoints is that they're not grafted as a root directory, whereas 
the syntax for DOS and UNC drives in Windows mandates that they're always the 
top-level root, i.e. we can't use ".." to traverse to a parent directory.

Given this broader definition of a mountpoint, os.path.ismount would no longer 
call _getvolumepathname. It would still return true for DOS and UNC drive root 
directories. Otherwise it would simply check whether the path is a junction 
(i.e. IO_REPARSE_TAG_MOUNT_POINT).

----------
keywords: +needs review -patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31226>
_______________________________________