Eryk Sun <[email protected]> added the comment:
> os.path.realpath() normalizes paths before resolving links
> on Windows
Normalizing the input path is required in order to be consistent with the
Windows file API. On the other hand, the kernel resolves the target path of a
relative symlink in a POSIX-ly correct manner, and ntpath._readlink_deep()
doesn't replicate that behavior.
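As a rough illustration of what normalizing the input means, ntpath.normpath() can stand in for the lexical processing the Windows file API applies to an ordinary path before opening it. The path below is hypothetical, and normpath() is only an approximation of the API's behavior.

```python
import ntpath

# Lexical normalization: ".." removes the preceding component as text,
# without checking whether that component ("link" here) is a symlink or
# a junction on disk.
normalized = ntpath.normpath(r"C:\work\foo\link\..\file.txt")
print(normalized)  # C:\work\foo\file.txt
```

Whatever "link" points to never enters the picture, which is the part that differs from how the target of a relative symlink is resolved.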
I've attached a prototype that I wrote for a POSIX-like implementation that
recursively resolves both the drive and the path. It uses the final path only
as a shortcut to normalize volume GUID names as drives and the proper casing of
UNC server and share names. However, it's considerably more work than the
final-path approach, and more work always has the potential for more bugs. I'm
providing it for the sake of discussion, or just for people to point to it as
an example of what not to do... ;-)
Patching up the current implementation would probably involve extending
_getfinalpathname() to support follow_symlinks=False. Aspects of the POSIX
implementation would have to be adopted, but I think it can be kept relatively
simple when integrated with _getfinalpathname(path, follow_symlinks=False). The
latter also makes it easy to identify a UNC path. That's necessary because
mountpoints should never be resolved in a UNC path, which the current
implementation gets wrong.
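A minimal sketch of the UNC check, assuming the final path is in the usual extended form, where network paths begin with "\\?\UNC\". The helper name is hypothetical, and the follow_symlinks parameter discussed above is a proposal, not an existing option.

```python
# Prefix that a final path carries when the file lives on a UNC share.
UNC_FINAL_PREFIX = "\\\\?\\UNC\\"


def is_unc_final_path(final_path):
    """Return True if an extended-form final path names a UNC location."""
    return final_path.upper().startswith(UNC_FINAL_PREFIX)


print(is_unc_final_path(r"\\?\UNC\baz\spam\dir"))
print(is_unc_final_path(r"\\?\C:\work\foo"))
```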
What this wouldn't support is resolving an inaccessible drive as much as
possible. Mapped drives are object symlinks that expand to UNC paths that can
include an arbitrary filepath on a share. Substitute drives by definition
target an arbitrary filepath, and can even target other substitute and mapped
drives. A final-path only approach would leave the inaccessible drive in the
result, along with any symlinks that are internal to the drive.
A final-path approach also can't support targets with rooted paths or ".."
components that traverse a mountpoint. The final path will be on the
mountpoint's device, which will change how such relative symlinks resolve. That
said, rooted symlink targets are almost never seen in Windows, and targets that
traverse a mountpoint by way of a ".." component should be rare, in principle.
One problem is the frequent use of bind mountpoints in place of symlinks in
Windows. In CMD, bind mountpoints can be created by anyone via `mklink /j`.
Here's a fabricated example in which a mountpoint (i.e. junction) is used where
a symlink normally should be:
    C:\
      work\
        foo\
          bar [junction -> C:\work\bar]
          remote [symlink -> \\baz\spam]
        bar\
          remote [symlink -> ..\remote]
        remote [symlink -> \\qux\eggs]
C:\work\foo\bar\remote normally resolves as follows:
    C:\work\foo\bar\remote
        -> C:\work\foo\bar + ..\remote
        -> C:\work\foo\remote
        -> \\baz\spam
Assume that \\baz\spam is down, so C:\work\foo\bar\remote can't be strictly
resolved. If the non-strict algorithm relies on getting the final path of
C:\work\foo\bar\remote before resolving the target of "remote", then the result
for this case will be incorrect, because the junction's target replaces
C:\work\foo\bar before "..\remote" is applied:
    C:\work\foo\bar\remote
        -> C:\work\bar\remote
        -> C:\work\bar + ..\remote
        -> C:\work\remote
        -> \\qux\eggs
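The two traces can be reproduced with a toy model. This is only a sketch: the LINKS table, expand_junctions() and resolve() are hypothetical names, and the model covers just the links in the example tree. It treats UNC targets as terminal, expands only junctions in directory components, and never touches the real filesystem.

```python
import ntpath

# Hypothetical in-memory model of the links in the example tree.
LINKS = {
    r"C:\work\foo\bar": ("junction", r"C:\work\bar"),
    r"C:\work\foo\remote": ("symlink", r"\\baz\spam"),
    r"C:\work\bar\remote": ("symlink", r"..\remote"),
    r"C:\work\remote": ("symlink", r"\\qux\eggs"),
}


def expand_junctions(directory):
    """Return where `directory` really lives, following junctions only."""
    drive, rest = ntpath.splitdrive(directory)
    current = drive + "\\"
    for part in filter(None, rest.split("\\")):
        current = ntpath.join(current, part)
        kind, target = LINKS.get(current, (None, None))
        if kind == "junction":
            current = target
    return current


def resolve(path, via_final_path):
    """Follow links in the toy model until a non-link or a UNC path is reached."""
    for _ in range(32):  # arbitrary guard against link loops
        if path.startswith("\\\\"):
            return path  # toy simplification: UNC targets are terminal
        parent, name = ntpath.split(path)
        real_parent = expand_junctions(parent)
        link = LINKS.get(ntpath.join(real_parent, name))
        if link is None:
            return path
        _, target = link
        if ntpath.splitdrive(target)[0]:
            path = target  # an absolute target replaces the path outright
        else:
            # The two strategies differ only in the directory that a
            # relative target is joined to.
            base = real_parent if via_final_path else parent
            path = ntpath.normpath(ntpath.join(base, target))
    raise OSError("too many levels of links")


print(resolve(r"C:\work\foo\bar\remote", via_final_path=False))
print(resolve(r"C:\work\foo\bar\remote", via_final_path=True))
```

With via_final_path=False the relative target is joined to the path as opened, giving \\baz\spam. With via_final_path=True it is joined to the junction's target, giving \\qux\eggs.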
----------
components: +Windows
nosy: +eryksun, paul.moore, steve.dower, tim.golden, zach.ware
versions: -Python 3.6, Python 3.7
Added file: https://bugs.python.org/file49984/realpath_posixly.py
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue43936>
_______________________________________