Eryk Sun <[email protected]> added the comment:
I'm tentatively reopening this issue for you to consider the following point,
Steve.
A real path is not always the same as a final path. We can find code that does
`relpath(realpath(target), realpath(start))` to compute the relative path to
target for a symlink. The final path can't be relied on for this unless we
always evaluate the symlink from the final path of `start`. In particular, it
cannot be relied on if the relative path traverses a junction.
What code like this needs from a realpath() implementation is a solid (real)
path, not a final path. In other words, the caller wants a solidified form of
`start` that can be used to compute the path to a target for a relative
symlink, but one that works when accessed from `start`, not the final path of
`start`. Generally this means resolving symlinks in the path, but not mount
points. That's what Unix realpath() does, but of course there it's simpler
because the only name surrogate in Unix is a symlink, which is never a mount
point and never a directory.
Here's an example. In this first case "scripts" is a junction mount point that
targets "C:/spam/etc/scripts":
>>> eggs = r'C:\spam\dlls\eggs.dll'
>>> scripts = r'C:\spam\scripts'
>>> rel_eggs_right = os.path.relpath(eggs, scripts)
>>> print(rel_eggs_right)
..\dlls\eggs.dll
>>> os.symlink(rel_eggs_right, 'C:/spam/scripts/eggs_right.dll')
>>> os.path.exists('C:/spam/scripts/eggs_right.dll')
True
>>> scripts_final = os.path._getfinalpathname(scripts)[4:]
>>> print(scripts_final)
C:\spam\etc\scripts
>>> rel_eggs_wrong = os.path.relpath(eggs, scripts_final)
>>> print(rel_eggs_wrong)
..\..\dlls\eggs.dll
>>> os.symlink(rel_eggs_wrong, 'C:/spam/scripts/eggs_wrong.dll')
>>> os.path.exists('C:/spam/scripts/eggs_wrong.dll')
False
If we remove the junction and replace it with a 'soft' symlink that targets the
same directory, then using the final path works, and using the given path no
longer works:
>>> print(os.readlink('C:/spam/scripts'))
C:\spam\etc\scripts
>>> scripts_final = os.path._getfinalpathname(scripts)[4:]
>>> rel_eggs_right_2 = os.path.relpath(eggs, scripts_final)
>>> print(rel_eggs_right_2)
..\..\dlls\eggs.dll
>>> os.symlink(rel_eggs_right_2, 'C:/spam/scripts/eggs_right_2.dll')
>>> os.path.exists('C:/spam/scripts/eggs_right_2.dll')
True
>>> rel_eggs_wrong_2 = os.path.relpath(eggs, scripts)
>>> print(rel_eggs_wrong_2)
..\dlls\eggs.dll
>>> os.symlink(rel_eggs_wrong_2, 'C:/spam/scripts/eggs_wrong_2.dll')
>>> os.path.exists('C:/spam/scripts/eggs_wrong_2.dll')
False
When the kernel traverses "scripts" as a soft link, it collapses to the target
(i.e. "C:/spam/etc/scripts"), so our relative path that was computed from the
final path is right in this case. On the other hand, if "scripts" is a mount
point (junction), it's a hard (solid) component. It does not collapse to the
target (the kernel even checks the junction's security descriptor, which it
does not do for a symlink), so ".." in the relative symlink traverses the
junction component as if it were an actual directory.
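To make the difference concrete, here's a minimal sketch that evaluates a relative symlink target lexically against the directory path as it was traversed. It's only string arithmetic with ntpath, not the kernel's actual reparse logic, and link_destination is a made-up helper name. For the junction case the traversed directory keeps "scripts"; for the symlink case it has collapsed to the target.

```python
import ntpath


def link_destination(link_dir_as_traversed, relative_target):
    """Lexically evaluate a relative symlink target against its directory.

    `link_dir_as_traversed` is an assumption of this sketch: the directory
    path after symlinks have collapsed but junctions have been kept.
    """
    return ntpath.normpath(ntpath.join(link_dir_as_traversed, relative_target))


# Junction: "scripts" stays in the traversed path as a hard component.
print(link_destination(r'C:\spam\scripts', r'..\dlls\eggs.dll'))
print(link_destination(r'C:\spam\scripts', r'..\..\dlls\eggs.dll'))

# Symlink: "scripts" has collapsed to its target before ".." is applied.
print(link_destination(r'C:\spam\etc\scripts', r'..\..\dlls\eggs.dll'))
print(link_destination(r'C:\spam\etc\scripts', r'..\dlls\eggs.dll'))
```

Only the first and third calls land on C:\spam\dlls\eggs.dll, which matches the os.path.exists() results in the sessions above.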
What we need is an implementation of realpath("C:/spam/scripts") that returns
"C:\\spam\\scripts" when "scripts" is a mount point and returns
"C:\\spam\\etc\\scripts" when "scripts" is a symlink.
This means we need an implementation of realpath() that looks a lot like
posixpath.realpath. Generally a mount point should be walked over like a
directory, just as mount points are handled in Unix. The difference is that a
mount point in Windows is allowed to target a symlink. (This is a design flaw;
Unix doesn't allow it.) Since we need to know the target of a junction, we have
to read the reparse point, until we hit a real directory target. As long as it
targets another junction, it remains a hard component. As soon as it targets a
symlink, however, it becomes a soft component that needs to be resolved. If the
junction targets a name surrogate reparse point that we can't read, then our
only option is to get a final path. This is dysfunctional. We should raise an
exception for this case. Code can handle the exception and knowingly get a
final path instead of a real path.
This also means we can't reliably compute a real path for a remote path (UNC)
because we can't manually evaluate the target of a remote junction. A remote
junction is meaningless to us. If we're evaluating a UNC path and reach a
junction, we have to give up on a real path and settle for a final path. We can
get a final path because that lets the kernel in the server talk to our kernel
to resolve any combination of mount points (handled on the server side) and
symlinks (handled on our side). This case should also raise an exception. Aware
code can handle it by getting a final path and taking appropriate measures.
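On the caller's side, handling either exception case might look something like the following. This is only a sketch of the pattern: RealPathUnavailableError and real_or_final_path are hypothetical names, and the two resolver callables are passed in because no such functions exist yet. The stubs below only stand in for them.

```python
class RealPathUnavailableError(OSError):
    """Hypothetical exception: a real path cannot be computed."""


def real_or_final_path(path, get_real_path, get_final_path):
    """Return (resolved_path, is_real), falling back to a final path."""
    try:
        return get_real_path(path), True
    except RealPathUnavailableError:
        # The caller knowingly settles for a final path here and should not
        # use it to compute relative symlink targets across junctions.
        return get_final_path(path), False


def stub_real(path):
    # Stand-in resolver: pretend every UNC path hits a remote junction.
    if path.startswith('\\\\'):
        raise RealPathUnavailableError(path)
    return path


def stub_final(path):
    # Stand-in resolver: a real one would ask the kernel for the final path.
    return path


print(real_or_final_path(r'C:\spam\scripts', stub_real, stub_final))
print(real_or_final_path(r'\\server\share\scripts', stub_real, stub_final))
```

Returning the flag alongside the path lets aware code tell which kind of path it got without catching the exception itself.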
----------
status: closed -> open
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue9949>
_______________________________________