[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-28 Thread Eryk Sun


Eryk Sun  added the comment:

Modifying readlink() to return an str subclass that preserves the print name is 
an interesting idea, but not on topic here. 

For the cases where os.readlink is used to manually follow links (*), such as 
ntpath.realpath, using the substitute name is the most reliable option since 
that's the actual path that the system uses. But this issue is about the use of 
os.readlink in order to copy symlinks in shutil.move, shutil.copyfile, and 
shutil.copytree. I'd still be happy to assist with the development of an 
os.copylink function that copies the reparse point exactly via low-level 
FSCTL_GET_REPARSE_POINT and FSCTL_SET_REPARSE_POINT. It has to use the 
low-level API because it turns out that CopyFileExW and CreateDirectoryExW fail 
if they can't enable the symlink privilege, which is not actually required if 
the system is in developer mode. For shutil, it would be used as 
shutil._copylink. In POSIX, shutil._copylink would continue to just use 
readlink and symlink.

---

(*) off-topic note

Manually following remote mountpoints is never correct. They are intended to be 
evaluated by the remote system using its local devices. 

Manually following remote-to-local (R2L) symlinks is almost always incorrect 
and should be disallowed by local symlink evaluation policy Check `fsutil 
behavior query symlinkevaluation`. A remote SMB server opens a path in a way 
that has the remote I/O manager stop parsing at the first symlink, which may be 
a directory symlink in the path (it's not necessarily the final component, 
unlike FILE_FLAG_OPEN_REPARSE_POINT). The server completes the request with a 
message to the client redirector that contains the parsed path of the symlink, 
the remaining unparsed path, and the target of the symlink. The local SMB 
redirector decides whether to allow reparsing based on its L2L, L2R, R2L, and 
R2R symlink policies. This reparse is implemented locally, with local devices. 
It is unlikely that a non-relative symlink that targets local device paths 
(e.g. "//server/share/symlink" -> "E:/work/file") was intended to be evaluated 
on another machine, so R2L should almost always be disallowed. Bypassing 
 the system's R2L policy when manually following a symlink is always wrong.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-26 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

What do you think about readlink returning something like:

class Link(str):
  print_name = None  # type: str | None

  @property
  def friendly_name(self) -> str:
return self.print_name or self

os.readlink would always return one of these Link objects, and since it's a 
substr, it would behave the way it currently behaves, but with an additional 
property. This approach would expose the underlying duality of the "name" of a 
link or junction, such that tools that need a friendly name could use 
.friendly_name and tools that need to inspect the print name could do that... 
but for the cases that wish for the most precise target could use the default 
value.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-26 Thread Steve Dower


Steve Dower  added the comment:

> how could one write the `cmd.exe` `dir` command using Python?

I haven't checked, but if the dir command resolves the entire symlink chain 
rather than one step at a time, you'd use realpath (though that seems unlikely, 
tbh).

Failing that, you'd add extra logic to prepare the path for display to the 
user, and then you _would not_ use that prepared display path to resolve the 
link.

readlink() is intended for resolving links, not for displaying links. That's 
why it doesn't prepare them for display, but preserves the most accurate 
information possible. (We chose a more pragmatic route with realpath(), by 
checking that the nicer path resolves to the same target as the accurate one, 
and only then returning the nicer one.)

Really, for dir, you'd read the print name and display it when it's in the 
contents of the target. But you'd use the substitute name to resolve the target 
when "dir " is called. We _could_ add a new API to Python for this, 
but it wouldn't be cross-platform and it doesn't solve a correctness issue, so 
we didn't. Feel free to request it, bearing in mind that it would be 
Windows-only (AFAIK).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-26 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Thanks Steve. While I was able to avoid the original symptom by not using 
readlink, I still think there's an issue here, both in the surprising behavior 
observed by shutil.copyfile, but also by the essential behavior of readlink. 
The more essential issue can be illustrated this simple problem: how could one 
write the `cmd.exe` `dir` command using Python? Using cmd.exe:

```
C:\Users\jaraco>mklink foo C:\USERS\jaraco\bar
symbolic link created for foo <<===>> C:\USERS\jaraco\bar

C:\Users\jaraco>dir foo
 Volume in drive C has no label.
 Volume Serial Number is B8F4-40BB

 Directory of C:\Users\jaraco

2020-05-26  02:04 AM  foo [C:\USERS\jaraco\bar]
   1 File(s)  0 bytes
   0 Dir(s)  21,078,786,048 bytes free
```

Similarly in powershell:

```
PS C:\Users\jaraco> cmd /c mklink foo c:\users\jaraco\bar
symbolic link created for foo <<===>> c:\users\jaraco\bar
PS C:\Users\jaraco> dir 'foo' | ?{$_.LinkType} | select Target

Target
--
{c:\users\jaraco\bar}
```

Whether 'bar' exists or not, it seems to me that these tools use the "print 
name" to render the next hop of the link to the user (even if that target 
doesn't exist).

`realpath` doesn't provide this functionality and neither does `readlink`.

Perhaps more importantly, the "print name" is lost when copying the file using 
shutil, but for the same underlying reason--the print name is not exposed by 
the low level API, and the result is that implementation details about the 
platform leak out to the user interface.

If "\\?\C:\Users\jaraco\bar" is the more correct form of the target, then 
shouldn't these tools also be exposing that value to the user? How would a 
Python developer wishing to implement the dir command do so? Is there a way to 
retain the fidelity of the print name during an shutil.copy?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-25 Thread Steve Dower


Steve Dower  added the comment:

I think we can safely say this is by design (I know Jason got his backport 
working).

> Understood. However, this statement assumes the "correct path" is the most 
> precise path to resolve the target. If you instead define "correct path" as 
> the one that would be most friendly to the user who created the path, 
> readlink no longer honors that expectation.

Nothing about the os module is meant to be user-friendly first - it's based on 
the POSIX spec ;)

The most important thing is that operations that traverse symlinks should end 
up at the same file as a manual traversal using readlink. The easiest way to 
spoil this is to optimise for readability over correctness.

As discussed, realpath does a little more work to ensure readability, and 
anything else that cares about UI can do similar work. But if the lowest-level 
function loses critical information, there's no way for the developer to get it 
back.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-22 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

> Perhaps related...

Now I'm thinking the issue is different, so I've created issue40732 to track 
the realpath issue.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-22 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

> I'll see if `realpath` satisfies the test suite needs for path pie.

I've tried replacing `readlink` with `realpath` and the tests still pass on 
Unix-like OSs, and also passes on Python 3.8 on Windows, but now fails on older 
Pythons on Windows.

Is there a backport of the new realpath for older Pythons? I see there's [one 
here](https://github.com/jaraco/jaraco.windows/blob/1318d7afce2a9257f5bd7342783fdb796462d66b/jaraco/windows/filesystem/backports.py#L7).
 I wonder if that one can be used.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-22 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Perhaps related, I've encountered another apparent regression on Python 3.8 on 
Windows when the current working directory is in a symlink to another volume 
and one runs `setup.py develop` on a project using setuptools_scm 
(https://github.com/pypa/setuptools_scm/issues/436).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-22 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

> Unless you're specifically testing single steps through symlink chains, you 
> probably want to just use realpath anyway.

I do see now the references to `realpath` in the docs... and I think that 
satisfies the need I described above. It doesn't supply the `f`, but it 
provides a mechanism to resolve the ultimate file. As you say, there's still no 
regular solution for resolving intermediate links in a chain of symlinks with 
user-friendly names, but it seems like a much less severe limitation (platform 
variation).

I'll see if `realpath` satisfies the test suite needs for path pie.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-22 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

> Changing readlink to always return the correct path was deliberate.

Understood. However, this statement assumes the "correct path" is the most 
precise path to resolve the target. If you instead define "correct path" as the 
one that would be most friendly to the user who created the path, readlink no 
longer honors that expectation. With this change, the following invariant holds 
on every platform except Python 3.8 on Windows (at least in the general case):

>>> os.symlink(x, y)
>>> assert os.readlink(y) == x

More importantly, AFAIK, Python provides no function to transform `x` into what 
one can expect as the result of `os.readlink(y)`. In other words, what value of 
`f` would make this invariant pass?

>>> os.symlink(x, y)
>>> assert os.readlink(y) == f(x)

Or put another way, is "C:\Users\jaraco\temp" an "incorrect path"?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-18 Thread Steve Dower


Steve Dower  added the comment:

> Presumably, I'll be able to add a pre-release of Python 3.9 to the pipelines 
> config at jaraco/skeleton - that will ensure that all projects I maintain 
> going forward will get tested on the pre-release.

I don't have a straightforward task for it, but these should give you a Python 
3.9 prerelease version (or any version you like if you replace "-Prerelease" 
with "-Version x.y.z", IIRC):

nuget install python -Prerelease -OutputDirectory . -ExcludeVersion
.\python\tools\python.exe ...

nuget install pythonx86 -Prerelease -OutputDirectory . -ExcludeVersion
.\pythonx86\tools\python.exe ...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-18 Thread Steve Dower


Steve Dower  added the comment:

Ah right, it's realpath() that has the final step to remove an unnecessary \\?\ 
prefix.

Firstly, I still think fixing copy is more important and more valuable.

Given the possibility that the print name won't point to the real target, I'd 
much prefer to keep reading the substitute name. In this case, this looks like 
a change between versions (3.7->3.8) rather than a bug (assuming the symlink 
works, which hasn't been mentioned yet), and while it does result in needing a 
test update in Path, I don't think it makes sense to push that fix upstream. 
Changing readlink to always return the correct path was deliberate.

Unless you're specifically testing single steps through symlink chains, you 
probably want to just use realpath anyway. Presumably you weren't using that 
beforehand because it didn't work, but the same fix fixed that, and we do extra 
work to retain the existing behaviour of the higher-level function (including 
rejecting junctions, unlike readlink).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-18 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

> This is only an issue for broken symlinks, right? (More precisely, those that 
> cannot be resolved, which may include very long paths on some machines.) 

Unfortunately, no. In the original post above, you can see temp/bar points to 
C:\Users\jaraco\temp\foo, which exists. Yet os.readlink returns 
\\?\C:\Users\jaraco\temp\foo. That's the root cause of the issue with copytree.

> what do you need to be able to test against prerelease? It would be nice to 
> find these impacts during alpha/beta rather than three bug fix releases later.

I need a mechanically-reproducible technique to test dozens of projects against 
pre-releases on Windows. Historically, I've had Travis as my prime CI 
(non-existent Windows support) and AppVeyor for select projects that demanded 
some Windows testing. I've recently started migrating to Azure Pipelines, which 
has nice multi-platform support and that's why I've started catching these 
issues. Presumably, I'll be able to add a pre-release of Python 3.9 to the 
pipelines config at jaraco/skeleton - that will ensure that all projects I 
maintain going forward will get tested on the pre-release.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-18 Thread Steve Dower


Steve Dower  added the comment:

This is only an issue for broken symlinks, right? (More precisely, those that 
cannot be resolved, which may include very long paths on some machines.) 

Fixing copy is my preferred option.

Also Jason, what do you need to be able to test against prerelease? It would be 
nice to find these impacts during alpha/beta rather than three bug fix releases 
later.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-17 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

I strongly suspect bpo-37834 and GH-15231 is where the difference was 
introduced.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-17 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Thank you Eryk for the thorough and informative investigation. Seriously, wow.

> Do you want 3.8 to revert to using the print name, at least for symlinks? 
> (ntpath._readlink_deep would need to be modified to support long target 
> paths.) Or would you rather that shutil used a more reliable way to copy 
> symlinks verbatim on Windows? For example, use CopyFileExW for a file and 
> CreateDirectoryEx for a directory.

In this case, my instinct is that `shutil` should devise a more reliable way to 
copy symlinks. It claims that it does to the extent the platform allows [1].

> symbolic links in the source tree are represented as symbolic links in the 
> new tree and the metadata of the original links will be copied as far as the 
> platform allows

It hadn't occurred to me that the effect may be manifest in readlink, but I see 
that now too:

~ # py -3.7 -c "import os; print(os.readlink('temp/bar'))"
c:\Users\jaraco\temp\foo
~ # py -3.8 -c "import os; print(os.readlink('temp/bar'))"
\\?\c:\Users\jaraco\temp\foo

So even if shutil.copyfile were to copy the symlinks exactly, the expectation 
would still be missed that the output of readlink doesn't match the input to 
symlink, so the expectation would still be missed. A user/programmer should be 
able to predict the output of 'readlink' given the input to 'symlink' (or cmd 
/c mklink).

Based on that reasoning, my answer would be "both", but I'd put a priority on 
restoring the use of "print name" for `readlink`, as that may be sufficient to 
satisfy most (if not all) use-cases.


[1]: https://docs.python.org/3/library/shutil.html#shutil.copytree

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-17 Thread Eryk Sun


Eryk Sun  added the comment:

Copying a symlink verbatim requires copying both the print name and the 
substitute name in the reparse buffer [1]. For a file, CopyFileExW: 
COPY_FILE_COPY_SYMLINK implements this by enabling the symlink privilege for 
the thread and copying the reparse point via FSCTL_GET_REPARSE_POINT and 
FSCTL_SET_REPARSE_POINT. For a directory, CreateDirectoryExW is implemented 
similarly when lpTemplateDirectory is a symlink or mount point. (For a 
"?\\Volume{GUID}\\" volume mountpoint as opposed to a bind mountpoint, 
CreateDirectoryExW punts to SetVolumeMountPointW, which also updates the system 
mountpoint manager.)

If you can only have one or the other, the substitute name is more reliable 
according to the wording in [MS-FSCC] [2]. 

symlinks:

A symbolic link has a substitute name and a print name associated 
with it. The substitute name is a pathname (section 2.1.5) 
identifying the target of the symbolic link. The print name SHOULD 
be an informative pathname, suitable for display to a user, that 
also identifies the target of the symbolic link. Either pathname 
can contain dot directory names as specified in section 2.1.5.1. 

mount points (junctions):

A mount point has a substitute name and a print name associated 
with it. The substitute name is a pathname (section 2.1.5) 
identifying the target of the mount point. The print name SHOULD 
be an informative pathname (section 2.1.5), suitable for display 
to a user, that also identifies the target of the mount point. 
Neither of these pathnames can contain dot directory names.

The operative weasel word is "should", instead of a reliable "must" (RFC2119). 

An example of the power of "should" is that PowerShell doesn't even set a print 
name when it creates a mount point via `New-Item -ItemType Junction`. I don't 
agree that nt.readlink should read junctions, but it does, so the potentially 
missing print name is a problem. If it were just symlinks created by 
CreateSymbolicLinkW, the print name is reliable because we know that it sets 
the print name to whatever was passed as lpTargetFileName. I suppose 
nt.readlink could fall back on using the substitute name if there's no print 
name.

Also, if nt.readlink is used to manually resolve a broken path (e.g. 
ntpath._readlink_deep), and the process doesn't have long paths enabled, then 
the "\\?\" extended path from the substitute name is more reliable. (But one 
could also call _getfullpathname on the print name and convert the result to 
extended form if it's not already an extended path.)

If you search around, you'll find some projects using the print name and some 
using the substitute name to implement POSIX readlink, but using the print name 
is more popular.

Do you want 3.8 to revert to using the print name, at least for symlinks? 
(ntpath._readlink_deep would need to be modified to support long target paths.) 
Or would you rather that shutil used a more reliable way to copy symlinks 
verbatim on Windows? For example, use CopyFileExW for a file and 
CreateDirectoryEx for a directory.

[1]: 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_reparse_data_buffer
[2]: 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/b41f1cbf-10df-4a47-98d4-1c52a833d913

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40654] shutil.copyfile mutates symlink for absolute path

2020-05-16 Thread Jason R. Coombs


New submission from Jason R. Coombs :

In https://github.com/jaraco/path/issues/186, the Path project discovered a 
regression with Python 3.8. It seems that if one creates a symlink with an 
absolute path. I used `shutil.copytree('temp', 'temp2', True)` and it produced 
this result:

```
~ # dir temp
 Volume in drive C has no label.
 Volume Serial Number is B8F4-40BB

 Directory of C:\Users\jaraco\temp

2020-05-16  11:05 PM  .
2020-05-16  11:05 PM  ..
2020-05-16  11:05 PM  bar [c:\Users\jaraco\temp\foo]
2020-05-16  11:04 PM 0 foo
   2 File(s)  0 bytes
   2 Dir(s)  17,495,805,952 bytes free
~ # dir temp2
 Volume in drive C has no label.
 Volume Serial Number is B8F4-40BB

 Directory of C:\Users\jaraco\temp2

2020-05-16  11:05 PM  .
2020-05-16  11:05 PM  ..
2020-05-16  11:06 PM  bar [\\?\c:\Users\jaraco\temp\foo]
2020-05-16  11:04 PM 0 foo
   2 File(s)  0 bytes
   2 Dir(s)  17,495,846,912 bytes free
```

As you can see, in the copy, bar has an additional `\\?\` prefix on the symlink 
path. On Python 3.7 and earlier, the copy was made without mutating the 
metadata.

--
components: Windows
keywords: 3.8regression
messages: 369091
nosy: jaraco, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: shutil.copyfile mutates symlink for absolute path
type: behavior
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com