[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-09-10 Thread Steve Dower


Steve Dower  added the comment:

Closing this as we've resolved it elsewhere. Thanks for the contribution!

--
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-08-21 Thread Steve Dower


Steve Dower  added the comment:

I think this definitely conflicts with the behaviour we've been working on for 
the last week over on issue37834 (the PR is going to conflict for sure).

I am making a change to rmtree() that will cause it to delete junction point 
without trying to recursively remove their contents, which should avoid the 
listdir() that was failing here. That should be enough to cover the original 
issue.

We also ended up settling on isdir/islink semantics for directory junctions 
that don't quite match those discussed here. I'm inclined to agree that we 
should figure out the best way to distinguish volume mounts from directory 
junctions, but that is going to be best layered on top of the other changes 
that I'm working on getting merged right now. 

The commit messages or doc changes in PR 15231 are the best place to see where 
we ended up.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-08-09 Thread Eryk Sun


Eryk Sun  added the comment:

> Thanks for the detailed explanation Eryk. While it is a little 
> annoying that it comes 2 years after the initial proposed 
> solution, I'll happily take that if the end result is a better fix :)

Mea culpa. I am sorry about that. I do respect your time and the work you've 
invested into this issue. I'm guilty of procrastination here. Plus we're in 
murky waters, and I'm worried about getting it wrong -- again. A lot of 
misleading information has been published about filesystem junctions. And 
Microsoft's documentation on the subject of reparse points and the expected 
behavior of symlinks, junctions, and other "name surrogates" is too brief, 
incomplete or just wrong.

> this fix seems quite a bit more involved to implement than the 
> previous one.

I was painting a broad picture of what it might look if we had general support 
for "name surrogate" reparse points. I went out of my way to use conditional 
tense, with "could" and "would" used extensively. I wanted to get that part 
covered upfront to be able to present a generic alternative solution to the 
rmtree() problem before digressing into the subject of junction and symlink 
behavior.

For just this issue, we could use a local solution for Windows. One option 
would be to add an nt._getattrtaginfo function that calls CreateFileW and 
GetFileInformationByHandleEx: FileAttributeTagInfo, with a fallback to 
FindFirstFileW if that fails for a downlevel filesystem. Then wrap the 
os.scandir call with a function that checks for a name-surrogate tag. 

For example:

if not _WINDOWS:
_rmtree_unsafe_scandir = os.scandir
else:
import contextlib

def _rmtree_unsafe_scandir(path):
try:
attr, tag = nt._getattrtaginfo(path)
except OSError:
attr = tag = 0
if (attr & stat.FILE_ATTRIBUTE_DIRECTORY
  and attr & stat.FILE_ATTRIBUTE_REPARSE_POINT
  and tag & 0x2000_): # IsReparseTagNameSurrogate
return contextlib.nullcontext([])
else:
return os.scandir(path)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-08-08 Thread Vidar Fauske


Vidar Fauske  added the comment:

Thanks for the detailed explanation Eryk. While it is a little annoying that it 
comes 2 years after the initial proposed solution, I'll happily take that if 
the end result is a better fix :)

That being said, this fix seems quite a bit more involved to implement than the 
previous one. While I am now slightly more experienced in contributing to 
Python than previously, I'm not sure if I can commit to implementing the 
outlined changes without some help. Some things that would help out a lot:
- An extracted checklist from the rather longer comment?
- Writing up the requirements as tests cases so that I can do test-driven 
development (I'm not very familiar with the testing suite).

If somebody else would want to either do it themselves, or help out with 
certain parts, I'd be very open to that (I'm not trying to "own" this).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-08-06 Thread Eryk Sun


Eryk Sun  added the comment:

> Junctions are sometimes used as links (e.g. mklink /j) and sometimes 
> as volume mount points (e.g. mountvol.exe).

That people sometimes use junctions as if they're symlinks doesn't mean that we 
should pretend it's true. The reparse tag is IO_REPARSE_TAG_MOUNT_POINT, and 
they behave like mountpoints (volume mounts and bind mounts). See Junctions vs. 
Symlinks, below.

It's potentially problematic to conflate junctions with symlinks. For example, 
a user who opts to use a junction instead of symlink may be denied the symlink 
privilege, so code that copies a junction as if it's a symlink will fail (e.g. 
move if os.rename fails, or copyfile with follow_symlinks=False, or copytree 
with symlinks=True), unless we add magical fallback code in os.symlink() to 
create junctions when the link target is a directory. Even if creating a 
symlink succeeds, a symlink has different behavior from that of a junction, 
which could lead to problems later on.

That said, always traversing directory mountpoints as if they're just plain 
directories, like what Unix does, is not the norm in Windows. In some contexts, 
they're basically handled as symlinks -- in particular for a recursive delete. 
CMD's `rmdir /s`, and PowerShell's `remove-item -recurse -force`, and 
Explorer's folder deletion all remove junctions without traversing them, 
regardless of whether the target is a regular DOS path or a volume GUID name. 
For example, if we step through the disassembled code of `rmdir /s` in CMD 
(i.e. cmd!RmDirSlashS), we observe that it looks for the name-surrogate bit in 
the reparse tag to determine whether it should call RemoveDirectoryW on a 
reparse-point directory instead of traversing it.

I would prefer to copy this behavior. It's safer, since standard users can 
create junctions to DOS paths and volume GUID names in Windows, unlike POSIX in 
which only the super user has the power to create mountpoints. While Windows 
mountvol.exe requires administrator access in order to update the mountpoint 
manager, CMD's `mklink /j` doesn't require elevated access, and neither does 
PowerShell's `new-item -itemtype junction`, even if the target is a volume GUID 
name.

Maybe for Windows we can have a name-surrogate category based on the reparse 
tag's name-surrogate bit (i.e. bit 29, "the file or directory represents 
another named entity in the system"), as identified by the WINAPI macro 
IsReparseTagNameSurrogate (winnt.h). The surrogate type would be a superset of 
the symlink type and would be allowed to be a directory. Nothing would change 
with regard to symlinks proper, however. It would remain the case that only 
IO_REPARSE_TAG_SYMLINK reparse points would be classified as symlinks by 
stat(), islink(), readlink(), etc. In POSIX systems, the only surrogate file 
type would be the symlink type, which is never a directory.

A keyword-only option surrogates_as_links=False could be added to stat() and 
lstat(). In POSIX, surrogates_as_links would be ignored. Given both 
follow_symlinks=False and surrogates_as_links=True, stat() would be able to 
return the reparse tag for any name-surrogate reparse point. The tag value 
could be added to _Py_stat_struct as st_reparse_tag, and the stat result tuple 
would be similarly extended. This field would be non-zero when querying any 
name-surrogate reparse point that's not followed. 

os.lstat(path, surrogates_as_links=True) could be the basis for 
os.path.issurrogate(). Or maybe we could add a more targeted function that 
calls CreateFileW and GetFileInformationByHandleEx: FileAttributeTagInfo, or 
FindFirstFileW. The scandir DirEntry result could implement an is_surrogate() 
method based on the reparse tag that's returned by FindFirstFileW.

For _rmtree_unsafe, we could simply insert a test at the start to avoiding 
listing surrogate directories. For example:

if os.path.issurrogate(path):
entries = []
else:
with os.scandir(path) as scandir_it:
entries = list(scandir_it)

We could also add an allow_directory_surrogates=False keyword-only option to 
os.remove, which would be ignored in POSIX just as the symlink() 
target_is_directory option is ignored in POSIX. By default calling os.remove on 
a non-symlink directory would fail, as one expects it should. 

Adding an option to remove a directory via os.remove isn't strictly consistent 
with POSIX, but os.remove was already modified in issue 18314 to always remove 
all junctions, so the behavior is already inconsistent. We'd be clearly 
specifying and documenting how it works, and hopefully the new requirement to 
pass the keyword option wouldn't be too disruptive for programs that have 
relied on the undocumented behavior.

---
Junctions vs. Symlinks

Junctions and symlinks have different constraints and behavior. Junctions can 
only target local devices, and when accessed remotely by a client they're 
evaluated remotely on the server (e.g. if a client accesses a 

[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-08-02 Thread Steve Dower


Steve Dower  added the comment:

Sorry for being slow to review, I just added a few more comments on the PR and 
I think we're nearly done.

--
versions: +Python 3.9 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2019-04-10 Thread Vidar Fauske


Vidar Fauske  added the comment:

I think the submitted PR could need a pair of eyes now. I've sorted the merge 
conflicts, and addressed the previous review points by eryksun.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2018-09-04 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
versions: +Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2018-04-13 Thread Vidar Fauske

Vidar Fauske  added the comment:

A PR that fixes the issue according to the feedback from Eryk Sun is available. 
It does seem to have stranded a bit on the review side. That being said, would 
a bugfix for shutil.rmtree be appropriate? It is very annoying when junction 
points made by other tools break pip source install of packages (since pip 
calls shutil.rmtree on its temporary directory after a build).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2018-03-05 Thread Vidar Fauske via Python-bugs-list

Change by Vidar Fauske :


--
keywords: +patch
pull_requests: +5764
stage: test needed -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2017-08-17 Thread Eryk Sun

Changes by Eryk Sun :


--
components:  -IO
stage:  -> test needed
type:  -> behavior
versions: +Python 3.7 -Python 3.3

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2017-08-17 Thread Eryk Sun

Eryk Sun added the comment:

Junctions are sometimes used as links (e.g. mklink /j) and sometimes as volume 
mount points (e.g. mountvol.exe). GetVolumePathName can be called to 
distinguish these cases. If a junction is a volume mount point, then its 
absolute path and volume path are the same. This test is already used in 
ntpath.ismount().

For a junction link, islink() should return true, readlink() should work, and 
S_ISDIR() should return false for the lstat() st_mode. For a junction mount 
point, islink() should return false, readlink() should not work, and S_ISDIR() 
should return true for the lstat() st_mode.

A helper function could be added in fileutils.c to determine whether a reparse 
point is a link, based on the file path and reparse tag. Then modify 
_Py_attribute_data_to_stat() to take `BOOL is_link` instead of `ULONG 
reparse_tag`.

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31226] shutil.rmtree fails when target has an internal directory junction (Windows)

2017-08-17 Thread Vidar Fauske

New submission from Vidar Fauske:

On Windows (Windows 10 in my case), given the following directory structure:
- rootfolder
 - a
 - b
  - junc (directory junction to ../a)

a call to `shutil.rmtree('root')` will fail with an exception 
`FileNotFoundError: [WinError 3]`, in a call to `os.listdir()` in 
`_rmtree_unsafe`. See attached minimal working example.

Note that sorting order is important: A link in 'a' pointing to 'b' does not 
fail. This is because `os.listdir()` raises an exception for 'b/junc' when its 
target ('a') has already been deleted.

Also, note that this is only for junctions, not directory links (`mklink /J` vs 
`mklink /D`), because:
 - Directory links flag false in the `stat.S_ISDIR(os.lstat('b/junc').st_mode)` 
test while junctions do not.
 - `os.islink()` returns false for both junctions, while directory links do not.

Indicated Python versions are those which I have personally tested on, and 
observed this behavior.

Current use case: Deleting a folder tree generated by an external tool, which 
creates junction links as part of its normal operation ('lerna' tool for the 
'npm' javascript package manager).

--
components: IO, Library (Lib), Windows
files: mwe.py
messages: 300424
nosy: paul.moore, steve.dower, tim.golden, vidartf, zach.ware
priority: normal
severity: normal
status: open
title: shutil.rmtree fails when target has an internal directory junction 
(Windows)
versions: Python 3.3, Python 3.6
Added file: http://bugs.python.org/file47090/mwe.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com