subject:"\[issue37834\] readlink on Windows cannot read app exec links"


Łukasz Langa  added the comment:


New changeset 25a044ee6ce50a9172478cc61d914644778455f6 by Łukasz Langa in 
branch '3.8':
[3.8] bpo-37834: Prevent shutil.rmtree exception (GH-15602) (#15603)
https://github.com/python/cpython/commit/25a044ee6ce50a9172478cc61d914644778455f6


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links


Change by Łukasz Langa :


--
pull_requests: +15280
pull_request: https://github.com/python/cpython/pull/15603

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links


Łukasz Langa  added the comment:


New changeset 7fcc2088a50a4ecb80e5644cd195bee209c9f979 by Łukasz Langa (Ned 
Deily) in branch 'master':
bpo-37834: Prevent shutil.rmtree exception (GH-15602)
https://github.com/python/cpython/commit/7fcc2088a50a4ecb80e5644cd195bee209c9f979


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-29 Thread Ned Deily



Change by Ned Deily :


--
pull_requests: +15279
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/15602

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-29 Thread Steve Dower



Steve Dower  added the comment:

Huh, didn't realise those were always defined in stat.py.

Changing the test to "hasattr(... "st_file_attributes")" should be fine. I can 
get to it in a couple of hours if nobody else gets there first.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-29 Thread Ned Deily



Ned Deily  added the comment:

... and the other important difference is that older versions of macOS do not 
support fd functions so _use_fd_functions is false and the alternate path is 
taken in shutil.rmtree, the path that calls _rm_tree_islink which fails.

>>> shutil.rmtree('a')
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", 
line 718, in rmtree
if _rmtree_islink(path):
  File 
"/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", 
line 564, in _rmtree_islink
(st.st_file_attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT
AttributeError: 'os.stat_result' object has no attribute 'st_file_attributes'

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-29 Thread Ned Deily



Ned Deily  added the comment:

One problem seems to be that the code added for this issue assumes that the 
documentation is correct in implying that the stat.FILE_ATTRIBUTE_* constants 
(like stat.FILE_ATTRIBUTE_REPARSE_POINT) are only present on Windows.  But 
besides being conditionally created in _stat.c, they are also undconditionally 
defined in stat.py on all platforms.  That makes some of the tests in 
shutil.py, like:
if hasattr(stat, 'FILE_ATTRIBUTE_REPARSE_POINT'):
to determine which versions of _rmtree_islink and _rmtree_isdir to define 
problematic.

--
nosy: +ned.deily

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links


Łukasz Langa  added the comment:

Unit tests didn't catch it since it fails on older macOS releases.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links


Łukasz Langa  added the comment:

There's a bug on macOS that is blocking the release regarding 
`stat.FILE_ATTRIBUTE_REPARSE_POINT` being used to check whether 
`os.stat_result` objects have the `st_file_attributes` attribute.

--
nosy: +lukasz.langa
resolution: fixed -> 
stage: resolved -> needs patch
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

That should be all the buildbot issues fixes, so I'm marking this resolved and 
will wait for the inevitable 3.8.0b4 feedback!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-21 Thread miss-islington



miss-islington  added the comment:


New changeset 967d625a6df27fb490f035045ec8fe4675001d63 by Miss Islington (bot) 
in branch '3.8':
bpo-37834: Fix test on Windows 7 (GH-15377)
https://github.com/python/cpython/commit/967d625a6df27fb490f035045ec8fe4675001d63


--
nosy: +miss-islington

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-21 Thread miss-islington



Change by miss-islington :


--
pull_requests: +15089
pull_request: https://github.com/python/cpython/pull/15378

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:


New changeset 374be59b8e479afa8c7a8ae6e77e98915e2f6d45 by Steve Dower in branch 
'master':
bpo-37834: Fix test on Windows 7 (GH-15377)
https://github.com/python/cpython/commit/374be59b8e479afa8c7a8ae6e77e98915e2f6d45


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

Adding a small fix for the Win7 buildbots in PR 15377

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Change by Steve Dower :


--
pull_requests: +15088
pull_request: https://github.com/python/cpython/pull/15377

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:


New changeset 9eb3d5463976068900e94b860ced7e035885835c by Steve Dower in branch 
'3.8':
bpo-37834: Normalise handling of reparse points on Windows (GH-15370)
https://github.com/python/cpython/commit/9eb3d5463976068900e94b860ced7e035885835c


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Change by Steve Dower :


--
pull_requests: +15081
stage: commit review -> patch review
pull_request: https://github.com/python/cpython/pull/15370

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Change by Steve Dower :


--
stage: patch review -> commit review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:


New changeset df2d4a6f3d5da2839c4fc11d31511c8e028daf2c by Steve Dower in branch 
'master':
bpo-37834: Normalise handling of reparse points on Windows (GH-15231)
https://github.com/python/cpython/commit/df2d4a6f3d5da2839c4fc11d31511c8e028daf2c


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

So my colleagues confirmed that they deliberately represent junction points as 
symlinks within WSL, including translating the target to the mounted location 
(assuming it is mounted) and letting the Linux code traverse it normally. They 
also said they haven't heard any feedback suggesting it causes any trouble.

However, the more I've thought about the implications of islink() returning 
true for a junction, the more I've come around to Eryk's point of view, so I'm 
going to merge this as is (the PR currently only sets S_IFLNK for actual 
symlinks).

That said, nt.readlink() will still be able to read the target of a junction, 
and code like I have in PR 15287 that uses readlink() to follow links will 
resolve them (noting that that implementation of realpath() attempts to let the 
OS follow it first). I think that covers the intended use cases best.

The only updates left are a couple of docs here, but I'll finish up issue9949 
first and rebase on those changes as well.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-20 Thread Steve Dower



Steve Dower  added the comment:

The latest PR also fixes issue1311 and issue20541 properly 
(os.path.exists("NUL") now returns True).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-19 Thread Steve Dower



Steve Dower  added the comment:

> So the order of the GetFileInformationByHandleEx and GetFileType blocks 
> actually needs to be flipped.

I've done that now.

And thanks for confirming. That was my suspicion, but I wasn't sure if you knew 
something here that I didn't (v. likely!).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-19 Thread Eryk Sun



Eryk Sun  added the comment:

> Any particular reason you did GetFileAttributesW(path) in the 
> non-FILE_TYPE_DISK case when we have the hFile around?

The point of getting the file attributes is to identify the root directory of 
the namedpipe and mailslot file systems. For example, os.listdir('//./pipe') 
works because we append "\\*.*" to the path.

GetFileInformationByHandle[Ex] forbids a pipe handle, for reasons that may no 
longer be relevant in Windows 10 (?). I remembered the restriction in the above 
case, but it seems I forgot about it when querying the tag. So the order of the 
GetFileInformationByHandleEx and GetFileType blocks actually needs to be 
flipped. That would be a net improvement anyway since there's no point in 
querying a reparse tag from a device that's not a file system (namedpipe and 
mailslot are 'file systems', but only at the most basic level).

I can't imagine there being a problem with querying FileBasicInfo to get the 
file attributes. I just checked that it works fine with "//./pipe/" and 
"//./mailslot/" -- at least in Windows 10. Anyway, GetFileAttributesW uses a 
query-only open that doesn't create a real file object or even require an IRP 
usually, so it's not adding much cost compared to querying FileBasicInfo using 
the handle.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-19 Thread Steve Dower



Steve Dower  added the comment:

Thanks for the code snippet, that helped me a lot (and since you went to the 
trouble of fixing other bugs, I guess I'll have to merge it into my PR now).

Any particular reason you did GetFileAttributesW(path) in the 
non-FILE_TYPE_DISK case when we have the hFile around?

I'm trying to get one more opinion from a colleague on setting S_IFLNK for all 
name surrogate reparse points vs. only symlinks by default (the 
Python/fileutils.c change, and implicitly the fixes to Lib/shutil.py). I might 
try and get some broader opinions as well on whether "is_dir() is true, do you 
suspect it could be a junction" vs "is_link() is true, do you suspect it could 
be a junction", since that is what it really comes down to. (We need to make 
changes to shutil to match Explorer anyway - rmtree should not recurse, and 
copytree should.)

However, the minimal change is to leave S_IFLNK only for symlinks, so unless I 
get a strong case for treating all name surrogates as links, I'll revert to 
that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-19 Thread Eryk Sun



Eryk Sun  added the comment:

Here are two additional differences between mount points and symlinks:

(1) A mount point in a remote path is always evaluated on the server and 
restricted to devices that are local to the server. So if we handle a mount 
point as if it's a POSIX symlink that works with readlink(), then what are we 
to do with the server's drive "Z:"? Genuine symlinks are evaluated on the 
client, so readlink() always makes sense. (Though if we resolve a symlink 
manually, then we're bypassing the system's R2L symlink policy.)

(2) A mount point has its own security that's checked in addition to the 
security on the target directory when it's reparsed. In contrast, security set 
on a symlink is not checked when the link is reparsed, which is why icacls.exe 
implicitly resolves a symlink when setting and viewing security unless the /L 
option is used.

>  - if it's a directory junction, call os.stat instead and return that > (???)

I wanted lstat in Windows to traverse mount points by default (but I gave up on 
this), as it does in Unix, because a mount point behaves like a hard name 
grafting in a path. This is important for relative symlinks that use ".." 
components to traverse above their parent directory. The result is different 
from a directory symlink that targets the same path.

A counter-argument (in favor of winlinks) is that a mount point is still 
ultimately a name-surrogate reparse point, so, unlike a hard link, its 
existence doesn't prevent the directory from being deleted. It's left in place 
as a dangling link if the target is deleted or the device is removed from the 
system. Trying to follow it fails with ERROR_PATH_NOT_FOUND or 
ERROR_FILE_NOT_FOUND. 

Also, handling a mount point as a directory by default would require an 
additional parameter because in some cases we need to be able to open a 
junction instead of traversing it, such as to implement shutil.rmtree to behave 
like CMD's `rmdir /s`. 

Another place identifying a mount point is required, unfortunately, is in 
realpath(). Ideally we would be able to handle mount points as just 
directories. The problem is that NT allows a mount point to target a symlink, 
something that's not allowed in Unix. Traversing the mount point is effectively 
the same as traversing the symlink. So we have to read the mount-point target, 
and if it's a symlink, we have to read and evaluate it. (Consequently it seems 
that getting the real path for a remote path is an intractable problem when 
mount points are involved. We can only get the final path.)

---

Even without the addition of a new parameter, we may still want to limit the 
definition of 'link' in Windows lstat to name-surrogate reparse points, i.e. 
winlinks. Reparse points that aren't name surrogates don't behave like links. 
They behave like the file itself, and reparsing may automatically replace the 
reparse point with the real file. Some of them are even directories that have 
the directory bit (28) set in the tag value, which means they're allowed to 
contain other files. (Without the directory tag bit, setting a reparse point on 
a non-empty directory should fail.)

The counter-argument to changing lstat to only open winlinks is that changing 
the meaning of 'link' in lstat is too disruptive to existing software that may 
depend on the old behavior, i.e. opening any reparse point. I think the use 
cases for opening non-links are rare enough that it's not beyond the pale to 
change this behavior in 3.8 or 3.9.

> Right, but is that because they deliberately want the junction 
> to be treated like a file? Or because they want it to be treated 
> like the directory is really right there?

For copytree it makes sense to traverse a mount point as a directory. We can't 
reliably copy a mount point. In Unix, even when a volume mount or bind mount 
can be detected, there's no standard way to clone it to a new mount point, and 
even if there were, that would require super-user access. In Windows, we could 
wrap CreateDirectorExW, which can copy a mount point, but it requires 
administrator access to copy a volume mount point (i.e. 
"?\\Volume{...}\\"), for which it calls SetVolumeMountPointW in order to 
update the mount-point manager in the kernel. 

We also have a limited ability to create mount points via 
_winapi.CreateJunction, but it's buggy in corner cases and incomplete. It 
suffices for the reason it was added -- testing the ability to delete a 
junction via os.remove(). 

> os.rmdir() already does special things to behave like a junction 
> rather than the real directory, 

This is similar in spirit to Unix, except Unix refuses to delete a mount point. 
For example, if we have a Unix bind mount to a non-empty directory, rmdir() 
fails with EBUSY. On the other hand, rmdir() on the real directory fails with 
ENOTEMPTY. If Unix handled the mount point as if it's just the mounted 
directory, I'd expect the error to be the same. 

It's not particularly special in

[issue37834] readlink on Windows cannot read app exec links

2019-08-19 Thread Eryk Sun



Eryk Sun  added the comment:

Here's the requested overview for the case where name-surrogate reparse points 
are handled as winlinks (Windows links), but S_IFLNK is reserved for 
IO_REPARSE_TAG_SYMLINK. I took the time this afternoon to write it up in C, 
which hopefully is clearer than my prose.  It handles all CreateFileW failures 
inline, but uses a recursive call to traverse a non-link. No reparse tag values 
are hard coded in win32_xstat_impl.

I extended it to support devices that aren't file systems, such as "con", disk 
volumes, and raw disks. This enhancement was requested on another issue, but it 
may as well get updated in this issue if win32_xstat_impl is getting 
overhauled. Some of these devices are already supported by fstat(). The latter 
can be extended similarly to support volume devices and raw disk devices.

I opted to fail the call in the unhandled-tag case if it opens a link that 
should be traversed. Either it's an unhandled link, which is an unacceptable 
condition (i.e. it should be a sink, not a link), or it's a link or sequence of 
links to an unhandled reparse point. Returning the link reparse-point data is 
not what the caller wants from a successful stat().

---

static int
win32_xstat_impl(const wchar_t *path, struct _Py_stat_struct *result,
 BOOL traverse)
{
HANDLE hFile;
BY_HANDLE_FILE_INFORMATION fileInfo;
FILE_ATTRIBUTE_TAG_INFO tagInfo = { 0 };
DWORD fileType, error;
BOOL isUnhandledTag = FALSE;
int retval = 0;

DWORD access = FILE_READ_ATTRIBUTES;
DWORD flags = FILE_FLAG_BACKUP_SEMANTICS; /* Allow opening directories. */
if (!traverse) {
flags |= FILE_FLAG_OPEN_REPARSE_POINT;
}

hFile = CreateFileW(path, access, 0, NULL, OPEN_EXISTING, flags, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
/* Either the path doesn't exist, or the caller lacks access. */
error = GetLastError();
switch (error) {
case ERROR_ACCESS_DENIED: /* Cannot sync or read attributes. */
case ERROR_SHARING_VIOLATION: /* It's a paging file. */
/* Try reading the parent directory. */
if (!attributes_from_dir(path, , )) {
/* Cannot read the parent directory. */
SetLastError(error);
return -1;
}
if (fileInfo.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
if (traverse ||
!IsReparseTagNameSurrogate(tagInfo.ReparseTag)) {
/* The stat call has to traverse but cannot, so fail. */
SetLastError(error);
return -1;
}
}
break;

case ERROR_INVALID_PARAMETER:
/* \\.\con requires read or write access. */
hFile = CreateFileW(path, access | GENERIC_READ,
FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
OPEN_EXISTING, flags, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
SetLastError(error);
return -1;
}
break;

case ERROR_CANT_ACCESS_FILE:
/* bpo37834: open unhandled reparse points if traverse fails. */
if (traverse) {
traverse = FALSE;
isUnhandledTag = TRUE;
hFile = CreateFileW(path, access, 0, NULL, OPEN_EXISTING,
flags | FILE_FLAG_OPEN_REPARSE_POINT, NULL);
}
if (hFile == INVALID_HANDLE_VALUE) {
SetLastError(error);
return -1;
}
break;

default:
return -1;
}
}

if (hFile != INVALID_HANDLE_VALUE) {

/* Query the reparse tag, and traverse a non-link. */
if (!traverse) {
if (!GetFileInformationByHandleEx(hFile, FileAttributeTagInfo,
, sizeof(tagInfo))) {
/* Allow devices that do no support FileAttributeTagInfo. */
error = GetLastError() ;
if (error == ERROR_INVALID_PARAMETER ||
error == ERROR_INVALID_FUNCTION ||
error == ERROR_NOT_SUPPORTED) {
tagInfo.FileAttributes = FILE_ATTRIBUTE_NORMAL;
tagInfo.ReparseTag = 0;
} else {
retval = -1;
goto cleanup;
}
} else if (tagInfo.FileAttributes &
 FILE_ATTRIBUTE_REPARSE_POINT) {
if (IsReparseTagNameSurrogate(tagInfo.ReparseTag)) {
if (isUnhandledTag) {
/* Traversing previously failed for either this link
   or its target. */
SetLastError(ERROR_CANT_ACCESS_FILE);
retval = -1;
goto cleanup;
}
/* Traverse a

[issue37834] readlink on Windows cannot read app exec links

2019-08-16 Thread Steve Dower



Steve Dower  added the comment:

> Why group all reparse points under the banner of 'link'?

Because for the purposes of the list of changes beneath it, there wasn't any 
difference (e.g. "traverses any links supported by the OS" is more meaningful 
to most people, even though both of us would understand it to mean "traverses 
any traversable reparse points supported by the OS"). I'm not redefining them 
all to be the same thing, just establishing the terminology for what 
immediately followed.

> As I've previously suggested (and this is the last time because I'm
> becoming a broken record), lstat() should at least be restricted to
> opening only name-surrogate reparse points that are supposed to be like
> links in that they target another path in the system. Plus it also has
> to open unhandled reparse points.

Apologies for causing the repetition. Let me summarise what I believe you're 
suggesting as an alternate flow (bearing in mind that only os.stat() and 
os.lstat() are affected here):

os.lstat:
* open without following reparse points
* check the reparse tag
 - if it's a genuine symlink, return attributes of the link itself and marked 
ST_IFLNK
 - if it's a directory junction, call os.stat instead and return that (???)
 - if it's any name-surrogate reparse point, return attributes of the link 
itself but not marked ST_IFLNK
 - if it's any other reparse point, call os.stat instead and return that
* otherwise regular handling (using hFile or FindFirstFile, etc.)

os.stat:
* open following reparse points
* if the open fails with ERROR_CANT_ACCESS_FILE, try opening without following 
reparse points:
 - if it's a genuine symlink, ???
 - if it's a directory junction, ???
 - if it's any name-surrogate reparse point, ???
 - if it's any other reparse point, return attributes of the link itself
* otherwise regular handling

If you can fill in the gaps, that will help me understand exactly what you're 
proposing.

>> shutil.copytree(path): Unchanged. (requires a minor fix to
>> continue to recursively copy through junctions (using above test),
>> but not symlinks.)
>
> Everyone else who relies on islink(), readlink(), and symlink() to copy
> symlinks isn't special casing their code to look for junctions or
> anything else we lump under the banner of islink(). They could code
> defensively if readlink() fails for a 'link' that we can't read. But
> that leaves the problem of readlink() succeeding for a junction. That
> can causes problems if the target is passed to os.symlink(), which
> changes the link from a hard name grafting to a soft name grafting.

Right, but is that because they deliberately want the junction to be treated 
like a file? Or because they want it to be treated like the directory is really 
right there?

os.rmdir() already does special things to behave like a junction rather than 
the real directory, and the islink/readlink/symlink process is going to be 
problematic on Windows since most users can't create symlinks. That code simply 
isn't going to be portable. But code that is using stat() and expecting to get 
the real directory ought to work, just as code using lstat() and expecting to 
get the link if it's been linked somehow ought to work.

> Why would we need to read the target of a junction? It's not needed for
> realpath() in Windows. We should only have to resolve symlinks. For
> example:
> 
> ...
>
> IMO, S_IFLNK need not be set for anything other than Unix-like symbolic
> links. We would just need to document that on Windows, lstat opens any
> link-like reparse point that indicates it targets another path on the
> system, plus any reparse point that's not handled, but that islink() is
> only true for actual Unix symlinks that can be created via os.symlink()
> and read via os.readlink().

I think I understand your reasoning here now, sorry for it taking so long.

> This preserves how islink() and readlink() currently work, while still
> leaving the door open to fix misbehavior in particular cases. Code,
> including our own code, that needs to look for the broader Windows
> category of "name surrogate" can examine the reparse tag. For
> convenience we can provide issurrogate() that checks
> lstat(filename).st_reparse_tag & 0x2000_.

I'm not adding new API, even for internal use. This is edge case enough that 
os.lstat() is fine for it.

>> os.unlink(path): unchanged (still removes the junction, not the
>> contents)
> 
> Whatever we're calling a link should be capable of being deleted via
> os.unlink. If we apply S_IFLNK, then it won't have S_IFDIR (at least how
> POSIX code expects it), and unlink should work on it. The current state
> of affairs in which unlink/remove works on a junction, which is reported
> by stat() as a directory, is inconsistent. It's not specified to remove
> directories, so nothing that it can remove should be a directory.

I'm proposing to fix the inconsistency by fixing the flags. Your proposal is to 
fix the inconsistency by generating a new error

[issue37834] readlink on Windows cannot read app exec links

2019-08-16 Thread Eryk Sun



Eryk Sun  added the comment:

> the '/mnt/c/Document and Settings' junction... though now I try 
> it that those don't actually work...)

The security on compatibility junctions denies everyone read-data (list) 
access, but in Windows they can still be traversed (e.g. "C:/Documents and 
Settings/Public") because execute (traverse) access isn't denied, and even if 
it were denied, standard Windows users have SeChangeNotifyPrivilege to bypass 
traverse checking. 

I created a test junction named "eggs" that targets a directory named "spam" 
that has "spam/foo" subdirectory. I set the junction's security to match that 
of "Documents and Settings". In WSL, trying to traverse it to stat the "foo" 
subdirectory failed with EACCES, just as with "Documents and Settings/Public". 
After removing the entry that denies read-data access, it worked fine. There's 
no problem traversing "spam" directly if I set the same security on it that 
denies everyone read-data access; it only prevents listing the directory. 

It seems that in order to evaluate an NT junction, drvfs tries to open it with 
read-data access. I don't see why it would have to do that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-16 Thread Eryk Sun

Eryk Sun added the comment:

> Where "links" are the generic term for the set that includes
> "reparse point", "symlink", "mount point", "junction", etc.)

Why group all reparse points under the banner of 'link'? If we have a typical
HSM reparse point (the vast majority of allocated tags), then all operations,
such as delete and rename, act on the file itself, not simply the reparse
point. We should be able to delete or rename a link without affecting the
target.

In this case, there's also no chance that the reparse point is a surrogate for
another path on the system, so code that walks paths doesn't have to worry
about loops with regard to these reparse points. The only practical use case I
can think of for detecting/opening this type of reparse point is backup
software that should avoid triggering an HSM recall. For example:

https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.0/client/r_opt_hsmreparsetag.html

As I've previously suggested (and this is the last time because I'm becoming a
broken record), lstat() should at least be restricted to opening only
name-surrogate reparse points that are supposed to be like links in that they
target another path in the system. Plus it also has to open unhandled reparse
points.

Personally, I'm only comfortable with opening it up to name surrogates if
islink() and readlink() are still limited to just Unix-like symlinks that we
can create via symlink(). Nothing changes there. It's just a restriction of how
lstat() currently works. The addition of the reparse tag in the stat result
enables special handling of non-symlink surrogates.

> shutil.copytree(path): Unchanged. (requires a minor fix to
> continue to recursively copy through junctions (using above test),
> but not symlinks.)

Everyone else who relies on islink(), readlink(), and symlink() to copy
symlinks isn't special casing their code to look for junctions or anything else
we lump under the banner of islink(). They could code defensively if readlink()
fails for a 'link' that we can't read. But that leaves the problem of
readlink() succeeding for a junction. That can causes problems if the target is
passed to os.symlink(), which changes the link from a hard name grafting to a
soft name grafting.

Why would we need to read the target of a junction? It's not needed for
realpath() in Windows. We should only have to resolve symlinks. For example:

C:/Mount/junction/spam/eggs

junction -> Z:/bar/baz

We don't have to resolve this as "Z:/bar/baz/spam/eggs", and doing so may even
be wrong for someone using it to manually resolve a relative symlink.
"C:/Mount/junction/spam/eggs" is a solid path. In Unix it would not be resolved
by realpath(). A solid path is needed to figure out how to create a relative
symlink, or how to manually resolve one for a given path.

For example, if "foo_link" in "C:/Mount/junction/spam/eggs" targets
"../../../foo", this refers to "C:/Mount/foo". On the other hand, if the
junction mount point were replaced by a soft symlink, then
"C:/Mount/symlink/spam/eggs" is not a solid path. "foo_link" is instead
evaluated over the target path: "Z:/bar/baz/spam/eggs/foo_link", so the link
resolves to "Z:/bar/foo".

IMO, S_IFLNK need not be set for anything other than Unix-like symbolic links.
We would just need to document that on Windows, lstat opens any link-like
reparse point that indicates it targets another path on the system, plus any
reparse point that's not handled, but that islink() is only true for actual
Unix symlinks that can be created via os.symlink() and read via os.readlink().

This preserves how islink() and readlink() currently work, while still leaving
the door open to fix misbehavior in particular cases. Code, including our own
code, that needs to look for the broader Windows category of "name surrogate"
can examine the reparse tag. For convenience we can provide issurrogate() that
checks lstat(filename).st_reparse_tag & 0x2000_. This can be true for
directories. Also, a surrogate doesn't have to behave like a Unix "soft"
symlink, i.e. it applies to "hard" mount points. In Unix, issurrogate() could
just be an alias for islink() since Unix provides only one type of name
surrogate.

Currently the name surrogate category includes the following tags:

Microsoft name surrogate (bits 31 and 29)

IO_REPARSE_TAG_MOUNT_POINT 0xA003
IO_REPARSE_TAG_SYMLINK 0xA00C
IO_REPARSE_TAG_IIS_CACHE0xA010
IO_REPARSE_TAG_GLOBAL_REPARSE 0xA019
IO_REPARSE_TAG_LX_SYMLINK 0xA01D
IO_REPARSE_TAG_WCI_TOMBSTONE0xA01F
IO_REPARSE_TAG_PROJFS_TOMBSTONE 0xA022

Non-Microsoft name surrogate (bit 29)

IO_REPARSE_TAG_SOLUTIONSOFT 0x200D
IO_REPARSE_TAG_OSR_SAMPLE 0x2017
IO_REPARSE_TAG_QI_TECH_HSM

[issue37834] readlink on Windows cannot read app exec links

2019-08-16 Thread Steve Dower



Steve Dower  added the comment:

That was a long set of replies, so here's my summary proposed behaviour:

(Where "links" are the generic term for the set that includes "reparse point", 
"symlink", "mount point", "junction", etc.)

os.lstat(path): returns attributes for path without traversing any kind of links

os.stat(path): traverses any links supported by the OS and returns attributes 
for the final file
- on Windows, if the OS reports that a link is not followable (e.g. 
appexeclink), then the original link is reported. So if S_IFLNK is set in the 
result, the caller should, use os.path.realpath() to traverse as many links as 
possible to reach the unfollowable link

os.exists(path): now returns True for links that exist but the OS reports are 
not followable - previously returned False (links that are followable but the 
target does not exist continue to return False)

os.path.is_dir(path): now returns False for directory links where the target 
does not exist

os.unlink(path): unchanged (still removes the junction, not the contents)

os.scandir(path): DirEntry.is_symlink() and DirEntry.is_dir() will now both be 
True for junctions (this was always the case for symlinks to directories). 
DirEntry.stat(follow_symlinks=False).st_reparse_tag == 
stat.IO_REPARSE_TAG_MOUNT_POINT is how to specifically detect a junction.

shutil.copytree(path): Unchanged. (requires a minor fix to continue to 
recursively copy through junctions (using above test), but not symlinks.)

shutil.rmtree(path): Will now remove a junction rather than recursively 
deleting its contents (net improvement, IMHO)


And as I said at the end of the long post, my main goals are:
* the result of _using_ the returned information should be consistent across OS
* there are ways of getting more specific information when needed

The most impactful change would seem to be the one to DirEntry, in that users 
may now treat junction points more like symlinks than real directories 
depending on how they've set up their checks. But for the other benefits - 
particularly the more reliable exists() checks - I think this is worth it 
overall.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-16 Thread Steve Dower



Steve Dower  added the comment:

>> I don't want to add any parameters - I want to have predictable and
>> reasonable default behaviour. os.readlink() already exists for 
>> "open reparse point" behaviour.
>
> I'd appreciate a parameter to always open reparse points, even if a
> filter-driver or the I/O manager handles them. 

But that's only required if lstat() traverses junctions, which I'm not 
proposing to do.

> I'm no longer a big fan of mapping "follow_symlinks" to name surrogates
> ... But it's not up to me. If follow_symlinks means name surrogates, at
> least then lstat can open any reparse point that claims to link to
> another path and thus *should* have link-like behavior (hard link or
> soft link).

Yeah, ultimately it looks like it'll be up to me, which is why I want as much 
of your input as I can get before having to make a call :)

I'm leaning towards "if the OS follows it by default, stat() will follow it", 
and (given some of your later comments), "when ERROR_CANT_ACCESS_FILE occurs on 
a reparse point (is that the only scenario?) behave like lstat()".

(And I carefully phrased that to not imply that Python is going to partially 
follow a link chain for you if the eventual end result is not valid. e.g. 
stat() on a symlink to an appexeclink will return the symlink, because 
ERROR_CANT_ACCESS_FILE was the result. Then the suggestion will be "if stat() 
returns a link, use os.path.realpath() if you want to resolve it as far as 
possible".)

> The questions for me are whether os.readlink() should also read
> junctions and exactly what follow_symlinks means in Windows. We have a
> complicated story to tell if follow_symlinks=False (lstat) opens any
> reparse point or opens just name-surrogate reparse points, and islink()
> is made consistent with this, but then readlink() doesn't work.

I don't think it's that complicated:
* os.lstat() will not traverse any links or reparse points
* os.stat() will traverse any links and reparse points supported by the OS, or 
else return the same as os.lstat()
* os.readlink() can read the target of a symlink or (on Windows) junction point

> If junctions are handled as symlinks, then islink(), readlink(),
> symlink() would be used to copy a junction 'link' while copying a tree
> (e.g. shutil.copytree with symlinks=True). This would transform
> junctions into directory symlinks. In this case, we potentially have a
> problem that relative symlinks in the tree no longer target the same
> files when accessed via a directory symlink instead of a junction. No
> one thinks about this problem on the POSIX side because it would be
> weird to copy a mountpoint as a symlink. In POSIX, a mountpoint is
> always seen as just a directory and always traversed.

This isn't difficult to handle specifically if we want to, though, since the 
stat result now includes the reparse tag. And you're right, we probably have to 
handle it.

>> I'm still not convinced that this is what we want to do. I don't
>> have a true Linux machine handy to try it out (Python 3.6 and 3.7 on
>> WSL behave exactly like the semantics I'm proposing, but that may
>> just be because it's the Windows kernel below it).

> ... This is a decision in WSL's drvfs file-system driver, and I have to
> assume it's intentional.

I assumed it was intentional as well, though it'll almost certainly be based on 
pragmatically getting things to work (e.g. the '/mnt/c/Document and Settings' 
junction... though now I try it that those don't actually work...)

>> ismount() is currently not true for junctions. And I can't find any
>> reference that says that POSIX symlinks can't point to directories,
>
> Our current implementation for junctions is based on GetVolumePathNameW,
> which will be true for junctions that use the "Volume{...}" name to
> mount the file-system root directory. That's a volume mount point.
> 
> I don't know why someone decided that's the sum total of "mount point"
> in Windows. DOS drives and UNC drives can refer to arbitrary file system
> directories. They don't have to refer to file-system root directory. We
> can have "W:" -> "\\??\\C:\\Windows", etc.
> 
> Per the docs, a mount point for ismount() is a "point in a file system
> where a different file system has been mounted". The mounted directory
> doesn't have to be the root directory of the file system. I'd relax this
> definition to include all "hard" name grafting links to other
> directories, even within the same file system. What matter to me is the
> semantics of how this differs from the soft name grafting of a symlink.

I think the intent is that it's mounted the root of another file system. There 
was discussion of just using the reparse tag in issue9035, but it looks like 
from msg138197 that returning True for regular directory links was not the 
intent (despite the tests in that message being insufficient).

> If follow_symlinks=False applies to name surrogates, then a junction
> would be detectable via

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> So for an actual non-root mount point ntpath.ismount() returns True 
> and with IO_REPARSE_TAG_MOUNT_POINT included ntpath.islink() also 
> returns True. nt.readlink() returns the "\\?\Volume{GUID}\" path

If islink() is true, then st_mode has S_IFLNK and not S_IFDIR. So we have a 
mount point that's a symlink, which is not possible in POSIX, and it's not a 
directory, which is unusual in POSIX to say the least. 

For cross-platform consistency, I think it's better if ismount() is a 
directory. The question would be how to ensure that's true in all cases. 
Windows make this hard to accomplish reliably due to DOS 'devices' that get 
reparsed in the object manager to arbitrary paths, not necessarily to volume 
devices.

> Root mount points ("C:\\", etc.) do not return true for islink()

Not really. Here's a mountpoint-symlink chimera:

>>> os.readlink('C:/Mount/Windows')
'C:\\Windows'
>>> os.system('subst W: C:\\Mount\\Windows')
0

It's a symlink and not a directory:

>>> os.path.islink('W:\\')
True
>>> os.lstat('W:\\').st_mode & stat.S_IFDIR
0

But it's also a mount point:

>>> os.path.ismount('W:\\')
True

The object manager reparses "W:" as "\\??\\C:\\Mount\\Windows", and we open it 
with a trailing backlash, which is fine, i.e. "\\??\\C:\\Mount\\Windows\\". 

> I'm not seeing why having both islink() and ismount() be true 
> in this case is a problem.

It's only possible if a mount point is not a directory. That we'd be returning 
this for a junction is a strange state of affairs because a junction must 
target a file system directory. I prefer generalizing junction as a 
name-surrogate type that allows S_IFDIR.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> # Always make the OS resolve "unknown" reparse points
>ALLOWED_TO_TRAVERSE = {SYMLINK, MOUNT_POINT}
>if !traverse and st.reparse_tag not in ALLOWED_TO_TRAVERSE:
>return xstat(path, !traverse)

To me the naming here makes sense as ALLOWED_TO_OPEN -- as in if traverse is 
false, meaning we opened the tag, but the tag is not in ALLOWED_TO_OPEN, then 
we have to reopen in order to traverse it. But then, on the second pass after 
ERROR_CANT_ACCESS_FILE, traverse is false, so we would also have to special 
case tags that should never be traversed in NOT_ALLOWED_TO_TRAVERSE = 
{APPEXECLINK, ...}.

I mentioned this in a GitHub comment, and suggested maybe adding another 
internal-only parameter to check to avoid having to ever special case the 
app-exec-link tag. For example, we could have an explicit "open_reparse_point" 
parameter. It would take precedence over traverse (i.e. follow_symlinks). For 
now, it could be internal only.

I assumed stat() would return the reparse point for all tags that fail to 
reparse with ERROR_CANT_ACCESS_FILE since it's not an invalid reparse point, 
just an unhandled one that has no other significance at the file system level. 
To stat(), it can never be anything other than a reparse point. Whatever 
relevance it has depends on some other context (e.g. a CreateProcessW call).

> And the open question is just whether MOUNT_POINT should be in that 
> set near the end. I believe it should, since the alternative is to 
> force all Python developers to write special Windows-only code to
> handle directory junctions.

Python provides no cross-platform tooling to manipulate mount points or read 
what they target (e.g. something like "/dev/sda1" in Unix), so that doesn't 
bother me per se. A mount point is just a directory in the POSIX mindset. 

That doesn't mean, however, that I wouldn't like the ability to detect "name 
surrogate" reparse points in general to implement safer behavior for 
shutil.rmtree and anything else that walks a directory tree. Windows shells 
don't follow name surrogates (including mount points) when deleting a tree, 
such as `rmdir /s`. Unix `rm -rf` does follow mount points. (A process would 
need root access to unmout a directory anyway.) The author's of shutil.rmtree 
have a Unix perspective. For Windows, I'd like to change that perspective. When 
in Rome...

If the only way to get this is to special case mount-point or name-surrogate 
reparse points as applicable to "follow_symlinks", then I suggest that this 
should be clearly documented and that we not go so far as to pretend that 
they're symlinks via S_IFLNK, islink, and readlink. Continue reporting mount 
points as directories (S_IFDIR). Continue with only supporting actual symbolic 
links for the triple: islink(), readlink(), and symlink(). In this case, we can 
copy symlinks and be certain the semantics remain the same since we're not 
changing the type of reparse point.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> It also has a bug that a drive root is a mount point, even if the 
> drive doesn't exist. Also, it's wrong in not checking for junctions 
> in UNC paths. SMB supports opening reparse points over the wire.

"It" in the above sentences is ntpath.ismount, not GetVolumePathNameW.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> Okay, I get it now. So we _do_ want to "upgrade" lstat() to stat() 
> when it's not a symlink.

I don't see that as a behavior upgrade. It's just an implementation detail. 
lstat() is still following its mandate to not follow symlinks -- however you 
ultimately define what a "symlink" is in this context in Windows.
 
> I don't want to add any parameters - I want to have predictable and
> reasonable default behaviour. os.readlink() already exists for 
> "open reparse point" behaviour.

I'd appreciate a parameter to always open reparse points, even if a 
filter-driver or the I/O manager handles them. 

I'm no longer a big fan of mapping "follow_symlinks" to name surrogates (I used 
to like this idea a couple years ago), or splitting hairs regarding 
volume-mount-point junctions and bind-like junctions (used to like this too a 
year ago, because some projects do this, before I thought about the deeper 
concerns). But it's not up to me. If follow_symlinks means name surrogates, at 
least then lstat can open any reparse point that claims to link to another path 
and thus *should* have link-like behavior (hard link or soft link). 

For example, we are able to move, rename, and delete symlinks and junctions 
without affecting the target (except for a junction that's a volume mount 
point, Windows will try DeleteVolumeMountPointW, which can have side effects; 
failure is ignored and the directory deleted anyway). This is implemented by 
the Windows API opening the reparse point and checking for symlink and junction 
tags. It reparses other tags, regardless of whether they're name surrogates, 
but I assume name-surrogate reparse points should be implemented by their 
owning filter drivers to behave in a similar fashion for actions such as rename 
and delete.

While deleting a name-surrogate reparse point should have no effect on the 
target, it still might have unintended consequences. For example, it might 
revive a 'deleted' file in a VFS for Git repo if we delete the tombstone 
reparse point that marks a file that's supposed to be 'deleted'. This might 
happen if code checks os.lstat(filename) and decides to delete the file in a 
non-standard way that ensures only a reparse point is deleted, e.g. 
CreateFileW(filename, ..., FILE_FLAG_DELETE_ON_CLOSE | 
FILE_FLAG_OPEN_REPARSE_POINT, NULL), or manually setting the 
FileDispositionInfo. (DeleteFileW would fail with a file-not-found error 
because it would reparse the tombstone.) Now it's in for a surprise because the 
file exists again in the projected filesystem, even though it was just 
'deleted'. This is in theory. I haven't experimented with projected file 
systems to determine whether they actually allow opening a tombstone reparse 
point when using FILE_FLAG_OPEN_REPARSE_POINT. I assume they do,
  like any other reparse point, unless there's deeper magic involved here.

The questions for me are whether os.readlink() should also read junctions and 
exactly what follow_symlinks means in Windows. We have a complicated story to 
tell if follow_symlinks=False (lstat) opens any reparse point or opens just 
name-surrogate reparse points, and islink() is made consistent with this, but 
then readlink() doesn't work. 

If junctions are handled as symlinks, then islink(), readlink(), symlink() 
would be used to copy a junction 'link' while copying a tree (e.g. 
shutil.copytree with symlinks=True). This would transform junctions into 
directory symlinks. In this case, we potentially have a problem that relative 
symlinks in the tree no longer target the same files when accessed via a 
directory symlink instead of a junction. No one thinks about this problem on 
the POSIX side because it would be weird to copy a mountpoint as a symlink. In 
POSIX, a mountpoint is always seen as just a directory and always traversed.

> I'm still not convinced that this is what we want to do. I don't 
> have a true Linux machine handy to try it out (Python 3.6 and 3.7 on
>  WSL behave exactly like the semantics I'm proposing, but that may 
> just be because it's the Windows kernel below it).

If you're accessing NT junctions under WSL, in that environment they're always 
handled as symlinks. And the result of my "C:/Junction" and "C:/Symlink" 
example --- i.e. "/mnt/c/Junction" and "/mnt/c/Symlink" -- is that *both* 
behave the same way, which is as expected since the WSL environment sees both 
as symlinks, but also fundamentally wrong. In an NT process, they behave 
differently, as a mount point (hard name grafting) and a symlink (soft name 
grafting). This is a decision in WSL's drvfs file-system driver, and I have to 
assume it's intentional. 

In a perfect world, a path on the volume should be consistently evaluated, 
regardless of whether it's accessed from a WSL or NT process. But it's also a 
difficult problem, maybe intractable, if they want to avoid Linux programs 
traversing junctions in dangerous operations -- e.g. `rm -rf`. The only name

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

So for an actual non-root mount point, ntpath.ismount() returns True and with 
IO_REPARSE_TAG_MOUNT_POINT included ntpath.islink() also returns True. 
nt.readlink() returns the "\\?\Volume{GUID}\" path

Root mount points ("C:\\", etc.) do not return true for islink()

os.rename() and os.unlink() work on non-root mount points, but not on root 
mount points. So there is at least some value in being able to detect "this is 
a root mount point that acts like a file".

I'm not seeing why having both islink() and ismount() be true in this case is a 
problem.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> So we _do_ want to "upgrade" lstat() to stat() when it's not a symlink.

Except this bug came about because we want to _downgrade_ stat() to lstat() 
when it's an appexeclink, because the whole point of those is to use them 
without following them (and yeah, most operations are going to fail, but they'd 
fail against the target file too).

So we have this logic:

def xstat(path, traverse):
f = open(path, flags | (0 if traverse else OPEN_REPARSE_POINT))
if !f:
# Special case for appexeclink
if traverse and ERROR_CANT_OPEN_FILE:
st = xstat(path, !traverse)
if st.reparse_tag == APPEXECLINC:
return st
raise ERROR_CANT_OPEN_FILE
# Handle "likely" errors
if ERROR_ACCESS_DENIED or SHARING_VIOLATION:
st = read_from_dir(os.path.split(path))
else:
st = read_from_file(f)

# Always make the OS resolve "unknown" reparse points
ALLOWED_TO_TRAVERSE = {SYMLINK, MOUNT_POINT}
if !traverse and st.reparse_tag not in ALLOWED_TO_TRAVERSE:
return xstat(path, !traverse)

return st

And the open question is just whether MOUNT_POINT should be in that set near 
the end. I believe it should, since the alternative is to force all Python 
developers to write special Windows-only code to handle directory junctions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> For example, if we've opened an HSM reparse point, we must reopen to let
> the file-system filter driver implement its semantics to replace the
> reparse point with the real file from auxiliary storage and complete the
> request. That is the stat() result I want when I say stat(filename,
> follow_symlinks=False) or lstat(filename), because this file is not a
> symlink. It's implicitly just the file to end users -- despite whatever
> backend tricks are being played in the kernel to implement other
> behavior such as HSM. Conflating this with a symlink is not right. Lies
> catch up with us. We can't copy it as link via os.symlink and
> os.readlink, and it doesn't get treated like a symlink in API functions.

Okay, I get it now. So we _do_ want to "upgrade" lstat() to stat() when it's 
not a symlink.

> If you want to add an "open reparse point" parameter ...

I don't want to add any parameters - I want to have predictable and reasonable 
default behaviour. os.readlink() already exists for "open reparse point" 
behaviour.

The discussion is only about what os.lstat() returns when you pass in a path to 
a junction.

> As to mount points, yes, I do think we should always traverse them.
> Please see my extended comment and the follow-up example on GitHub.

I'm still not convinced that this is what we want to do. I don't have a true 
Linux machine handy to try it out (Python 3.6 and 3.7 on WSL behave exactly 
like the semantics I'm proposing, but that may just be because it's the Windows 
kernel below it).

> A mount point is not a link. ismount() and islink() can never both be
> true. Also, a POSIX symlink can never be a directory, which is why we
> make stat() pretend directory symlinks aren't directories. If the user
> wants a link, they can use a symlink that's created by os.symlink,
> mklink, new-item -type SymbolicLink, etc.

ismount() is currently not true for junctions. And I can't find any reference 
that says that POSIX symlinks can't point to directories, nor any evidence that 
we suppress symlink-to-directory creation or resolution in Python (also tested 
on WSL)..

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> Unless your point is that we should _always_ traverse junctions? In 
> which case we have a traverse 'upgrade' scenario (calls to lstat() 
> become calls to stat() when we find out it's a junction).

If we've opened the reparse point to test IO_REPARSE_TAG_SYMLINK, and that's 
not the case, then we need to reopen with reparsing enabled. This is exactly 
what Windows API functions do in order to implement particular behavior for 
just symlinks or just mountpoints. 

For example, if we've opened an HSM reparse point, we must reopen to let the 
file-system filter driver implement its semantics to replace the reparse point 
with the real file from auxiliary storage and complete the request. That is the 
stat() result I want when I say stat(filename, follow_symlinks=False) or 
lstat(filename), because this file is not a symlink. It's implicitly just the 
file to end users -- despite whatever backend tricks are being played in the 
kernel to implement other behavior such as HSM. Conflating this with a symlink 
is not right. Lies catch up with us. We can't copy it as link via os.symlink 
and os.readlink, and it doesn't get treated like a symlink in API functions.  

If you want to add an "open reparse point" parameter, that would make sense. 
It's of some use to get the tag and implement particular behavior for types of 
reparse points, and particularly for name surrogates, which includes mount 
points (junctions).

As to mount points, yes, I do think we should always traverse them. Please see 
my extended comment and the follow-up example on GitHub.

> Again, not sure why we'd want to hide the ability to manipulate the 
> junction itself from Python users, except to emulate POSIX. And I'd 
> imagine anyone using lstat() is doing it deliberately to manipulate 
> the link and would prefer we didn't force them to add Windows-
> specific code that's even more complex.

A mount point is not a link. ismount() and islink() can never both be true. 
Also, a POSIX symlink can never be a directory, which is why we make stat() 
pretend directory symlinks aren't directories. If the user wants a link, they 
can use a symlink that's created by os.symlink, mklink, new-item -type 
SymbolicLink, etc.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

Unless your point is that we should _always_ traverse junctions? In which case 
we have a traverse 'upgrade' scenario (calls to lstat() become calls to stat() 
when we find out it's a junction).

Again, not sure why we'd want to hide the ability to manipulate the junction 
itself from Python users, except to emulate POSIX. And I'd imagine anyone using 
lstat() is doing it deliberately to manipulate the link and would prefer we 
didn't force them to add Windows-specific code that's even more complex.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

[Quoting from the PR comments]

> traverse is from follow_symlinks and only applies to symlinks. It does not 
> apply to other types of reparse points.

I get your argument that junctions are not symlinks, but I disagree that we 
should try this hard to emulate POSIX semantics rather than letting the OS do 
its thing. We aren't reparsing anything ourselves (in the PR) - if the OS is 
configured to do something different because of a reparse point, we're simply 
going to respect that instead of trying to work around it.

A user who has created a directory junction likely wants it to behave as if the 
directory is actually in that location. Similarly, a user who has created a 
directory symlink likely wants it to behave as if it were in that location. 
Powershell treats both the same for high-level operations - the LinkType 
attribute is the only way to tell them apart (mirrored in the st_reparse_tag 
field).

> If we've opened the reparse point to test for a symlink, we must reopen for 
> all other types

The premise here is not true - we've opened the reparse point to get the file 
attributes. The only reason we look at the reparse tag at all is to raise an 
error if the user requested traversal and despite that, we ended up at a link, 
and I'm becoming less convinced that should be an error anyway (this is 
different from nt.readlink() and ntpath.realpath(), of course, where we want to 
read the link and return where it points).

nt.stat() is trying to read the file attributes, and if they are not accessible 
then raising is the correct behaviour, so I don't see why we should try any 
harder than the OS here.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> I assume you're talking about realpath() here ...

Yes, and so are you :) Let's move that discussion to issue9949 and/or PR 15287.

> I think os.chdir should raise an exception when passed a device path.

When the OS starts returning an error code for this case, we can start raising 
an exception. It might be worth reporting these cases though, as you're right 
that they don't seem to be handled correctly.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Eryk Sun



Eryk Sun  added the comment:

> I wish we could remove the MAX_PATH limit in this case.
>
> The problem is that we have to remove the limit in any case where the 
> resulting path might be used, which is what we're already trying to 
> encourage by supporting long paths.

Maybe it's better to ignore the MAX_PATH limit and let processes fail hard if 
they lack long-path support. A known and expected exception is better than 
unpredictable behavior (see the next paragraph for an example). That leaves the 
problem of a final component that's a reserved name, i.e. a DOS device name or 
a name with trailing dots or spaces. We have no choice but to return this case 
as an extended path. 

The intersection of this problem with SetCurrentDirectoryW (os.chdir) troubles 
me. Without long-path support, the current-directory buffer in the process 
parameters is hard limited to MAX_PATH, and passing SetCurrentDirectoryW an 
extended path can't work around this. Fair enough. But it still accepts a 
device path as the current directory, even though the docs do not explicitly 
allow it, and the implementation assumes it's disallowed. The combination is an 
ugly bug:

>>> os.chdir('//./C:/Temp')
>>> os.getcwd()
'.\\C:\\Temp'
>>> os.path._getfullpathname('/spam/eggs')
'spam\\eggs'

>>> os.chdir('//?/C:/Temp')
>>> os.getcwd()
'?\\C:\\Temp'
>>> os.path._getfullpathname('/spam/eggs')
'spam\\eggs'

In order to resolve a rooted path such as "/spam/eggs", the runtime library 
needs to be able to figure out the current drive from the current directory. It 
checks for a UNC path and otherwise assumes it's a DOS drive, since it's 
assuming device paths aren't allowed. It ends up assuming the current directory 
is a DOS drive and grabs the first two characters as the drive name, which is 
"". Then when joining the rooted path to this 'drive', the initial slash or 
backslash of the rooted path gets collapsed into the preceding backslash. The 
result is at best a broken path, and at worst an unrelated UNC path that 
exists. 

I think os.chdir should raise an exception when passed a device path. In 
explanation, we can point to the documentation of SetCurrentDirectoryW, which 
explicitly states the following:

Each process has a single current directory made up of two parts:

* A disk designator that is either a drive letter followed by 
  a colon, or a server name and share name 
  (\\servername\sharename)
* A directory on the disk designator

> Perhaps the best we can do is an additional test where we 
> GetFinalPathName, strip the prefix, reopen the file, 
> GetFinalPathName again and if they match then return it 
> without the prefix. That should handle the both long path 
> settings as transparently as we can.

I assume you're talking about realpath() here, toward the end where we're 
working with a solid path, or rather where we have at least the beginning part 
of the path as a solid path, up to the first component that's inaccessible.

For the problem of reserved names, GetFullPathNameW is all we need. This 
doesn't address the MAX_PATH issue. But that either works or not. It's a 
user-mode issue. There's nothing to resolve in the kernel. If the path is too 
long, then CreateFileW will fail at 
RtlDosPathNameToRelativeNtPathName_U_WithStatus with STATUS_NAME_TOO_LONG, 
before making a single system call.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Steve Dower



Steve Dower  added the comment:

And I just posted an update to PR 15231 with essentially a rewrite of stat() on 
Windows. Should be better than it was :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Steve Dower



Steve Dower  added the comment:

> Perhaps the best we can do is an additional test where we GetFinalPathName, 
> strip the prefix, reopen the file, GetFinalPathName again and if they match 
> then return it without the prefix. That should handle the both long path 
> settings as transparently as we can.

I tried this and the downside (other than hitting the filesystem a few more 
times) is that any unresolvable path is going to come back with the prefix - 
e.g. symlink cycles and dangling links.

Maybe that's okay? The paths are going to be as valid as they can get (that is, 
unusable :) ) and it at least means that long paths and deliberately 
unnormalized paths.

Posted an update to PR 15287 with that change.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Steve Dower



Steve Dower  added the comment:

> Given the only option here is follow_symlinks, then the first CreateFileW 
> call in win32_xstat_impl should only open a reparse point if follow_symlinks 
> is false. In this case, if it happens to open a reparse point that's not a 
> symlink, it should try to reopen with reparsing enabled. In either case, if 
> reparsing fails because a reparse point isn't handled 
> (ERROR_CANT_ACCESS_FILE), try to open the reparse point itself. The latter 
> would be the second open attempt if follow_symlinks is true and the third 
> open attempt if follow_symlinks is false. 
>
> If a reparse point isn't handled, then there's nothing in principle that 
> stat() could ever follow. Given that it's impractical to fail in this case, 
> considering how much code would have to be modified, then the above 
> compromise should suffice.

This sounds like reasonable logic. I'll take a look at updating my PR.

> In the proposed implementation of realpath() that I helped on for issue 14094

I totally forgot about this issue and the PR (but it looks like the contributor 
did too). I just posted PR 15287 today with the tests from the patch on 
issue9949 and it's looking okay - certainly an improvement over the current 
behaviour. But it doesn't have the change to readlink() that would add the \\?\ 
prefix, which means it doesn't answer that question.

> I wish we could remove the MAX_PATH limit in this case.

The problem is that we have to remove the limit in any case where the resulting 
path might be used, which is what we're already trying to encourage by 
supporting long paths.

Perhaps the best we can do is an additional test where we GetFinalPathName, 
strip the prefix, reopen the file, GetFinalPathName again and if they match 
then return it without the prefix. That should handle the both long path 
settings as transparently as we can.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Eryk Sun



Eryk Sun  added the comment:

> but suddenly adding "\\?\" to the paths breaks a lot of assumptions.

The unwritten assumption has been that readlink() is reading symlinks that get 
created by CreateSymbolicLinkW, which sets the print name as the DOS path 
that's passed to the call. In this case, readlink() can rely on the print name 
being the intended DOS path. I raised a concern in the case of reading 
junctions. There's no high-level API to create junctions, so we can't assume 
the print name is reliable. PowerShell's new-item doesn't even set a print name 
for junctions. That symlinks also are valid without a print name (in principle; 
I haven't come across it practice) lends additional weight to always using the 
substitute name.

Even if we have the DOS path, resolving paths manually is still complicated if 
it's a relative symlink with a reserved name (DOS device; trailing dots or 
spaces) as the final component or if it's a long path. Reparsing a relative 
symlink in the kernel doesn't reserve such names and there's no MAX_PATH limit 
in the kernel. So using readlink() is tricky. Fortunately realpath() in Windows 
doesn't require a resolve loop based on readlink(). The kernel almost always 
knows the final path of an opened file, and we can walk the components from the 
end until we find one that exists.

> My idea was to GetFinalPathName(path[4:])[4:] and if that fails

An existing file named "spam" would be a false positive for a link that targets 
"spam.". The internal CreateFileW call would open "spam". Also, symlinks allow 
remote paths, and this doesn't handle "?\\UNC" paths. More generally, a 
link target doesn't have to exist, so being able to access it shouldn't be a 
factor. I see it's also returning the result from _getfinalpathname. readlink() 
doesn't resolve a final, solid path. It just returns the contents of a link, 
which can be a relative or absolute path.

In the proposed implementation of realpath() that I helped on for issue 14094 
(I wasn't aware of the previous work in issue 9949), there's an 
_extended_to_normal function that tries to return a normal path if possible. 
The length of the normal path has to be less than MAX_PATH, and 
_getfullpathname should return the path unchanged. GetFullPathNameW is just 
rule-based processing in user mode; it doesn't touch the file system.

I wish we could remove the MAX_PATH limit in this case. I think at startup we 
should try to call RtlAreLongPathsEnabled, even though it's not documented, and 
set a sys flag to indicate whether we can use long paths. Also, support a -X 
option and an environment variable to override automatic detection.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-14 Thread Eryk Sun



Eryk Sun  added the comment:

> I really want a fix for this in 3.8, or else os.stat(sys.executable) 
> may fail

I agree, but Python can support this without handling junctions as symlinks or 
limiting the reparse points that we can follow. There are reparse points for 
offline storage and projected file systems (e.g. VFS for Git) that should 
normally be traversed, and to that I would add junctions. stat() in Unix always 
opens a mounted directory, not the underlying directory of a mount point. 
Windows Python should try to be consistent with Unix where doing so is 
reasonable. 

Given the only option here is follow_symlinks, then the first CreateFileW call 
in win32_xstat_impl should only open a reparse point if follow_symlinks is 
false. In this case, if it happens to open a reparse point that's not a 
symlink, it should try to reopen with reparsing enabled. In either case, if 
reparsing fails because a reparse point isn't handled (ERROR_CANT_ACCESS_FILE), 
try to open the reparse point itself. The latter would be the second open 
attempt if follow_symlinks is true and the third open attempt if 
follow_symlinks is false. 

If a reparse point isn't handled, then there's nothing in principle that stat() 
could ever follow. Given that it's impractical to fail in this case, 
considering how much code would have to be modified, then the above compromise 
should suffice.

I checked RtlNtStatusToDosError over the range 0xC000_ - 0xC0ED_. (In 
ntstatus.h, FACILITY_MAXIMUM_VALUE is 0xED, so there's no point in checking 
facilities 0x0EF-0xFFF.) Only STATUS_IO_REPARSE_TAG_NOT_HANDLED maps to 
ERROR_CANT_ACCESS_FILE, so we don't have to worry about unrelated failures 
using this error code. This is separate from an invalid reparse point 
(ERROR_INVALID_REPARSE_DATA) or a reparse point that can't be resolved 
(ERROR_CANT_RESOLVE_FILENAME), which should still be errors that fail the call.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> I'm wary of trying to return it without the prefix.

Me too, but suddenly adding "\\?\" to the paths breaks a lot of assumptions.

> We would need a function that's shared with the proposed implementation of 
> realpath() to determine whether the given path (not the final path) is safe 
> to return as a normal DOS or UNC path.

My idea was to GetFinalPathName(path[4:])[4:] and if that fails, don't strip 
the prefix. Which is obviously not be perfect, but since we're not going to add 
a check for the LongPathsEnabled flag (let alone any of the other edge cases), 
we can't easily figure out whether it's safe to return manually.

I really want a fix for this in 3.8, or else os.stat(sys.executable) may fail, 
but I don't think changing the result of readlink() is okay at this stage. 
Maybe I'll leave that out for now and just take the st_reparse_tag and stat() 
changes?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> Until then, I think it makes sense for os.readlink() to handle the 
> prefix and _getfinalpathname() call, but leave nt.readlink() as
> returning the raw value.

os.readlink() shouldn't resolve the final path or realpath(). It should simply 
return the link substitute path, which starts with "?\\". I'm wary of 
trying to return it without the prefix. We would need a function that's shared 
with the proposed implementation of realpath() to determine whether the given 
path (not the final path) is safe to return as a normal DOS or UNC path.

---

BTW, I looked into how CreateFileW is supporting "\\??\\". It turns out at some 
point they changed RtlpDosPathNameToRelativeNtPathName to look for both 
"?\\" and "\\??\\" when determining whether to skip normalizing the path. I 
think that's a bad idea since the Windows API doesn't consistently support the 
NT prefix.

ReactOS is supposed to target NT 5.2 (Server 2003), and it doesn't allow the NT 
prefix here. Refer to its implementation of 
RtlpDosPathNameToRelativeNtPathName_Ustr [1]. It only looks for 
RtlpWin32NtRootSlash ("?\\"), not RtlpDosDevicesPrefix ("\\??\\"). So I 
suppose Windows changed this some time between Vista and Windows 10.

[1] 
https://github.com/reactos/reactos/blob/d93e516747e3220ba182f77824e8b1a8b548edae/sdk/lib/rtl/path.c#L1034

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

I think we'll want issue9949 merged as well, so that ntpath.realpath() does its 
job. Certainly the tests would benefit from it.

Until then, I think it makes sense for os.readlink() to handle the prefix and 
_getfinalpathname() call, but leave nt.readlink() as returning the raw value.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> Replacing "\??\" with "\\?\" in place is trivial though, as we start 
> with a mutable buffer. I'm just not clear that it's as simple as that, 
> though.

If the path starts with "\\??\\" we can just change the first question mark to 
a backslash. For a symlink, fail at this step if Flags includes 
SYMLINK_FLAG_RELATIVE, which would be inconsistent data. 

"\\??\\" is the NT prefix for the caller's local (logon session) device 
directory. This implicitly shadows the global device directory, "\\Global??". 
We don't use NT's real "\\Device" directory in Windows, but it's available as 
"//?/GlobalRoot/Device" or "//?/Global/GlobalRoot/Device". 

"\\??\\" shouldn't be used directly in Windows programming, since 
GetFullPathNameW (often implicitly called) doesn't recognize it, but some API 
functions will pass it along, even though they should really fail the call. It 
will work until it doesn't, and by then we could have a right mess.

If a path starts with exactly "?\\" (backslash only), Windows simply copies 
it verbatim, except for changing the prefix to "\\??\\". Other Windows device 
prefixes where the domain is "." or "?" can use any mix of forward slash and 
backslash because they get normalized first (e.g. "//?\\" -> "?\\"). Only 
an initial "?\\" bypasses the normalization step.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> I feel like that's more work than is worth us doing for something that 
> will be relatively rarely used, will live in the stdlib, and is 
> obviously something that will become outdated as Microsoft adds new 
> reparse points.

Junctions (NT 5) and symlinks (NT 6) are stable. So if os.read_reparse_point 
only returns the unparsed bytes, maybe add os.read_junction as well. I know 
other projects overload this on POSIX readlink. They're both name-surrogate 
reparse points, but they have different constraints and behavior.

The I/O manager tries to make a junction behave something like a hard link to a 
directory, with the addition of being able to link across local volumes. This 
is in turn relates to how it evaluates relative symbolic links. For example, if 
"C:/Junction" and "C:/Symlink" both target r"\\?\C:\Temp1\Temp2", and there's a 
relative symlink "C:/Temp1/Temp2/foo_link" that targets r"..\foo", then 
"C:/Junction/foo_link" references "C:/foo" but "C:/Symlink/foo_link" references 
"C:/Temp1/foo".
 
Another difference is with remote filesystems. SMB special cases symlinks to 
have the server send the reparse request over the wire to be evaluated on the 
client side. (Refer to [MS-SMB2] 2.2.2.1 Symbolic Link Error Response, and the 
subsequent section about client-side handling of this error.) So an absolute 
symlink on the server that targets r"\\?\C:\Windows" actually references the 
client's "C:/Windows" directory, whereas the same junction target would 
reference the server's "C:/Windows" directory. The symlink evaluation will 
succeed only if the client's R2L (remote to local) policy allows it. Symlinks 
can also target remote devices, depending on the L2R and R2R policy settings. 
Junctions are restricted to local devices.

> In theory, we can't follow any reparse point that isn't documented as 
> being followable and provides the target name is a stable, documented
> manner. 

To follow a reparse point, we're just calling CreateFileW the normal way, 
without FILE_FLAG_OPEN_REPARSE_POINT. The Windows API also does this (usually 
via NtOpenFile, but this has a similar  FILE_OPEN_REPARSE_POINT option) for 
tags it doesn't handle. That's why MoveFileExW (os.rename and os.replace) fails 
on one of these app-exec links. In some cases, it adds a third open attempt if 
the reparse point isn't handled. This is important for DeleteFileW (os.remove) 
and RemoveDirectoryW (os.rmdir) because we should be able to delete a bad 
reparse point.

> The appexec links don't do this (I just looked at the returned 
> buffer), so we really should just not follow them. They exist solely 
> so that CreateProcess internally returns a unique error code that can 
> be handled without impacting regular process start, which means we 
> *don't* want to follow them.

I know, so a regular stat() will fail. I think for an honest result, stat() 
should fail for a reparse point that can't be handled. Scripts can use 
stat(path, follow_nonlinks=False) or stat(path, follow_reparse_points=False), 
or however this eventually gets parameterized to force opening all reparse 
points.

> Now, directory junctions are far more interesting. My gut feel is that 
> we should treat them the same as symlinks (with respect to stat vs. 
> lstat) for consistency

Junctions are their own thing. They're mount points that behave like Unix 
volume mounts (in Windows, target the root directory of a volume device named 
by its "Volume{...}" name) or Unix bind mounts (in Windows, target arbitrary 
directories on any local volume; in Linux it's a mount created with --bind or 
FUSE bindfs). Bind-like junctions are also similar to DOS subst drives (e.g. 
"W:" -> "C:/Windows") and UNC shares. These are all mount points of one sort or 
another. 

OTOH, the base device names such as "//?/C:" and "//?/Volume{...}", without a 
specified root directory, are aliases (object symlinks) for an NT device such 
as r"\Device\HarddiskVolume2". These paths open the volume itself, not the 
mounted filesystem, so they're not like Unix mount points. They're like Unix 
'/dev/sda1' device paths, except in Unix, devices don't have their own 
namespaces, so it would be nonsense to open "/dev/sda1/".

RemoveDirectoryW for a volume mount is special cased to call 
DeleteVolumeMountPointW, which notifies the mount-point manager. It won't do 
this for a junction that targets the same volume root directory via the DOS 
drive-letter name -- or any other device alias for that matter (e.g. Windows 10 
creates "\\?\BootPartition" as an alternative named for the system "C:" drive). 
So bind-like mounts are different from volume mounts, but both are different 
from symlinks.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe:

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

Latest PR update uses GetFinalPathName to resolve SubstituteName, returning it 
unmodified on failure (e.g. symlink where the target file no longer exists).

Replacing "\??\" with "\\?\" in place is trivial though, as we start with a 
mutable buffer. I'm just not clear that it's as simple as that, though.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> If we support reading junctions, this should be using the substitute name 
> (with \??\ replaced by \\?\) instead of the print name.

GetFinalPathName() does this conversion for us, any reason not to use that? 
(GetFullPathName() doesn't seem to recognize the \??\ prefix and will prepend 
the current drive)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

> I looked into this spawn problem. It's due to Cygwin's spawnve, which calls 
> NtOpenFile to open the file, and then memory-maps it and reads the image 
> header [1].

Great, that's roughly what I suspected. Unfortunately, I've been told that 
looking into Cygwin's (GPL'd) code myself is going to cost me two weeks of dev 
work ("cooling off" period), so someone else will have to help them fix it.

> It could parse out as much as possible and return a struct sequence

TBH, I feel like that's more work than is worth us doing for something that 
will be relatively rarely used, will live in the stdlib, and is obviously 
something that will become outdated as Microsoft adds new reparse points.

If we return the parsed data as opaque bytes then someone can write a simple 
PyPI package to extract the contents. (Presumably the reparse tag will have 
come from stat() already.)

> Maybe have two non-overlapping options, follow_symlinks and follow_nonlinks

I read this suggestion the first time and I think it would send the message 
that we are more capable than we really are :)

In theory, we can't follow any reparse point that isn't documented as being 
followable and provides the target name is a stable, documented manner. The 
appexec links don't do this (I just looked at the returned buffer), so we 
really should just not follow them. They exist solely so that CreateProcess 
internally returns a unique error code that can be handled without impacting 
regular process start, which means we *don't* want to follow them.

Again, if someone writes the PyPI package to parse all known reparse points, 
they can do that.

Now, directory junctions are far more interesting. My gut feel is that we 
should treat them the same as symlinks (with respect to stat vs. lstat) for 
consistency, but I'll gladly defer to you (Eryk) for the edge cases that may 
introduce.

---

My plan for PR 15231 is:
* make os.stat only follow symlinks and junctions
* make os.readlink only read symlinks and junctions
* add st_reparse_tag field to os.stat

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Eryk Sun  added the comment:

> Honestly, the only real problem I've seen is in applications that use
> a reimplementation of spawn() rather than the UCRT's version (which
> handles the reparse point jsut fine).

I looked into this spawn problem. It's due to Cygwin's spawnve, which calls 
NtOpenFile to open the file, and then memory-maps it and reads the image header 
[1]. For an app-exec link, this fails with STATUS_IO_REPARSE_TAG_NOT_HANDLED, 
which is translated to Windows ERROR_CANT_ACCESS_FILE (1920). In turn, Cygwin 
translates this error to its to default errno value, EACCES. (The Windows CRT 
uses EINVAL as the default.)

[1] 
https://github.com/cygwin/cygwin/blob/aa55d22cb55d67d7f77ee9d58f9016c42c3ee495/winsup/cygwin/spawn.cc#L1035

> Maybe it can just return bytes and let the caller do the parsing?

It could parse out as much as possible and return a struct sequence:

ReparseTag (int)
ReparseGuid (string, or None for Microsoft tags)
SymlinkFlags (int, SYMLINK_FLAG_RELATIVE) 
PrintName (string or None)
SubstituteName (string or None)
DataBuffer (bytes)

Only the ReparseTag and DataBuffer fields would always be present. ReparseGuid 
would be present for non-Microsoft tags (i.e. bit 31 is unset). The result 
could maybe have one or more fields for the app-exec-link type.

> This sounds like a good option to me, too. So that would suggest that 
> Modules/posixmodule.c in win32_xstat_impl would verify both 
> FILE_ATTRIBUTE_REPARSE_POINT and IO_REPARSE_TAG_SYMLINK? And if the 
> tag is different it'll return information about the reparse point 
> rather than the target?

Maybe have two non-overlapping options, follow_symlinks and follow_nonlinks. 
The latter might be more sensible as a single option for lstat(), since it 
never follows symlinks. Then if either follow_symlinks or follow_nonlinks is 
false, stat has to open the reparse point and get the tag to determine whether 
it should reopen with reparsing enabled.

The straight-forward way to get the tag is GetFileInformationByHandleEx: 
FileAttributeTagInfo. This should succeed if the file system supports reparse 
points. But still check for FILE_ATTRIBUTE_REPARSE_POINT in the attributes 
before trusting the tag value. If the call fails, assume it's not a reparse 
point and proceed to GetFileInformationByHandleW.

Another tag to look for is IO_REPARSE_TAG_AF_UNIX (0x8000_0023). It's 
relatively new, so I haven't used it yet. I suppose it should be mapped 
S_IFSOCK in the stat result.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

Thanks Eryk for your valuable response :)

> readlink() and is_link() should be reserved for POSIX symlinks, i.e. only 
> IO_REPARSE_TAG_SYMLINK. 

I'm okay with that reasoning. Honestly, the only real problem I've seen is in 
applications that use a reimplementation of spawn() rather than the UCRT's 
version (which handles the reparse point jsut fine).

> they lack an associated filter driver that handles them in normal file 
> operations

Also true, and likely to be a cause of more problems. But not ours to fix :)

> How about adding a separate function such as nt.read_reparse_point() that's 
> able to read reparse points [...]? The internal implementation could be 
> shared with os.readlink.

Maybe it can just return bytes and let the caller do the parsing?

> The current behavior of follow_symlinks in stat() is problematic. It should 
> be limited to symlinks.

This sounds like a good option to me, too. So that would suggest that 
Modules/posixmodule.c in win32_xstat_impl would verify both 
FILE_ATTRIBUTE_REPARSE_POINT and IO_REPARSE_TAG_SYMLINK? And if the tag is 
different it'll return information about the reparse point rather than the 
target?

> The stat result would gain an st_reparse_tag field (already added by your 
> PR), which would be non-zero whenever a reparse point is opened.

I agree this can stay. We don't need the rest of the logic here - callers who 
care to follow non-symlink reparse points can use the new 
nt.read_reparse_point() function to do it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-12 Thread Eryk Sun



Eryk Sun  added the comment:

> How about adding a separate function such as nt.read_reparse_point()

If we support reading junctions, this should be using the substitute name (with 
\??\ replaced by \\?\) instead of the print name. For both symlinks and 
junctions, [MS-FSCC] 2.1.2 states that the print name SHOULD be informative, 
not that it MUST be, where SHOULD and MUST are defined by RFC2119. The print 
name can be non-informative for some reason (e.g. maybe to use the whole 16 KiB 
buffer for the target path.) For symlinks that's not normally observed, since 
CreateSymbolicLinkW is conservative. For junctions, applications and frameworks 
use a low-level IOCTL to set them, and they're not necessarily bothering to set 
a print name. For example, PowerShell omits it:

PS C:\Mount> new-item Spam -type junction -value C:\Temp\Spam | out-null
PS C:\Mount> fsutil.exe reparsepoint query spam
Reparse Tag Value : 0xa003
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Mount Point
Substitue Name offset: 0
Substitue Name length: 32
Print Name offset: 34
Print Name Length: 0
Substitute Name:   \??\C:\Temp\Spam

This mount point works fine despite the lack of a print name.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links

2019-08-12 Thread Eryk Sun



Eryk Sun  added the comment:

Symlinks are specially interpreted by the file API, I/O manager, and network 
redirector. So I think they should remain a separate category. readlink() and 
is_link() should be reserved for POSIX symlinks, i.e. only 
IO_REPARSE_TAG_SYMLINK. 

These app-exec reparse points are not name surrogates (i.e. 0x2000_ set in 
the tag), and they lack an associated filter driver that handles them in normal 
file operations. I/O syscalls that try to reparse the path fail with 
STATUS_IO_REPARSE_TAG_NOT_HANDLED.

It's also messy to conflate different things in general for the sake of a 
particular case. For example, if code sees the app-exec link as a symlink 
because ntpath.is_link() is true, it may try to copy it via os.readlink() and 
os.symlink(). This creates a different type of reparse point, if the user even 
has the symlink privilege, which may not behave the same as the original 
app-exec link in a CreateProcessW call. (I don't know what the end goal is for 
the Windows team in terms of allowing app executables under 
"%ProgramFiles%\WindowsApps" to be executed directly.) 

How about adding a separate function such as nt.read_reparse_point() that's 
able to read reparse points that are documented (most types are opaque) and can 
be interpreted as a string, or a tuple of strings? The internal implementation 
could be shared with os.readlink.

The current behavior of follow_symlinks in stat() is problematic. It should be 
limited to symlinks. So another idea that I think could prove generally useful 
is to extend stat() and lstat() with a follow_reparse_points=True parameter. If 
false, it would imply follow_symlinks is false. Explicitly passing 
follow_symlinks=True with follow_reparse_points=False would be an error. With 
follow_symlinks=False and follow_reparse_points=True, it would only open a 
symlink and follow all other types of reparse points. The stat result would 
gain an st_reparse_tag field (already added by your PR), which would be 
non-zero whenever a reparse point is opened.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Steve Dower  added the comment:

Added an update with a new stat field (st_reparse_tag) so that we can tell them 
apart, which also means I can add a test for readlink.

--
assignee:  -> steve.dower

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Change by Steve Dower :


--
keywords: +patch
pull_requests: +14955
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15231

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links



Change by Steve Dower :


--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37834] readlink on Windows cannot read app exec links