[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Eryk Sun

Eryk Sun added the comment:

Paths starting with "\\.\" (or "//./") and "\\?\" are not UNC paths. I've 
provided some explanations and examples below, and I also encourage you to read 
"Naming Files, Paths, and Namespaces":

https://msdn.microsoft.com/en-us/library/aa365247

"\\.\" is the general way to access DOS devices, but with some path processing 
still enabled. For example:

>>> files = os.listdir(r'//./C:/Windows/System32/..')
>>> [x for x in files if x[:2] == 'py']
['py.exe', 'pyw.exe']

Notice that using slash and ".." is allowed. This form doesn't allow relative 
paths that depend on per-drive current directories. It's actually not 
recommended to use "\\.\" to access files on drive letters. Normally it's used 
with drive letters only when directly opening a volume. For example:

>>> fd = os.open(r'\\.\C:', os.O_RDONLY | os.O_BINARY)
>>> os.read(fd, 512)[:7]
b'\xebR\x90NTFS'
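The bytes shown are the start of the volume's boot sector. They are a short jump instruction (0xEB 0x52 0x90, which Python displays as b'\xebR\x90') followed by the 8-byte OEM ID, which NTFS sets to "NTFS" plus four spaces. Below is a minimal sketch of testing for that signature. It assumes `sector` is the 512 bytes read above. The helper name is hypothetical, and only the OEM ID is checked:

```python
def looks_like_ntfs_boot_sector(sector: bytes) -> bool:
    # Hypothetical helper: the OEM ID occupies bytes 3..10 of the boot
    # sector, and NTFS stores b'NTFS' padded with four spaces there.
    return len(sector) >= 11 and sector[3:11] == b'NTFS    '


# Synthetic sector for illustration: jump instruction, OEM ID, zero padding.
sample = b'\xebR\x90' + b'NTFS    ' + bytes(512 - 11)
print(looks_like_ntfs_boot_sector(sample))  # True
print(sample[:7])                           # b'\xebR\x90NTFS'
```

On Windows, you could pass the result of os.read(fd, 512) instead of `sample` to test a real volume. Opening "\\.\C:" this way typically requires an elevated process.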

The "\\?\" prefix allows the most access to the NT kernel namespace from within 
the Windows API (e.g. file paths can be up to 32K characters instead of the DOS 
limit of 260 characters). It does so by disabling all path processing, which 
means the onus is on the programmer to provide a fully-qualified, Unicode path 
that only uses backslash as the path separator.
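Because nothing normalizes a "\\?\" path for you, code that builds one usually normalizes the path first. Below is a minimal sketch, with these assumptions:

- The input is an absolute drive-letter or UNC path.
- The function name is hypothetical.
- Device paths such as "\\.\pipe" are deliberately not handled.

It uses ntpath (os.path on Windows), which can be imported on any platform:

```python
import ntpath

EXTENDED_PREFIX = '\\\\?\\'  # the four characters \\?\


def to_extended_path(path: str) -> str:
    # Hypothetical helper. Already-extended paths are returned unchanged,
    # because the system will not process them any further.
    if path.startswith(EXTENDED_PREFIX):
        return path
    # Require a drive ("C:" or "\\server\share") plus a root; reject
    # relative and drive-relative paths such as "C:folder".
    drive = ntpath.splitdrive(path)[0]
    if not drive or not ntpath.isabs(path):
        raise ValueError(f'need a fully qualified path: {path!r}')
    # normpath converts '/' to '\' and resolves '.' and '..' components,
    # which the system will no longer do once the prefix is added.
    full = ntpath.normpath(path)
    if full.startswith('\\\\'):
        # UNC form: \\server\share\rest becomes \\?\UNC\server\share\rest
        return EXTENDED_PREFIX + 'UNC\\' + full[2:]
    return EXTENDED_PREFIX + full


print(to_extended_path('C:/Windows/System32/..'))      # \\?\C:\Windows
print(to_extended_path('//server/share/folder/file'))  # \\?\UNC\server\share\folder\file
```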

So why does "\\.\" exist? Some DOS devices are made implicitly available in the 
Windows API, such as DOS drive letters and "CON". However, the Windows API 
greatly extends the number of 'DOS' devices (e.g. the "PhysicalDrive0" device 
for low-level access to the first disk). Accessing these devices unambiguously 
requires the "\\.\" prefix. A common example is using "\\.\pipe\[pipe name]" to 
open a NamedPipe. You can even list the NamedPipe filesystem in Python. For 
example:

>>> p1, p2 = multiprocessing.Pipe()
>>> [x for x in os.listdir(r'\\.\pipe') if x[:2] == 'py']
['pyc-719-1-hoirbkzb']

Global DOS device names are defined in the kernel's "\Global??" directory. Some 
DOS devices, such as mapped network drives, are usually defined local to a 
logon session in the kernel's "\Sessions\0\DosDevices\[Logon Session ID]" 
directory. In the debugger examples I gave earlier, you may have noticed that 
each native kernel path starts with "\??\". This is a virtual directory in the kernel (and only
the kernel). It instructs the object manager to first search the local session 
DOS devices and then the global DOS devices.
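As a rough mental model only, the "\??\" search order can be written as two dictionary lookups: the logon session's DOS devices first, then the global ones. Some assumptions apply:

- The tables are hypothetical stand-ins for the kernel's object directories.
- The targets reuse the examples "\Device\HarddiskVolume2" and "\Device\NamedPipe" from this comment.
- The mapped-drive target is invented.
- Only backslash is treated as a separator, as in the kernel.

```python
# Toy model: dictionaries stand in for the kernel's object directories.
GLOBAL_DOS_DEVICES = {
    'C:': r'\Device\HarddiskVolume2',  # example target from this comment
    'PIPE': r'\Device\NamedPipe',
}


def resolve_dos_device(native_path, session_devices,
                       global_devices=GLOBAL_DOS_DEVICES):
    prefix = '\\??\\'
    if not native_path.startswith(prefix):
        raise ValueError(f'not a \\??\\ path: {native_path!r}')
    # Only backslash separates names, so "UNC/server" is a single name.
    name, sep, remainder = native_path[len(prefix):].partition('\\')
    key = name.upper()  # compare names case-insensitively
    # Session-local DOS devices shadow global ones of the same name.
    for table in (session_devices, global_devices):
        if key in table:
            return table[key] + sep + remainder
    raise LookupError(f'no DOS device named {name!r}')


session = {'Z:': r'\Device\HypotheticalRedirector\server\share'}  # invented
print(resolve_dos_device(r'\??\C:\Windows', session))    # \Device\HarddiskVolume2\Windows
print(resolve_dos_device(r'\??\pipe\example', session))  # \Device\NamedPipe\example
```

Real symbolic links can chain, and the object manager hands the unparsed remainder to the device itself. The model ignores both.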

A DOS device is almost always implemented as an object symbolic link to the 
real NT device name in the kernel's "\Device" directory. For example, 
"\Global??\PIPE" links to "\Device\NamedPipe" and the "C:" drive may be a link 
to "\Device\HarddiskVolume2". This device is what the kernel actually opened in 
the previous example that read from "\\.\C:". Note that this accesses the 
volume itself, not the root directory of the filesystem that resides on the 
volume. The latter is "\\.\C:\". The trailing backslash makes all the 
difference. (Opening a directory such as the latter requires backup semantics, 
as described in the CreateFile docs.)
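Python's ntpath module (os.path on Windows) draws a similar line. It treats "." or "?" like a UNC server name and the next component like a share name, so the trailing backslash ends up outside the drive. The quick check below runs on any platform, since ntpath can be imported everywhere:

```python
import ntpath

# ntpath treats "." or "?" like a UNC server name and the next component
# like a share name, so the device or volume name lands in the drive.
for p in (r'\\.\C:', r'\\.\C:' + '\\', r'\\.\pipe\example', r'\\?\C:\Windows'):
    print(repr(p), '->', ntpath.splitdrive(p))

# '\\\\.\\C:' -> ('\\\\.\\C:', '')
# '\\\\.\\C:\\' -> ('\\\\.\\C:', '\\')
# '\\\\.\\pipe\\example' -> ('\\\\.\\pipe', '\\example')
# '\\\\?\\C:\\Windows' -> ('\\\\?\\C:', '\\Windows')
```

The volume form has an empty remainder, while the root-directory form keeps "\" as its path. ntpath's drive parsing has been revised in later Python releases, so other device-path spellings may split differently.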

If a DOS drive letter is assigned to a volume, the assignment is stored in the 
registry by the volume's ID. (Dynamic volumes that span multiple disks also 
contain a drive letter hint.) For volume devices, the kernel also creates a 
GUID name that's always available and allows mounting a volume in a directory 
using an NTFS reparse point (e.g. see the output of mountvol.exe). You can also 
use GUID volume names in the Windows API. For example:

>>> path = r'\\?\Volume{1693b540----612e}\Windows'
>>> files = os.listdir(path)
>>> [x for x in files if x[:2] == 'py']
['py.exe', 'pyw.exe']
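To handle such names in code, note that ntpath.splitdrive() already separates "\\?\Volume{GUID}" from the rest of the path. A minimal sketch therefore only needs to validate the drive part. The helper name and the regular expression are assumptions for illustration, and the GUID below is a placeholder, not a real volume:

```python
import ntpath
import re
import uuid

# Matches a drive of the form \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
_VOLUME_DRIVE = re.compile(r'\\\\\?\\Volume\{([0-9A-Fa-f-]{36})\}\Z')


def split_volume_guid_path(path: str):
    # Hypothetical helper: returns (volume GUID, path within the volume).
    drive, rest = ntpath.splitdrive(path)
    match = _VOLUME_DRIVE.match(drive)
    if match is None:
        raise ValueError(f'not a volume GUID path: {path!r}')
    # uuid.UUID rejects the text unless 32 hex digits remain without hyphens.
    return uuid.UUID(match.group(1)), rest


placeholder = '01234567-89ab-cdef-0123-456789abcdef'  # not a real volume
guid, rest = split_volume_guid_path('\\\\?\\Volume{' + placeholder + '}\\Windows')
print(guid, rest)  # 01234567-89ab-cdef-0123-456789abcdef \Windows
```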

But normally you'd just mount the volume, which can even be recursively mounted 
within itself. For example:

>>> os.mkdir('C:\\SystemVolume')
>>> subprocess.call(r'mountvol C:\SystemVolume \\?\Volume{1693b540----612e}')
0
>>> files = os.listdir(r'C:\SystemVolume\Windows')
>>> [x for x in files if x[:2] == 'py']
['py.exe', 'pyw.exe']

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Dustin Oprea

Dustin Oprea added the comment:

Thank you for your elaborate response. I appreciate knowing that 
"\\server\share" could be considered as the "drive" portion of the path.

I'm having trouble determining if "\\?\" is literally some type of valid UNC 
prefix or you're just using it to represent some format/idea. Just curious.




[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Eryk Sun

Eryk Sun added the comment:

dirname() is implemented via split(), which begins by calling splitdrive(). The 
'drive' for a UNC path is the r"\\server\share" component. For example:

>>> path = r'\\server\share\folder\file'
>>> os.path.splitdrive(path)
('\\\\server\\share', '\\folder\\file')
>>> os.path.split(path)
('\\\\server\\share\\folder', 'file')
>>> os.path.dirname(path)
'\\\\server\\share\\folder'
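That chain can be sketched in a few lines: dirname() returns the first half of split(), and split() starts from splitdrive(). This is a simplified re-creation for illustration, not CPython's source. It reuses ntpath.splitdrive for the drive and then cuts at the last separator.

It also shows why the original report's inputs came back unchanged. For "\\a\b" or "//a/b", the whole string is the drive, so there is nothing left to split.

```python
import ntpath

SEPS = '\\/'


def sketch_split(path):
    # 1. Peel off the drive ("C:" or "\\server\share") first.
    drive, rest = ntpath.splitdrive(path)
    # 2. Scan back to the character just after the last separator.
    i = len(rest)
    while i and rest[i - 1] not in SEPS:
        i -= 1
    head, tail = rest[:i], rest[i:]
    # 3. Drop trailing separators from head, unless head is all separators
    #    (such as the root "\"), which must be kept.
    head = head.rstrip(SEPS) or head
    return drive + head, tail


def sketch_dirname(path):
    return sketch_split(path)[0]


for p in (r'\\server\share\folder\file', r'\\a\b', '//a/b'):
    print(repr(p), '->', repr(sketch_dirname(p)))
# '\\\\server\\share\\folder\\file' -> '\\\\server\\share\\folder'
# '\\\\a\\b' -> '\\\\a\\b'
# '//a/b' -> '//a/b'
```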

If you double the initial slashes, it's no longer a valid UNC path:

>>> path = r'\\\\server\\share\\folder\\file'
>>> os.path.splitdrive(path)
('', '\\\\\\\\server\\\\share\\\\folder\\\\file')
>>> os.path.split(path)
('\\\\\\\\server\\\\share\\\\folder', 'file')
>>> os.path.dirname(path)
'\\\\\\\\server\\\\share\\\\folder'

Windows itself will attempt to handle it as a UNC path, but the path is 
invalid. Specifically, before passing the path to the kernel, Windows collapses 
repeated slashes, except that when a path begins with more than two slashes, one 
extra slash always remains after the UNC prefix. For example:

>>> open(r'\\\\server\\share\\folder\\file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC\\server\share\folder\file
OBJ_CASE_INSENSITIVE

Notice the extra backslash in "UNC\\server". Thus a valid UNC path must start
with exactly two slashes. 
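Putting the rules from this thread together, a rough validity check might look like the following sketch. The function is hypothetical and not part of os.path. It does three things:

- It requires exactly two leading separators.
- It requires non-empty server and share names.
- It rejects "\\.\" and "\\?\" paths, since those are not UNC paths.

The extended "\\?\UNC\" spelling is deliberately not covered.

```python
def looks_like_unc(path: str) -> bool:
    # Hypothetical helper, not part of os.path. Forward slash is accepted
    # because normal Windows path processing turns it into backslash.
    p = path.replace('/', '\\')
    if not p.startswith('\\\\') or p.startswith('\\\\\\'):
        return False  # need exactly two leading separators
    parts = p[2:].split('\\')
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return False  # need non-empty server and share names
    # "\\.\" and "\\?\" introduce device paths, which are not UNC paths.
    return parts[0] not in ('.', '?')


for p in (r'\\server\share\folder\file', '//server/share/folder/file',
          r'\\\\server\\share\\folder\\file', r'\\?\C:\Windows'):
    print(repr(p), looks_like_unc(p))  # True, True, False, False
```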

Using forward slash is generally fine. The Windows API substitutes backslash 
for slash before passing a path to the kernel. For example:

>>> open(r'//server/share/folder/file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC\server\share\folder\file
OBJ_CASE_INSENSITIVE

However, you can't use forward slash in a "\\?\" path, because that prefix 
bypasses normal path processing. For example:

>>> open(r'\\?\UNC/server/share/folder/file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC/server/share/folder/file
OBJ_CASE_INSENSITIVE

In the kernel '/' isn't a path separator. It's just another name character, so 
this looks for a DOS device named "UNC/server/share/folder/file". Microsoft 
file systems forbid using slash in names (for POSIX compatibility and to avoid 
needless confusion), but you can use slash in the name of kernel objects such 
as Event objects, or even in the name of DOS devices defined via 
DefineDosDevice.
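The three traces above can be summarized with a toy model. It only reproduces the NtCreateFile names captured in this comment for UNC-style and "\\?\" paths. It is not the real conversion code, and it ignores drive-letter, relative, and "\\.\" device paths:

```python
def approx_native_name(path: str) -> str:
    # Toy model only: reproduces the three NtCreateFile names traced above.
    if path.startswith('\\\\?\\'):
        # "\\?\" skips path processing: the prefix becomes "\??\" and the
        # rest, including any '/', is passed through verbatim.
        return '\\??\\' + path[4:]
    p = path.replace('/', '\\')  # slash is converted to backslash
    if not p.startswith('\\\\') or p.startswith(('\\\\.\\', '\\\\?\\')):
        raise NotImplementedError('only UNC-style paths are modeled')
    leading = len(p) - len(p.lstrip('\\'))  # number of leading separators
    # Runs of separators after the leading ones collapse to one.
    rest = '\\'.join(part for part in p[leading:].split('\\') if part)
    # With more than two leading separators, one extra backslash survives.
    extra = '\\' if leading > 2 else ''
    return '\\??\\UNC\\' + extra + rest


for p in (r'\\\\server\\share\\folder\\file',
          '//server/share/folder/file',
          r'\\?\UNC/server/share/folder/file'):
    print(approx_native_name(p))
# \??\UNC\\server\share\folder\file
# \??\UNC\server\share\folder\file
# \??\UNC/server/share/folder/file
```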




[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Eryk Sun

Changes by Eryk Sun :


--
Removed message: http://bugs.python.org/msg269406




[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Eryk Sun

Eryk Sun added the comment:

dirname() is implemented via split(), which begins by calling splitdrive(). The 
'drive' for a UNC path is the r"\\server\share" component. For example:

>>> path = r'\\server\share\folder\file'
>>> os.path.splitdrive(path)
('\\\\server\\share', '\\folder\\file')
>>> os.path.split(path)
('\\\\server\\share\\folder', 'file')
>>> os.path.dirname(path)
'\\\\server\\share\\folder'

If you double the initial slashes, it's no longer a valid UNC path:

>>> path = r'\\\\server\\share\\folder\\file'
>>> os.path.splitdrive(path)
('', '\\\\\\\\server\\\\share\\\\folder\\\\file')
>>> os.path.split(path)
('\\\\\\\\server\\\\share\\\\folder', 'file')
>>> os.path.dirname(path)
'\\\\\\\\server\\\\share\\\\folder'

Windows itself will attempt to handle it as a UNC path, but the path is 
invalid. Specifically, before passing the path to the kernel, Windows collapses 
repeated slashes, except that when a path begins with more than two slashes, one 
extra slash always remains after the UNC prefix. For example:

>>> open(r'\\\\server\\share\\folder\\file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC\\server\share\folder\file
OBJ_CASE_INSENSITIVE

Notice the extra backslash in "UNC\\server". Thus a valid UNC path must start
with exactly two slashes. 

Using forward slash is generally fine. The Windows API substitutes backslash 
for slash before passing a path to the kernel. For example:

>>> open(r'//server/share/folder/file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC\server\share\folder\file
OBJ_CASE_INSENSITIVE

However, you can't use forward slash in a "\\?\" path, because that prefix 
bypasses normal path processing. For example:

>>> open(r'\\?\UNC/server/share/folder/file')
Breakpoint 0 hit
ntdll!NtCreateFile:
7ffb`a1f25b70 4c8bd1  mov r10,rcx
0:000> !obja @r8
Obja +0049781ef160 at 0049781ef160:
Name is \??\UNC/server/share/folder/file
OBJ_CASE_INSENSITIVE

In the kernel '/' isn't a path separator. It's just another name character, so 
this (potentially) looks for a server named "/server/share/folder/file". 
Microsoft file systems forbid using slash in names (for POSIX compatibility and 
to avoid needless confusion), but you can use slash in the name of kernel 
objects such as Event objects, or even in the name of DOS devices defined via 
DefineDosDevice.

--
nosy: +eryksun
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue27403] os.path.dirname doesn't handle Windows' URNs correctly

2016-06-27 Thread Dustin Oprea

New submission from Dustin Oprea:

Notice that os.path.dirname() returns whatever it is given if it is given a 
URN, regardless of slash-type. Oddly, you have to double-up the forward-slashes 
(like you're escaping them) in order to get the correct result (if you're using 
forward-slashes). Back-slashes appear to be broken no matter what.

C:\Python35-32>python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit 
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.dirname("\\\\a\\b")
'\\\\a\\b'
>>> os.path.dirname("//a/b")
'//a/b'
>>> os.path.dirname("a//b")
'a'

Any ideas?

--
components: Windows
messages: 269404
nosy: Dustin.Oprea, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.path.dirname doesn't handle Windows' URNs correctly
type: behavior
versions: Python 2.7, Python 3.5
