[issue46654] urllib.request.urlopen doesn't handle UNC paths produced by pathlib's as_uri() (but can handle UNC paths with additional slashes)

2022-02-06 Thread Eryk Sun


Change by Eryk Sun:


--
assignee: docs@python -> 
components:  -2to3 (2.x to 3.x conversion tool), Argument Clinic, Build, C API, 
Cross-Build, Demos and Tools, Distutils, Documentation, Extension Modules, 
FreeBSD, IDLE, IO, Installation, Interpreter Core, Parser, Regular Expressions, 
SSL, Subinterpreters, Tests, Tkinter, Unicode, Windows, XML, asyncio, ctypes, 
email, macOS
stage:  -> needs patch
type: performance -> behavior
versions:  -Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46654] urllib.request.urlopen doesn't handle UNC paths produced by pathlib's as_uri() (but can handle UNC paths with additional slashes)

2022-02-06 Thread Emanuelle Pharand


Change by Emanuelle Pharand:


--
assignee:  -> docs@python
components: +2to3 (2.x to 3.x conversion tool), Argument Clinic, Build, C API, 
Cross-Build, Demos and Tools, Distutils, Documentation, Extension Modules, 
FreeBSD, IDLE, IO, Installation, Interpreter Core, Library (Lib), Parser, 
Regular Expressions, SSL, Subinterpreters, Tests, Tkinter, Unicode, Windows, 
XML, asyncio, ctypes, email, macOS
nosy: +Alex.Willmer, asvetlov, barry, docs@python, dstufft, eric.araujo, 
ezio.melotti, koobs, ladykraken, larry, lys.nikolaou, mrabarnett, ned.deily, 
pablogsal, paul.moore, r.david.murray, ronaldoussoren, steve.dower, 
terry.reedy, tim.golden, vstinner, yselivanov, zach.ware
type:  -> performance
versions: +Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9




[issue46654] urllib.request.urlopen doesn't handle UNC paths produced by pathlib's as_uri() (but can handle UNC paths with additional slashes)

2022-02-06 Thread Eryk Sun


Eryk Sun added the comment:

> The value of req.selector never starts with "//", for which file_open() 
> checks, but rather a single slash, such as "/Z:/test.py" or 
> "/share/test.py".

To correct myself, req.selector actually will start with "//" for a "file:" 
URI with four slashes, such as "file:////host/share/test.py". For this 
example, req.host is an empty string, so file_open() still ends up calling 
open_local_file(), which will open "//host/share/test.py". In Linux, 
"//host/share" is the same as "/host/share". In Cygwin and MSYS2 it's a UNC 
path. I guess this case should be allowed, even though the meaning of a "//" 
root isn't specifically defined in POSIX.
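
For illustration only (this is not part of the proposed patch), here is a 
small sketch of how urllib.request.Request splits a few "file:" URLs into 
req.host and req.selector. The helper name host_and_selector() is made up for 
this example; the URLs are the ones mentioned in this thread. Nothing is 
opened, the URLs are only parsed.

```python
from urllib.request import Request


def host_and_selector(url):
    """Return (req.host, req.selector) for a URL, without opening it."""
    # Request parses the URL when it is constructed.
    req = Request(url)
    return req.host, req.selector


examples = [
    'file:/C:/Temp',                # no "//" after the scheme
    'file:///Z:/test.py',           # empty host, drive path
    'file://host/share/test.py',    # host is set, single-slash selector
    'file:////host/share/test.py',  # empty host, selector starts with "//"
]

for url in examples:
    print(url, '->', host_and_selector(url))
```

Only the four-slash form leaves the host empty while keeping "//host/share" in 
the selector, which is why it reaches open_local_file().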

Unless I'm overlooking something, file_open() only has to check the value of 
req.host. In POSIX, it should only open a 'local' path, i.e. raise URLError 
unless req.host is None, empty, or a name for the local host.

In Windows, my tests show that the shell API special cases "localhost" (case 
insensitive) in "file:" URIs. For example, the following are all equivalent: 
"file:/C:/Temp", "file:///C:/Temp", and "file://localhost/C:/Temp". The shell 
API does not special case the real local host name or any of its IP addresses, 
such as 127.0.0.1. They're all handled as UNC paths.

Here's what I've experimented with thus far, which passes the existing urllib 
tests in Linux and Windows:

class FileHandler(BaseHandler):
    def file_open(self, req):
        if not self._is_local_path(req):
            if sys.platform == 'win32':
                path = url2pathname(f'//{req.host}{req.selector}')
            else:
                raise URLError("In POSIX, the file:// scheme is only "
                               "supported for local file paths.")
        else:
            path = url2pathname(req.selector)
        return self._common_open_file(req, path)

    def _is_local_path(self, req):
        if req.host:
            host, port = _splitport(req.host)
            if port:
                raise URLError(f"the host cannot have a port: {req.host}")
            if host.lower() != 'localhost':
                # In Windows, all other host names are UNC.
                if sys.platform == 'win32':
                    return False
                # In POSIX, support all names for the local host.
                if _safe_gethostbyname(host) not in self.get_names():
                    return False
        return True

    # names for the localhost
    names = None
    def get_names(self):
        if FileHandler.names is None:
            try:
                FileHandler.names = tuple(
                    socket.gethostbyname_ex('localhost')[2] +
                    socket.gethostbyname_ex(socket.gethostname())[2])
            except socket.gaierror:
                FileHandler.names = (socket.gethostbyname('localhost'),)
        return FileHandler.names

    def open_local_file(self, req):
        if not self._is_local_path(req):
            raise URLError('file not on local host')
        return self._common_open_file(req, url2pathname(req.selector))

    def _common_open_file(self, req, path):
        import email.utils
        import mimetypes
        host = req.host
        filename = req.selector
        try:
            if host:
                origurl = f'file://{host}{filename}'
            else:
                origurl = f'file://{filename}'
            stats = os.stat(path)
            size = stats.st_size
            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
            mtype = mimetypes.guess_type(filename)[0] or 'text/plain'
            headers = email.message_from_string(
                f'Content-type: {mtype}\n'
                f'Content-length: {size}\n'
                f'Last-modified: {modified}\n')
            return addinfourl(open(path, 'rb'), headers, origurl)
        except OSError as exp:
            raise URLError(exp)


Unfortunately nturl2path.url2pathname() parses some UNC paths incorrectly. For 
example, the following path should be an invalid UNC path, since "C:" is an 
invalid name, but instead it gets converted into an unrelated local path.

>>> nturl2path.url2pathname('//host/C:/Temp/spam.txt')
'C:\\Temp\\spam.txt'

This goof depends on finding ":" or "|" in the path. It's arguably worse if the 
last component has a named data stream (allowed by RFC 8089):

>>> nturl2path.url2pathname('//host/share/spam.txt:eggs')
'T:\\eggs'

Drive "T:" is from "t:" in "t:eggs", due to simplistic path parsing.
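
To make the failure mode concrete, here is a simplified model of drive 
detection that keys on ":" or "|" anywhere in the path. It is a sketch written 
for this example, not the nturl2path source, and naive_url2pathname() is a 
hypothetical name. It reproduces both results shown above.

```python
import string
from urllib.parse import unquote


def naive_url2pathname(url):
    """Hypothetical model: treat any ':' or '|' as a drive marker."""
    url = url.replace(':', '|')
    if '|' not in url:
        # No drive marker: just convert slashes to backslashes.
        return unquote(url.replace('/', '\\'))
    before, _, after = url.partition('|')
    if '|' in after or not before or before[-1] not in string.ascii_letters:
        raise OSError('Bad URL: ' + url)
    # The character right before the marker is taken as the drive letter.
    # Everything ahead of it, including any //host/share prefix, is dropped.
    drive = before[-1].upper()
    parts = [unquote(part) for part in after.split('/') if part]
    return '\\'.join([drive + ':'] + parts)


for url in ('//host/C:/Temp/spam.txt', '//host/share/spam.txt:eggs'):
    print(repr(url), '->', repr(naive_url2pathname(url)))
```

In this model the "//host" prefix is never examined once a marker is found, so 
a UNC path cannot be told apart from a drive path.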

--


[issue46654] urllib.request.urlopen doesn't handle UNC paths produced by pathlib's as_uri() (but can handle UNC paths with additional slashes)

2022-02-05 Thread Barney Gale


Change by Barney Gale:


--
title: urllib.request.urlopen doesn't handle UNC paths produced by pathlib's 
resolve() (but can handle UNC paths with additional slashes) -> 
urllib.request.urlopen doesn't handle UNC paths produced by pathlib's as_uri() 
(but can handle UNC paths with additional slashes)
