Eryk Sun <eryk...@gmail.com> added the comment:
> The value of req.selector never starts with "//", for which file_open()
> checks, but rather a single slash, such as "/Z:/test.py" or
> "/share/test.py".

To correct myself, actually req.selector will start with "//" for a "file:////" URI, such as "file:////host/share/test.py". For this example, req.host is an empty string, so file_open() still ends up calling open_local_file(), which will open "//host/share/test.py". In Linux, "//host/share" is the same as "/host/share". In Cygwin and MSYS2 it's a UNC path. I guess this case should be allowed, even though the meaning of a "//" root isn't specifically defined in POSIX.

Unless I'm overlooking something, file_open() only has to check the value of req.host. In POSIX, it should require opening a 'local' path, i.e. if req.host isn't None, empty, or a local host, raise URLError.

In Windows, my tests show that the shell API special cases "localhost" (case insensitive) in "file:" URIs. For example, the following are all equivalent: "file:/C:/Temp", "file:///C:/Temp", and "file://localhost/C:/Temp". The shell API does not special case the real local host name or any of its IP addresses, such as 127.0.0.1. They're all handled as UNC paths.

Here's what I've experimented with thus far, which passes the existing urllib tests in Linux and Windows:

    class FileHandler(BaseHandler):

        def file_open(self, req):
            if not self._is_local_path(req):
                if sys.platform == 'win32':
                    path = url2pathname(f'//{req.host}{req.selector}')
                else:
                    raise URLError("In POSIX, the file:// scheme is only "
                                   "supported for local file paths.")
            else:
                path = url2pathname(req.selector)
            return self._common_open_file(req, path)

        def _is_local_path(self, req):
            if req.host:
                host, port = _splitport(req.host)
                if port:
                    raise URLError(f"the host cannot have a port: {req.host}")
                if host.lower() != 'localhost':
                    # In Windows, all other host names are UNC.
                    if sys.platform == 'win32':
                        return False
                    # In POSIX, support all names for the local host.
                    if _safe_gethostbyname(host) not in self.get_names():
                        return False
            return True

        # names for the localhost
        names = None

        def get_names(self):
            if FileHandler.names is None:
                try:
                    FileHandler.names = tuple(
                        socket.gethostbyname_ex('localhost')[2] +
                        socket.gethostbyname_ex(socket.gethostname())[2])
                except socket.gaierror:
                    FileHandler.names = (socket.gethostbyname('localhost'),)
            return FileHandler.names

        def open_local_file(self, req):
            if not self._is_local_path(req):
                raise URLError('file not on local host')
            return self._common_open_file(req, url2pathname(req.selector))

        def _common_open_file(self, req, path):
            import email.utils
            import mimetypes
            host = req.host
            filename = req.selector
            try:
                if host:
                    origurl = f'file://{host}{filename}'
                else:
                    origurl = f'file://{filename}'
                stats = os.stat(path)
                size = stats.st_size
                modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
                mtype = mimetypes.guess_type(filename)[0] or 'text/plain'
                headers = email.message_from_string(
                    f'Content-type: {mtype}\n'
                    f'Content-length: {size}\n'
                    f'Last-modified: {modified}\n')
                return addinfourl(open(path, 'rb'), headers, origurl)
            except OSError as exp:
                raise URLError(exp)

Unfortunately nturl2path.url2pathname() parses some UNC paths incorrectly. For example, the following path should be an invalid UNC path, since "C:" is an invalid name, but instead it gets converted into an unrelated local path.

    >>> nturl2path.url2pathname('//host/C:/Temp/spam.txt')
    'C:\\Temp\\spam.txt'

This goof depends on finding ":" or "|" in the path. It's arguably worse if the last component has a named data stream (allowed by RFC 8089):

    >>> nturl2path.url2pathname('//host/share/spam.txt:eggs')
    'T:\\eggs'

Drive "T:" is from "t:" in "t:eggs", due to simplistic path parsing.
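For comparison, here is a minimal sketch of a stricter conversion for the two UNC examples above. It is not what nturl2path does and it is not a proposed patch. The helper name unc_url_path_to_windows() is hypothetical. It assumes the input always has the form "//host/share/...", and that rejecting ":" or "|" in the share name (but allowing ":" in later components, for named data streams) is the desired rule.

```python
from urllib.parse import unquote


def unc_url_path_to_windows(url_path):
    """Convert a '//host/share/...' URL path to a Windows UNC path.

    Hypothetical helper for illustration only; it is not part of
    urllib or nturl2path.
    """
    # Exactly two leading slashes mark a UNC-style path here.
    if not url_path.startswith('//') or url_path.startswith('///'):
        raise ValueError(f'not a UNC URL path: {url_path!r}')
    # Split first, then percent-decode each component separately.
    parts = [unquote(p) for p in url_path[2:].split('/')]
    host = parts[0]
    share = parts[1] if len(parts) > 1 else ''
    if not host or not share:
        raise ValueError(f'missing host or share: {url_path!r}')
    # Assumed rule: a share name such as "C:" or "C|" is invalid, so
    # raise instead of reinterpreting it as a local drive.
    if ':' in share or '|' in share:
        raise ValueError(f'invalid share name: {share!r}')
    # A ":" in later components (e.g. a named data stream) is kept.
    return '\\\\' + '\\'.join(parts)
```

With this rule, '//host/share/spam.txt:eggs' keeps its host, share, and stream name, while '//host/C:/Temp/spam.txt' raises ValueError rather than becoming a path on drive "C:".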
----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46654>
_______________________________________