Eryk Sun <eryk...@gmail.com> added the comment:

> The value of req.selector never starts with "//", for which file_open() 
> checks, but rather a single slash, such as "/Z:/test.py" or 
> "/share/test.py".

To correct myself, req.selector does start with "//" for a "file:////" URI, 
such as "file:////host/share/test.py". For this example, req.host is an empty 
string, so file_open() still ends up calling open_local_file(), which opens 
"//host/share/test.py". In Linux, "//host/share" is the same as "/host/share". 
In Cygwin and MSYS2, it's a UNC path. I guess this case should be allowed, even 
though POSIX leaves the meaning of a "//" root implementation-defined.
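
As a quick way to check the parsing described above, here is a minimal sketch 
that prints req.host and req.selector for a few "file:" URIs. The helper name 
host_and_selector() is made up for illustration. It relies only on the 
documented Request.host and Request.selector attributes.

```python
from urllib.request import Request

def host_and_selector(url):
    """Return (req.host, req.selector) as urllib.request parses the URL."""
    req = Request(url)
    return req.host, req.selector

# A drive path, a UNC-style URI with a host, and the "file:////" form.
for url in ("file:///Z:/test.py",
            "file://host/share/test.py",
            "file:////host/share/test.py"):
    print(url, "->", host_and_selector(url))
```

Only the last form should yield an empty host together with a selector that 
starts with "//".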

Unless I'm overlooking something, file_open() only has to check the value of 
req.host. In POSIX, it should require a local path: if req.host isn't None, an 
empty string, or a name for the local host, raise URLError.

In Windows, my tests show that the shell API special-cases "localhost" (case 
insensitive) in "file:" URIs. For example, the following are all equivalent: 
"file:/C:/Temp", "file:///C:/Temp", and "file://localhost/C:/Temp". The shell 
API does not special-case the real local host name or any of its IP addresses, 
such as 127.0.0.1. Those are all handled as UNC paths.
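
The rule above can be sketched in a few lines. This is only an illustration 
of the behavior I observed, not shell API code, and is_shell_local() is a 
hypothetical name. It treats a missing host, an empty host, and "localhost" 
(any case) as local, and everything else as a UNC host. It deliberately 
ignores ports.

```python
from urllib.request import Request

def is_shell_local(url):
    """Hypothetical check: True if the URI's host is None, empty, or
    'localhost' (case insensitive); any other host is treated as UNC."""
    host = Request(url).host
    return not host or host.lower() == "localhost"

for url in ("file:/C:/Temp",
            "file:///C:/Temp",
            "file://localhost/C:/Temp",
            "file://127.0.0.1/C:/Temp"):
    print(url, "->", "local" if is_shell_local(url) else "UNC")
```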

Here's what I've experimented with thus far, which passes the existing urllib 
tests in Linux and Windows:

    class FileHandler(BaseHandler):
        def file_open(self, req):
            if not self._is_local_path(req):
                if sys.platform == 'win32':
                    path = url2pathname(f'//{req.host}{req.selector}')
                else:
                    raise URLError("In POSIX, the file:// scheme is only "
                                   "supported for local file paths.")
            else:
                path = url2pathname(req.selector)
            return self._common_open_file(req, path)


        def _is_local_path(self, req):
            if req.host:
                host, port = _splitport(req.host)
                if port:
                    raise URLError(f"the host cannot have a port: {req.host}")
                if host.lower() != 'localhost':
                    # In Windows, all other host names are UNC.
                    if sys.platform == 'win32':
                        return False
                    # In POSIX, support all names for the local host.
                    if _safe_gethostbyname(host) not in self.get_names():
                        return False
            return True


        # Cached IP addresses of the local host, resolved lazily by get_names().
        names = None
        def get_names(self):
            if FileHandler.names is None:
                try:
                    FileHandler.names = tuple(
                        socket.gethostbyname_ex('localhost')[2] +
                        socket.gethostbyname_ex(socket.gethostname())[2])
                except socket.gaierror:
                    FileHandler.names = (socket.gethostbyname('localhost'),)
            return FileHandler.names


        def open_local_file(self, req):
            if not self._is_local_path(req):
                raise URLError('file not on local host')
            return self._common_open_file(req, url2pathname(req.selector))


        def _common_open_file(self, req, path):
            import email.utils
            import mimetypes
            host = req.host
            filename = req.selector
            try:
                if host:
                    origurl = f'file://{host}{filename}'
                else:
                    origurl = f'file://{filename}'
                stats = os.stat(path)
                size = stats.st_size
                modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
                mtype = mimetypes.guess_type(filename)[0] or 'text/plain'
                headers = email.message_from_string(
                            f'Content-type: {mtype}\n'
                            f'Content-length: {size}\n'
                            f'Last-modified: {modified}\n')
                return addinfourl(open(path, 'rb'), headers, origurl)
            except OSError as exp:
                raise URLError(exp)
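
For reference, get_names() relies on socket.gethostbyname_ex() returning a 
(hostname, aliaslist, ipaddrlist) tuple, so index 2 is the list of IPv4 
addresses. A standalone sketch of the same lookup follows. The function name 
local_addresses() is made up, and unlike the code above it resolves each name 
independently, so one failed lookup doesn't discard the other's results.

```python
import socket

def local_addresses():
    """Collect IPv4 addresses for 'localhost' and the machine's host name."""
    addrs = set()
    for name in ("localhost", socket.gethostname()):
        try:
            # Index 2 of the result is the list of IP address strings.
            addrs.update(socket.gethostbyname_ex(name)[2])
        except OSError:
            # socket.gaierror and socket.herror are OSError subclasses;
            # skip any name that fails to resolve.
            pass
    return tuple(sorted(addrs))

print(local_addresses())
```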


Unfortunately, nturl2path.url2pathname() parses some UNC paths incorrectly. For 
example, the following path should be an invalid UNC path, since "C:" is an 
invalid share name, but instead it gets converted into an unrelated local path.

    >>> nturl2path.url2pathname('//host/C:/Temp/spam.txt')
    'C:\\Temp\\spam.txt'

This goof depends on finding ":" or "|" in the path. It's arguably worse if the 
last component has a named data stream (allowed by RFC 8089):

    >>> nturl2path.url2pathname('//host/share/spam.txt:eggs')
    'T:\\eggs'

Drive "T:" comes from the "t:" in "spam.txt:eggs". Due to simplistic path 
parsing, the character just before the colon is taken as the drive letter.
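
To make the failure mode concrete, here is a simplified sketch of the kind of 
drive-letter parsing that produces both results above. It is not the actual 
nturl2path source. naive_url2pathname() is a hypothetical stand-in that only 
mimics the "split on the drive marker" approach.

```python
import string
from urllib.parse import unquote

def naive_url2pathname(url):
    """Hypothetical, simplified drive parsing (not the stdlib code)."""
    url = url.replace(":", "|")  # treat ":" and "|" alike
    if "|" not in url:
        # No drive marker: just convert slashes.
        return unquote(url.replace("/", "\\"))
    head, _, tail = url.partition("|")
    if "|" in tail or not head or head[-1] not in string.ascii_letters:
        raise OSError(f"Bad URL: {url}")
    # The character right before the marker becomes the drive letter, and
    # everything earlier in the URL (host, share, file name) is discarded.
    drive = head[-1].upper()
    parts = [unquote(p) for p in tail.split("/") if p]
    return "\\".join([drive + ":"] + parts)

print(naive_url2pathname("//host/C:/Temp/spam.txt"))
print(naive_url2pathname("//host/share/spam.txt:eggs"))
```

A stricter parser would only accept a drive marker in the first path 
component, and only when no host is present.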

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46654>
_______________________________________