STINNER Victor <vstin...@python.org> added the comment:

(The first message is basically David's email rephrased. Here is my reply ;-))

> This could present issues if server-side checks are used by applications to 
> validate a URL's authority.

Which kind of application would be affected by this vulnerability?

It's unclear to me if urllib should be modified to explicitly reject \ in 
netloc, or if only third-party code should pay attention to this corner case 
(potential vulnerability).
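
To make the question concrete, here is a minimal sketch of the kind of server-side check that might be affected. It assumes a hypothetical application that accepts a URL when its hostname ends with a trusted suffix. The helper names naive_host_check() and strict_host_check() and the suffix are illustrative only; they are not part of urllib.

```python
from urllib.parse import urlsplit


def naive_host_check(url, trusted_suffix):
    # Hypothetical check: trusts whatever urlsplit() reports as the hostname.
    host = urlsplit(url).hostname or ""
    return host == trusted_suffix or host.endswith("." + trusted_suffix)


def strict_host_check(url, trusted_suffix):
    # Same hypothetical check, but refuses any netloc containing a backslash.
    if "\\" in urlsplit(url).netloc:
        return False
    return naive_host_check(url, trusted_suffix)


url = r"https://xdavidhu.me\test.corp.google.com"
print("naive: ", naive_host_check(url, "corp.google.com"))
print("strict:", strict_host_check(url, "corp.google.com"))
```

Since urlsplit() keeps the backslash inside netloc, the naive suffix test sees a hostname ending in the trusted suffix, while the strict variant rejects it. Whether that rejection belongs in urllib or in the calling code is the open question.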


The urllib.request module has _parse_proxy() and HTTPPasswordMgr.reduce_uri(), 
both of which use an "authority" variable.

Example:
---
from urllib.parse import (urlsplit, _splitport, _splittype, _splituser,
                          _splitpasswd)

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

    If a URL is supplied, it must have an authority (host:port) component.
    According to RFC 3986, having an authority component means the URL must
    have two slashes after the scheme.
    """
    scheme, r_scheme = _splittype(proxy)
    if not r_scheme.startswith("/"):
        # authority
        scheme = None
        authority = proxy
    else:
        # URL
        if not r_scheme.startswith("//"):
            raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
        # and 3.3.), path is empty or starts with '/'
        end = r_scheme.find("/", 2)
        if end == -1:
            end = None
        authority = r_scheme[2:end]
    userinfo, hostport = _splituser(authority)
    if userinfo is not None:
        user, password = _splitpasswd(userinfo)
    else:
        user = password = None
    return scheme, user, password, hostport


def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://u...@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:passw...@xdavidhu.me\test.corp.google.com")
---

Output on Python 3.9:
---
https://www.example.com => reduce_uri: ('www.example.com:443', '/')
https://www.example.com => _parse_proxy: ('https', None, None, 'www.example.com')
https://u...@www.example.com => reduce_uri: ('u...@www.example.com:443', '/')
https://u...@www.example.com => _parse_proxy: ('https', 'user', None, 'www.example.com')
https://xdavidhu.me\test.corp.google.com => reduce_uri: ('xdavidhu.me\\test.corp.google.com:443', '/')
https://xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', None, None, 'xdavidhu.me\\test.corp.google.com')
https://user:passw...@xdavidhu.me\test.corp.google.com => reduce_uri: ('user:passw...@xdavidhu.me\\test.corp.google.com:443', '/')
https://user:passw...@xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', 'user', 'password', 'xdavidhu.me\\test.corp.google.com')
---

It seems to behave as expected, no?
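
The same behaviour can be checked against the stdlib itself rather than the copies above. This is a minimal sketch, assuming HTTPPasswordMgr.reduce_uri() in urllib.request matches the copy shown earlier:

```python
from urllib.request import HTTPPasswordMgr

mgr = HTTPPasswordMgr()
authority, path = mgr.reduce_uri(r"https://xdavidhu.me\test.corp.google.com")

# The backslash stays inside the authority; it is not treated as a path
# separator, so the path falls back to "/".
print(repr(authority), repr(path))
```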

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40338>
_______________________________________