david.six <davesix+pyb...@gmail.com> added the comment:
tl;dr: '#', '?' and a few other characters should be URL-encoded/%-encoded when they appear in userinfo. Once encoded, such URLs already parse correctly.

---

Following up on what Martin said, RFC 3986 has the specifications for how these examples should be parsed.

    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded = "%" HEXDIG HEXDIG
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

Notably, gen-delims (other than ":", which userinfo lists explicitly) are _not_ included in the allowed characters, nor are non-ASCII characters.

    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

These and other characters not mentioned should be URL-encoded/%-encoded if they appear in the password.

Taking the first example, with the '#' in the password encoded as %23:

    >>> from urllib.parse import urlparse, unquote
    >>> u = 'http://auser:secr%23et@192.168.0.1:8080/a/b/c.html'
    >>> urlparse(u)
    ParseResult(scheme='http', netloc='auser:secr%23et@192.168.0.1:8080', path='/a/b/c.html', params='', query='', fragment='')
    >>> unquote(urlparse(u).password)
    'secr#et'

----------
nosy: +david.six
status: pending -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue18140>
_______________________________________
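For the other direction, here is a minimal sketch of how a caller might percent-encode userinfo before building the URL, so that a password containing '#', '?', '/' or '@' round-trips through the parser. The helper name build_url and its parameters are hypothetical and only for illustration. Passing safe='' to quote() also encodes sub-delims such as '!' that RFC 3986 would allow unencoded. That is more encoding than strictly required, but it should be harmless because unquote() reverses it.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(user, password, host, port, path):
    """Hypothetical helper: assemble an http URL with percent-encoded userinfo."""
    # safe='' makes quote() encode every reserved character, including
    # '/', which it would otherwise leave untouched by default.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"http://{userinfo}@{host}:{port}{path}"


url = build_url("auser", "secr#et", "192.168.0.1", 8080, "/a/b/c.html")
parts = urlsplit(url)

# The parser returns the password still encoded; unquote() recovers it.
password = unquote(parts.password)
```

With the password encoded this way, the '#' no longer starts a fragment, and hostname, port and path come back as expected from urlsplit().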