[issue18140] urlparse, urlsplit confused when password includes fragment (#), query (?)

david.six Mon, 10 Aug 2020 06:18:53 -0700


david.six <davesix+pyb...@gmail.com> added the comment:


tl;dr: '#', '?' and a few other characters should be percent-encoded 
(URL-encoded) when they appear in userinfo; URLs encoded this way already 
parse correctly.

---

Following up on what Martin said, RFC 3986 has the specifications for how these 
examples should be parsed.

userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

Notably, gen-delims other than ":" are _not_ included in the allowed 
characters, nor are non-ASCII characters.

gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"

These and other characters not mentioned should be URL-encoded/%-encoded if 
they appear in the password.
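
As a rough sketch of that encoding step, assuming Python 3's 
urllib.parse.quote, the userinfo can be percent-encoded before it is placed 
in the URL. The helper name build_url and the sample values are hypothetical. 
Passing safe='' escapes every character outside the unreserved set, which is 
stricter than RFC 3986 requires, since sub-delims could be left as-is.

```python
from urllib.parse import quote, unquote, urlparse

def build_url(user, password, host, port, path):
    # Hypothetical helper: escape everything except unreserved characters
    # in both halves of userinfo, then join them with the ":" separator.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"http://{userinfo}@{host}:{port}{path}"

url = build_url("auser", "secr#et", "192.168.0.1", 8080, "/a/b/c.html")
parsed = urlparse(url)

# The password is stored percent-encoded; unquote() recovers the original.
print(url)
print(parsed.hostname, parsed.port)
print(unquote(parsed.password))
```

Encoding the username and the password separately keeps a literal ":" inside 
either of them from being mistaken for the user/password separator.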

Taking the first example:

>>> from urllib.parse import urlparse, unquote
>>> u = 'http://auser:secr%23et@192.168.0.1:8080/a/b/c.html'
>>> urlparse(u)
ParseResult(scheme='http', netloc='auser:secr%23et@192.168.0.1:8080', path='/a/b/c.html', params='', query='', fragment='')
>>> unquote(urlparse(u).password)
'secr#et'
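
For contrast, here is a small sketch of what the same URL looks like to 
urlparse when the '#' is left unencoded. The raw URL below is an assumption, 
reconstructed by decoding the example above. Because '#' is a gen-delim, the 
authority ends at that character and the rest is treated as the fragment.

```python
from urllib.parse import urlparse

# Assumed unencoded form of the example URL: the '#' is left as-is.
raw = "http://auser:secr#et@192.168.0.1:8080/a/b/c.html"
parsed = urlparse(raw)

# The authority stops at '#', so no '@' is left in netloc to mark userinfo.
print(parsed.netloc)
print(parsed.fragment)
print(parsed.password)
```

With no '@' left in netloc, the parser sees no userinfo at all, so the 
password attribute is None.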

----------
nosy: +david.six
status: pending -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue18140>
_______________________________________