New submission from Nick Welch <mackst...@gmail.com>:

While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.
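
To illustrate the requested behavior, here is a minimal sketch of splitting off the query and fragment without knowing anything about the scheme. It is not how urlparse is implemented, and `split_query_fragment` is a hypothetical name chosen for this example. It assumes only the generic rule that the fragment begins at the first '#' and the query at the first '?' before it, and it leaves the scheme-specific part unparsed.

```python
def split_query_fragment(url):
    """Split off the generic '?query' and '#fragment' parts of a URL.

    Hypothetical helper for illustration, not part of urlparse.
    Everything before the first '?' or '#' is returned unparsed,
    since its structure is scheme-specific.
    """
    # The fragment starts at the first '#', so split it off first.
    rest, _, fragment = url.partition('#')
    # The query starts at the first '?' in what remains.
    rest, _, query = rest.partition('?')
    return rest, query, fragment


print(split_query_fragment('myscheme://netloc/path?a=b#frag'))
# ('myscheme://netloc/path', 'a=b', 'frag')
```

Splitting on '#' before '?' matters: a '?' that appears after the '#' belongs to the fragment, not the query.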

According to Wikipedia:

------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------

http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax

Here is a demonstration of what urlparse currently does:

>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')

----------
components: Library (Lib)
messages: 111511
nosy: Nick.Welch
priority: normal
severity: normal
status: open
title: urlparse should parse query and fragment for arbitrary schemes
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9374>
_______________________________________