New submission from Nick Welch <mackst...@gmail.com>:

While the netloc/path parts of URLs are scheme-specific, and urlparse can be 
forgiven for refusing to parse them for unknown schemes, the query and fragment 
parts are standardized, and should be parsed for unrecognized schemes.

According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used 
in all URI schemes. Every URI is defined as consisting of four parts, as 
follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
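
To make the generic syntax concrete, here is a minimal sketch that splits any URI with the regular expression given in RFC 3986, Appendix B. The helper name split_generic is hypothetical and is not part of urlparse. It only illustrates that the query and fragment can be located without knowing the scheme.

```python
import re

# Generic URI pattern from RFC 3986, Appendix B.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def split_generic(uri):
    """Hypothetical helper: split a URI into its five generic components.

    Returns (scheme, authority, path, query, fragment); components that
    are absent come back as empty strings.
    """
    # Every group in the pattern is optional, so the match always succeeds.
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    return (scheme or '', authority or '', path, query or '', fragment or '')

print(split_generic('myscheme://netloc/path?a=b#frag'))
# ('myscheme', 'netloc', '/path', 'a=b', 'frag')
```

Note that this sketch also splits off the authority, which goes further than what this report asks for. The request here is only about the query and fragment.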


Here is a demonstration of what urlparse currently does:

>>> import urlparse
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
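
Until urlparse handles this itself, a caller could post-process the result. The following is only a sketch of such a workaround, not a proposed patch: the function name split_query_fragment is hypothetical, and it assumes the input is the 5-tuple (scheme, netloc, path, query, fragment) that urlsplit returns. It leaves the netloc/path portion alone, since that part is scheme-specific.

```python
def split_query_fragment(parts):
    """Hypothetical workaround: peel the query and fragment off the path.

    `parts` is a 5-tuple (scheme, netloc, path, query, fragment), as
    returned by urlsplit. Results that were already parsed pass through
    unchanged.
    """
    scheme, netloc, path, query, fragment = parts
    # The fragment starts at the first '#', so split it off first.
    if not fragment and '#' in path:
        path, fragment = path.split('#', 1)
    # The query starts at the first '?' in what remains.
    if not query and '?' in path:
        path, query = path.split('?', 1)
    return (scheme, netloc, path, query, fragment)

# The unparsed result from the demonstration, written out as a plain tuple.
unparsed = ('myscheme', '', '//netloc/path?a=b#frag', '', '')
print(split_query_fragment(unparsed))
# ('myscheme', '', '//netloc/path', 'a=b', 'frag')
```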

----------
components: Library (Lib)
messages: 111511
nosy: Nick.Welch
priority: normal
severity: normal
status: open
title: urlparse should parse query and fragment for arbitrary schemes
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9374>
_______________________________________