[issue33480] Improvement suggestions for urllib.parse.urlparser

2018-05-13 Thread R. David Murray

R. David Murray  added the comment:

These are actually reasonable requests, and in fact have been brought up before 
and implemented:

>>> from urllib.parse import urlparse
>>> x = urlparse('http://me:myp...@example.com:800/foo;key1=value1?key2=value2#key3=value3#key4=value4')
>>> x
ParseResult(scheme='http', netloc='me:myp...@example.com:800', path='/foo', 
params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')
>>> x.hostname
'example.com'
>>> x.port
800
>>> x.username
'me'
>>> x.password
'mypass'
>>> x._asdict()
OrderedDict([('scheme', 'http'), ('netloc', 'me:myp...@example.com:800'), 
('path', '/foo'), ('params', 'key1=value1'), ('query', 'key2=value2'), 
('fragment', 'key3=value3#key4=value4')])


Now, what _asdict() doesn't get you is the "extra" fields (hostname, port, 
username, password) that are not part of the base tuple.  The base tuple has 
the members it does for backward compatibility.  So, the thing to discuss on 
python-ideas would be an API for namedtuple that gets you the extra fields.
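
As a rough sketch of what such an API might return, the helper below merges 
the six tuple fields with the extra attributes.  The name asdict_with_extras, 
the EXTRA_FIELDS list, and the URL are assumptions made up for illustration; 
nothing like this exists in urllib.parse or namedtuple today.

```python
from urllib.parse import urlparse

# Hypothetical: the derived attributes that are not part of the base tuple.
EXTRA_FIELDS = ("username", "password", "hostname", "port")


def asdict_with_extras(result):
    """Return the six tuple fields plus the derived attributes as one dict."""
    merged = dict(result._asdict())
    for name in EXTRA_FIELDS:
        merged[name] = getattr(result, name)
    return merged


# Example URL invented for this sketch.
info = asdict_with_extras(
    urlparse("http://me:secret@example.com:800/foo?key2=value2")
)
print(info["hostname"], info["port"], info["username"])
```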

Switching from the empty string to None is not something that can happen, for 
backward compatibility reasons, even if there were agreement that it was better.

I'm not entirely sure why dict(x) is not supported (but I suspect it is because 
x is "a tuple", again for backward compatibility reasons), so you might search 
the archives to find out why for sure, if you are curious.
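
The behaviour itself is easy to check: dict() iterates over the tuple and 
expects each element to be a key/value pair, which the string fields are not.  
A minimal sketch with a made-up URL, showing the failure next to the 
_asdict() route that does work:

```python
from urllib.parse import urlparse

x = urlparse("http://example.com/foo?key2=value2")

# dict() sees a plain sequence of strings, not (key, value) pairs.
try:
    dict(x)
    dict_worked = True
except (TypeError, ValueError):
    dict_worked = False

# The namedtuple API already provides the field-name mapping.
as_mapping = x._asdict()
print(dict_worked, as_mapping["path"])
```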

--
nosy: +r.david.murray
resolution:  -> out of date
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33480] Improvement suggestions for urllib.parse.urlparser

2018-05-13 Thread Ivan Pozdeev

Ivan Pozdeev  added the comment:

Such drastic changes of uncertain usefulness are best discussed at python-ideas 
first.

What you're really asking for seems to be parsing all "levels" at the same 
time.
Try to think of a use case where that would help with anything practical, and 
bring that to the list.
I fail to see any such use case, because you never need query parameters and 
things like username/port at the same time.


All else that you suggest is either already being done (username/port parsing, 
read the docs) or likewise has no use cases I can think of where it would make 
things more convenient than they already are (dict emulation, None).
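
For reference, handling the levels one at a time with the existing API looks 
roughly like this: urlparse() splits the URL, and parse_qs() is applied to the 
query string only where it is needed.  The URL is invented for the example.

```python
from urllib.parse import parse_qs, urlparse

# First level: split the URL into its components.
parts = urlparse("http://example.com:800/foo?key2=value2&key2=other")

# Second level: parse the query string separately, when it is needed.
query = parse_qs(parts.query)

print(parts.hostname, parts.port)
print(query)
```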

--
nosy: +Ivan.Pozdeev




[issue33480] Improvement suggestions for urllib.parse.urlparser

2018-05-13 Thread brent s.

New submission from brent s. :

Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a couple things:

0.) that ParseResult objects support dict emulation. e.g. one can run:

dict(parseresult_obj)

and get (using the example string above, with the classification corrected 
for RFC3986 compliance and common usage):

{'fragment': [('key4', 'value4')],
 'netloc': 'example.com',
 'params': [('key2', 'value2')],
 'path': '/foo',
 'query': [('key3', 'value3')],
 'scheme': 'http'}

Obviously, fragment, params, and query could instead be serialized into a 
nested dict. I'm not sure which is more preferred in the pythonic sense.

1.) Better RFC3986 compliance.
Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL 
can be further split into separate components. For instance, while considered 
deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At 
the very least, the port should be parsed out to a separate component from the 
netloc (or userinfo parsed out separate from netloc) - this will assist in 
parsing host:port combinations in netlocs that contain both userinfo and a 
specified port (and allow the port to be given as an int type, thus more easily 
used in e.g. the socket lib).

2.) If a component is not present, I suggest it be a None object instead of an 
empty string.
e.g.:

urlparse('http://example.com/foo')

Would return:

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params=None, query=None, fragment=None)

instead of

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='', query='', fragment='')
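
In the meantime, the same effect can be approximated on the caller's side.  
A minimal sketch, assuming a hypothetical helper named none_if_empty (not an 
existing urllib.parse function) that returns a plain dict rather than a 
ParseResult:

```python
from urllib.parse import urlparse


def none_if_empty(result):
    """Map each tuple field to None when it is the empty string."""
    return {name: (value or None) for name, value in result._asdict().items()}


fields = none_if_empty(urlparse("http://example.com/foo"))
print(fields)
```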

--
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement
