[issue22852] urllib.parse wrongly strips empty #fragment

Stian Soiland-Reyes Thu, 13 Nov 2014 01:47:18 -0800

Stian Soiland-Reyes added the comment:

I tried to make a patch for this, but I found it quite hard, as 
urllib/parse.py is fairly low-level; e.g. it is constantly encoding and decoding 
between bytes and strings within each URI component. Basically, the code assumes 
tuples of strings, with support for both bytes and strings baked in later.
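
To illustrate the str/bytes duality described above, here is a minimal sketch using only the public urllib.parse API. The example URL is made up for illustration.

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

# The same function accepts str or bytes and returns a matching result type.
text_parts = urlsplit("http://example.com/path#")
byte_parts = urlsplit(b"http://example.com/path#")

print(type(text_parts).__name__, repr(text_parts.fragment))
print(type(byte_parts).__name__, repr(byte_parts.fragment))
```

In both cases the empty fragment comes back as an empty value of the input's own type ('' or b''), which is why any fix has to be handled for both flavours.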


As you can see in 

https://github.com/stain/cpython/compare/issue-2285-urllib-empty-fragment?expand=1

the patch in parse.py is small, but its effect on test_urlparse.py is 
a bit bigger, as lots of tests check that the result of urlsplit has '' 
instead of None. It is uncertain how much real-life client code also checks for 
'' directly. ("if not p.fragment" would of course still work, but "if 
p.fragment == ''" would not work anymore.)
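
The compatibility concern can be sketched as follows. This is only an illustration: the "patched" result is simulated with _replace(fragment=None) as an assumption about what the patch would return for a URI without a fragment, and the two helper functions are hypothetical stand-ins for client code.

```python
from urllib.parse import urlsplit

with_empty = urlsplit("http://example.com/path#")
without = urlsplit("http://example.com/path")

# Today both inputs report fragment == '', so they cannot be told apart.
print(repr(with_empty.fragment), repr(without.fragment))

# Simulated result of the patch: a missing fragment becomes None (assumption).
patched_without = without._replace(fragment=None)


def missing_by_truthiness(parts):
    """Client check in the style of `if not p.fragment`."""
    return not parts.fragment


def missing_by_equality(parts):
    """Client check in the style of `if p.fragment == ''`."""
    return parts.fragment == ""


print(missing_by_truthiness(patched_without))  # still detects "no fragment"
print(missing_by_equality(patched_without))    # no longer matches
```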

I therefore suggest an alternative to my patch above: add some boolean 
fields like has_fragment, so that the existing component fields can keep their 
backwards-compatible '' and b'' values even when a component is actually 
missing, while still allowing geturl() to reconstitute the URI according to the RFC.
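
A rough sketch of the has_fragment idea, written as a wrapper outside the standard library rather than as a change to parse.py. The names SplitWithFlags and urlsplit_with_flags are hypothetical, the sketch handles str input and the fragment component only, and it is not the proposed patch.

```python
from typing import NamedTuple
from urllib.parse import SplitResult, urlsplit


class SplitWithFlags(NamedTuple):
    """Hypothetical wrapper: the usual SplitResult plus a presence flag."""

    parts: SplitResult
    has_fragment: bool

    def geturl(self) -> str:
        url = self.parts.geturl()
        # Re-attach the '#' only when the fragment was present but empty
        # and the underlying geturl() has not already kept it.
        if self.has_fragment and not self.parts.fragment and not url.endswith("#"):
            url += "#"
        return url


def urlsplit_with_flags(url: str) -> SplitWithFlags:
    # A fragment delimiter is present if the URL contains '#' at all,
    # since the fragment starts at the first '#'.
    return SplitWithFlags(urlsplit(url), "#" in url)


empty = urlsplit_with_flags("http://example.com/path#")
absent = urlsplit_with_flags("http://example.com/path")
print(empty.parts.fragment == absent.parts.fragment)  # components stay ''
print(empty.geturl(), absent.geturl())
```

Existing code that compares p.fragment with '' keeps working, while code that cares about the distinction can consult has_fragment.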

----------
hgrepos: +279

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22852>
_______________________________________
