[python-committers] Re: A urlparse regression in minor version

Ned Deily Sun, 16 Feb 2020 10:20:51 -0800

On Feb 16, 2020, at 07:21, Antoine Pitrou <anto...@python.org> wrote:
> FWIW, I agree with Senthil here.  A slight behaviour change in 3.9 is
> fine, especially in an area where the "right" semantics are not
> immediately obvious.  What we want to avoid is breaking behaviour
> changes in bugfix releases.

I agree totally that we don't want to break behavior in bugfix releases and I 
have no problem with making breaking changes in feature releases (3.9.0) as 
warranted.

My point was that, after looking at this a bit, it seems to me that this 
change does not address some of the underlying problems with the urlparse API 
and that it makes things *much* worse for the many users who understandably 
expect urlparse to sanely handle schemeless URL strings, the most common form 
of URL seen today.

Note that we strongly imply that we handle them sanely by offering the 
"scheme=" parameter to urlparse.  Another example: prior to 3.7.6 and 3.8.1:

>>> urlparse("www.google.com:8080", scheme="http")
ParseResult(scheme='http', netloc='', path='www.google.com:8080', params='', query='', fragment='')

That isn't what users would expect; they would expect it to behave the way it 
does with an explicit scheme in the string (note that netloc and path swap).

>>> urlparse("https://www.google.com:8080", scheme="http")
ParseResult(scheme='https', netloc='www.google.com:8080', path='', params='', query='', fragment='')

But at least there is a relatively simple workaround that users have discovered 
as witnessed by the requests code snippet I cited earlier: use the path field 
if netloc is empty.
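A minimal sketch of that workaround, assuming the pre-3.7.6/3.8.1 parsing shown above. This is not the requests code; `host_and_location` is a hypothetical helper name. On releases with the newer behavior, a schemeless "host:port" string no longer lands in path, so this fallback does not rescue that case.

```python
from urllib.parse import urlparse


def host_and_location(url, default_scheme="http"):
    """Return (scheme, location), falling back to path when netloc is empty.

    Hypothetical helper: without a "//" in the input, urlparse leaves
    netloc empty and (on older releases) puts the whole location in path.
    """
    parsed = urlparse(url, scheme=default_scheme)
    # Prefer netloc when the parser found one; otherwise use path.
    location = parsed.netloc or parsed.path
    return parsed.scheme, location


print(host_and_location("https://www.google.com:8080"))
print(host_and_location("www.google.com"))
```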

Now with the change in 3.8.1 and 3.7.6, the behavior is very different and 
pretty useless even with an explicit scheme="http" parameter:

>>> urlparse("www.google.com:8080", scheme="http")
ParseResult(scheme='www.google.com', netloc='', path='8080', params='', query='', fragment='')

i.e. the host name is taken as the scheme and the port as the path.

While that may be what strict adherence to the RFC dictates, most users aren't 
going to expect or desire results like that.  So while the change may fix some 
cases, it makes matters worse overall.  What kind of workaround do you use for 
that result?

In another open issue concerning a different urlparse problem, Victor noted 
four months ago that "there are 124 open issues with "urllib" in their title 
and 12 open issues with "urlparse" in their title" and hit a bit of a dead end 
with a proposed fix.

https://bugs.python.org/issue36338#msg355322

Rather than keeping this change in 3.9 and introducing yet another, even more 
unexpected behavior, I think we should first try to address what appears to me 
to be the (a?) root cause issue: urlparse's API is not suited for parsing both 
strictly RFC-compliant URLs (which are clearly not well-understood) *and* 
today's schemeless URLs as have evolved over the years to become the most 
commonly encountered form of URL.  Users want and need both.  The merged change 
makes the previous situation worse, IMHO.
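To illustrate what serving both needs might look like, here is a sketch of a wrapper with a strict mode (plain urlparse) and a lenient mode for schemeless input. It is only an illustration, not a proposal from this thread: `parse_url` and its `lenient` flag are hypothetical, and the heuristic of prefixing "//" is an assumption that misreads some inputs, e.g. authority-less schemes such as "mailto:user@example.com".

```python
from urllib.parse import ParseResult, urlparse


def parse_url(url: str, default_scheme: str = "http", lenient: bool = True) -> ParseResult:
    """Hypothetical wrapper: strict RFC-style parsing or lenient schemeless parsing.

    lenient=False: plain urlparse.
    lenient=True: if there is no "://" and no leading "/", assume the input
    starts with host[:port] and prefix "//" so it is parsed as the netloc.
    """
    if lenient and "://" not in url and not url.startswith("/"):
        # Heuristic (assumption): "www.google.com:8080" -> "//www.google.com:8080"
        url = "//" + url
    return urlparse(url, scheme=default_scheme)


r = parse_url("www.google.com:8080/search?q=x")
print(r.scheme, r.netloc, r.path, r.query)
```

Keeping the strict path untouched means RFC-compliant callers see no change, while the lenient behavior is opt-out rather than silently altering existing results.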

On 16/02/2020 at 13:13, Senthil Kumaran wrote:
>> 
>> On Sun, Feb 16, 2020 at 2:20 AM Ned Deily <n...@python.org> wrote:
>> 
>> 
>> 
>>    For 3.9.0, I recommend we reconsider this change (temporarily
>>    reverting it) and consider whether an API change to accommodate the
>>    various use cases would be better
>> 
>> 
>> For 3.9, I am ready to defend the patch even at the cost of breaking
>> the parsing of undefined behavior.  We should keep it. The
>> patch simplifies a lot of corner cases and fixes the reported bugs. We
>> don't guarantee backward compatibility between major versions, so I
>> assume users will be careful when relying upon this undefined behavior
>> and will take corrective action on their side before upgrading to 3.9.
>> 
>> We want patch releases to be backward compatible. That was the
>> user-complaint.
>> 
>> Thanks,
>> Senthil
>> 
>> 
>> 
>> _______________________________________________
>> python-committers mailing list -- python-committers@python.org
>> To unsubscribe send an email to python-committers-le...@python.org
>> https://mail.python.org/mailman3/lists/python-committers.python.org/
>> Message archived at 
>> https://mail.python.org/archives/list/python-committers@python.org/message/SQE6TKOYZKEFGWMUHU5RCHRVWJ27TIQV/
>> Code of Conduct: https://www.python.org/psf/codeofconduct/

--
  Ned Deily
  n...@python.org
_______________________________________________
python-committers mailing list -- python-committers@python.org
To unsubscribe send an email to python-committers-le...@python.org
https://mail.python.org/mailman3/lists/python-committers.python.org/
Message archived at 
https://mail.python.org/archives/list/python-committers@python.org/message/3F6KY7X4ZKHDKRZM3UFZNWUDW5BWW65X/
Code of Conduct: https://www.python.org/psf/codeofconduct/