On Feb 15, 2020, at 09:00, Senthil Kumaran <sent...@uthcode.com> wrote:
> As we have to make a decision here, my vote is to revert the patch in 3.8.2 and 
> 3.7.7
> I have gone back-and-forth with this thinking, and it seems revert might 
> address some definite complaints we have got.
> The problem is contained to single version, and users can upgrade to the next 
> one.

> On Fri, Feb 14, 2020 at 8:14 AM Łukasz Langa <luk...@langa.pl> wrote:
>> Ned, what are you doing with this for 3.7.7? Reverting?

Ugh!

As others have noted, urlparse is a big can of worms.  I am certainly not a 
subject expert but, from some investigation and thinking about it, it seems to 
me that we kinda brought this on ourselves by allowing the scheme part (e.g. 
"https:" or "ftp:" or "any-old-scheme:" etc) of the urlstring parameter to be 
optional:

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

thereby introducing the ambiguity of whether a string like "localhost:80" 
denotes a relative url with a user-defined scheme of "localhost" and a path of 
"80" (as it now does with the changes for bpo-27657 introduced in 3.8.1 and 
3.7.6):

>>> urlparse("localhost:80")
ParseResult(scheme='localhost', netloc='', path='80', params='', query='', fragment='')

or denotes a relative url with no scheme and a path of "localhost:80" (as 
happened in previous releases):

>>> urlparse("localhost:80")
ParseResult(scheme='', netloc='', path='localhost:80', params='', query='', fragment='')

With an explicit scheme, in either case you get what you would expect - an 
absolute url:

>>> urlparse("http://localhost:80")
ParseResult(scheme='http', netloc='localhost:80', path='', params='', query='', fragment='')
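A quick way to see which of the two readings a particular interpreter applies is to parse both forms side by side. This is only a sketch; the first printed result depends on the Python version, so no particular output is claimed for it:

```python
from urllib.parse import urlparse

# Schemeless: scheme 'localhost' + path '80' in 3.8.1/3.7.6,
# but a bare path 'localhost:80' in earlier releases.
ambiguous = urlparse("localhost:80")

# Explicit scheme: "localhost:80" is unambiguously the netloc.
explicit = urlparse("http://localhost:80")

print(ambiguous)
print(explicit.netloc, explicit.hostname, explicit.port)  # localhost:80 localhost 80
```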

AFAICT the intent of the original RFCs was to require an explicit scheme in a 
urlstring, thus avoiding any ambiguity.  But the now universal practice of web 
browsers supplying a default http: or https: scheme for (partial) urls typed 
into a location bar has understandably changed user expectations to often be 
that schemes are optional when the scheme is clear in context.

So it seems to me that there is no one obviously correct behavior here.  
Judging from the comments and the reports of broken packages, many users are 
clearly used to using urlparse with schemeless urlstrings even if they aren't 
truly conformant URLs, and even with the, at first glance, unintuitive way they 
were parsed by urlparse; for example, there is this snippet in the third-party 
requests package:

    # urlparse is a finicky beast, and sometimes decides that there isn't a
    # netloc present. Assume that it's being over-cautious, and switch netloc
    # and path if urlparse decided there was no netloc.
    if not netloc:
        netloc, path = path, netloc
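To make that workaround concrete, here is a minimal standalone version of it; `netloc_and_path` is a hypothetical name used only for illustration, not a function in requests:

```python
from typing import Tuple
from urllib.parse import urlparse


def netloc_and_path(urlstring: str) -> Tuple[str, str]:
    """Hypothetical helper mirroring the requests snippet quoted above."""
    parsed = urlparse(urlstring)
    netloc, path = parsed.netloc, parsed.path
    # No netloc found: assume the "path" was really the host.
    if not netloc:
        netloc, path = path, netloc
    return netloc, path


print(netloc_and_path("example.com"))           # ('example.com', '')
print(netloc_and_path("http://example.com/a"))  # ('example.com', '/a')
```

Note that the swap only recovers the host for "localhost:80" if urlparse returns the whole string as the path, as earlier releases did; with the 3.8.1/3.7.6 parse (scheme='localhost', path='80'), the same code would report a netloc of '80'.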

OTOH, there are also undoubtedly users who want a urlparser that more strictly 
parses schemeless URLs, which is now the behavior as of 3.8.1 and 3.7.6, again, 
even if the new behavior is also unintuitive.
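The two behaviors differ only in how the text before the first colon is classified. The sketch below approximates both: with `legacy_port_heuristic=True` it follows, as I understand the older code, the pre-3.7.6 rule that all digits after the colon mean "this is a port, not a scheme"; with `False` it takes any valid scheme prefix as the scheme, as 3.8.1 and 3.7.6 do. `split_scheme` and its parameter are hypothetical names, not part of urllib.parse:

```python
import string
from typing import Tuple

# Characters allowed in a scheme after its leading letter (RFC 3986).
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme(urlstring: str, legacy_port_heuristic: bool) -> Tuple[str, str]:
    """Hypothetical helper: return (scheme, remainder) for urlstring."""
    i = urlstring.find(":")
    # No colon, or the first character cannot start a scheme.
    if i <= 0 or not (urlstring[0].isascii() and urlstring[0].isalpha()):
        return "", urlstring
    candidate, rest = urlstring[:i], urlstring[i + 1:]
    if any(c not in SCHEME_CHARS for c in candidate):
        return "", urlstring
    # Legacy rule: "name:digits" looks like host:port, so report no scheme.
    if legacy_port_heuristic and rest.isascii() and rest.isdigit():
        return "", urlstring
    return candidate.lower(), rest


print(split_scheme("localhost:80", legacy_port_heuristic=True))   # ('', 'localhost:80')
print(split_scheme("localhost:80", legacy_port_heuristic=False))  # ('localhost', '80')
```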

I don't see how we can satisfy both use cases without changing the API somehow. 
And there may be other use cases.

The good news is that, AFAICT from a quick survey, the change didn't affect 
urllib.request.urlopen or the third-party urllib3 or requests packages.  But from the "me-toos" on 
the bpo issue and the PR, it's clear that we broke stuff downstream and it 
seems that most of those users are waiting for a resolution from us and likely 
would prefer to stick to the previous behavior.

So my take is that we should revert the 3.7 changes (bpo-27657 / PR 16837 / 
82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f ).  Senthil, please go ahead and do so 
for the 3.7 branch.  Thanks!

While it's not my call, I think we should also revert for 3.8.2.

For 3.9.0, I recommend we reconsider this change (temporarily reverting it) and 
consider whether an API change to accommodate the various use cases would be 
better; perhaps something like adding a new parameter to urlparse to indicate 
whether urlstrings should be parsed like web browser "urls" (and defining 
exactly what that means) and also review the many remaining open urlparse bpo 
issues to look for commonalities.  (Perhaps that could be a post-3.9 GSoC 
project?)
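As a rough illustration of the kind of opt-in parameter suggested above, here is a wrapper with a hypothetical keyword-only `browser_like` flag. Neither the wrapper nor the flag exists in urllib.parse, and the "browser-like" rule used here (no "://" and no leading "//" means the string starts with a host, and the scheme defaults to http) is only a placeholder for the definition that would still need to be worked out:

```python
from urllib.parse import ParseResult, urlparse


def urlparse_ext(urlstring: str, scheme: str = "", allow_fragments: bool = True,
                 *, browser_like: bool = False) -> ParseResult:
    """Hypothetical urlparse variant with an opt-in browser-like mode."""
    if browser_like and "://" not in urlstring and not urlstring.startswith("//"):
        # Treat a schemeless string as starting with host[:port].
        urlstring = "//" + urlstring
        # Supply a default scheme the way a browser location bar would.
        scheme = scheme or "http"
    # Default mode: exactly the existing urlparse behavior.
    return urlparse(urlstring, scheme, allow_fragments)


r = urlparse_ext("localhost:80/path", browser_like=True)
print(r.scheme, r.netloc, r.path)  # http localhost:80 /path
```

The rule is deliberately naive: a string like "mailto:user@example.com" would be mangled in browser_like mode, which is exactly the sort of case a precise definition of the mode would have to settle.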

Thoughts?

In any case, ugh!

--
  Ned Deily
  n...@python.org
_______________________________________________
python-committers mailing list -- python-committers@python.org
To unsubscribe send an email to python-committers-le...@python.org
https://mail.python.org/mailman3/lists/python-committers.python.org/
Message archived at 
https://mail.python.org/archives/list/python-committers@python.org/message/LSQG3M7G4WB7LNOSBU52AMAKB7LBD7WT/
Code of Conduct: https://www.python.org/psf/codeofconduct/
