[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Gregory P. Smith
On Sun, Feb 6, 2022 at 9:13 AM Paul Moore  wrote:

> On Sun, 6 Feb 2022 at 16:51, Christian Heimes 
> wrote:
>
> > The urllib package -- and to some degree also the http package -- are a
> > constant source of security bugs. The code is old and the parsers for
> > HTTP and URLs don't handle edge cases well. Python core lacks a true
> > maintainer of the code. To be honest, we have to admit defeat and be up
> > front that urllib is not up to the task for this decade. It was designed
> > and written during a more friendly, less scary time on the internet.
> >
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses the platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
> >
> > The second best option is to reduce the feature set of urllib to core
> > HTTP (no FTP, proxy, HTTP auth) and a partial rewrite with stricter,
> > more standards-conformant parsers for URLs, query strings, and RFC 2822
> > instead of RFC 822 for headers.
>
> I'd likely be fine with either of these two options. I'm not worried
> about supporting "advanced" uses. But having no way of getting a file
> from the internet without relying on 3rd party packages seems like a
> huge gap in functionality for a modern language. And having to use a
> 3rd party library to parse URLs will simply push more people to use
> home-grown regexes rather than something safe and correct. Remember
> that a lot of Python users are not professional software developers,
> but scientists, data analysts, and occasional users, for whom the
> existence of something in the stdlib is the *only* reason they have
> any idea that URLs need specialised parsing in the first place.
>
> And while we all like to say 3rd party modules are great, the reality
> is that they provide a genuine problem for many of these
> non-specialist users - and I say that as a packaging specialist and
> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
> the way that core Python is, much as we're trying to improve that
> situation.
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.
>

This would need to be its own PEP.  urllib et al. are used by virtually
everybody.  They're heavily used batteries.

I'm -1 on deprecating it for that reason alone.

Christian proposes that a rewrite with a simpler scope might be nice,
but I think the disruption to the world and the loss of trust in Python
would be similar either way.

-gps


>
> Paul
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QMSFZBQJFWKFFE3LFQLQE2AT6WKMLPGL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Ethan Furman

On 2/6/22 6:08 AM, Victor Stinner wrote:

> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users that they should consider better
> alternatives like urllib3 or httpx: well-known, better maintained,
> more secure modules, with HTTP/2 support (httpx), etc.

Besides the needs of pip, Roundup, etc., I think we should keep whatever parts of urllib, cgi, cgitb, http, etc., are
necessary for basic serving/consuming of web pages, for the same reason we ended up keeping the wave module -- it's fun
and engaging for a younger audience.  Having one computer get information from another is pretty cool.

If we need to do some trimming and rearranging of the above modules, that's fine, but I think losing all of that
functionality would be a mistake.
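Ethan's point about one computer fetching information from another can be shown with nothing but the stdlib. The following is a minimal sketch, assuming Python 3.7+ (for `http.server.ThreadingHTTPServer`). The handler class and function names are hypothetical, chosen only for illustration. It serves a tiny page on the loopback interface and then reads it back with `urllib.request`:

```python
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short plain-text page."""

    def do_GET(self):
        body = b"Hello from the other computer!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the demo quiet: skip the default per-request log line on stderr.
        pass


def fetch_from_local_server() -> str:
    # Port 0 lets the OS pick a free port; server_address reports the real one.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        # An empty ProxyHandler keeps any *_proxy environment variables from
        # routing this loopback request through a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://{host}:{port}/", timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_from_local_server(), end="")
```

In a classroom the server and client would usually run on different machines; the loopback address just keeps the sketch self-contained.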


--
~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TGENXEKPFCIZUQD63ROCIK2WGAN3F7XL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread sethmichaellarson
Chiming in to say that whichever way this goes, urllib3 would be okay. We can 
always vendor the small amount of http.client logic we actually depend on for 
HTTP connections. I do agree that the future of HTTP clearly lies outside the 
standard library, and our team is already thinking about ways to integrate 
non-http.client HTTP implementations (like HTTP/2).

My feeling is that it will be difficult to remove urllib.parse; however, 
urllib.request is much less depended on and more likely to be deprecated and 
removed.

Also, to clarify: httplib2 doesn't support HTTP/2; the HTTP/2 package of 
interest is usually h2: https://pypi.org/project/h2. "http3" also doesn't 
implement HTTP/3 (an unfortunate name); it was one of the potential names for 
the HTTPX project.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AW3JP6DHEAKME5FTFNRHV3EJMPJQEDME/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Victor Stinner
On Sun, Feb 6, 2022 at 3:35 PM Paul Moore  wrote:
> urllib.request may not be "best practice", but it's still extremely
> useful for simple situations, and urllib.parse is useful for basic
> handling of URLs. Yes, the more complex aspects of urllib are better
> handled by external packages, but that's not sufficient argument for
> removing the package altogether. There are many situations where
> external dependencies are unsuitable. Also, there's quite a lot of
> usage of urllib in the stdlib itself - how would you propose to
> replace that?
> (...)
> In addition, pip relies pretty heavily on urllib (parse and request),
> and pip has a bootstrapping issue, so using 3rd party libraries is
> non-trivial.

If a project like urllib3 uses it, urllib can be copied there and its
maintenance will continue there. Or maybe the maintenance can be moved
into a new project on PyPI like "legacy_urllib".

The situation is similar to the distutils deprecation: setuptools
decided to include a hidden copy of distutils in its source, and
distutils maintenance moved there. IMO it's a great move.
setuptools is a better place than Python to maintain this code:
the setuptools release cycle is faster and is related to pip. The Python
release cycle is slow and the distutils API was too big. Since the
distutils API is now hidden, setuptools can freely drop code and
change APIs without affecting the public setuptools API.

I'm well aware that moving distutils into setuptools caused trouble.
IMO it is worth it, and we have to go through these issues once for a
lighter maintenance burden in the long term.


> In any case, why is this being proposed as a simple posting on
> python-dev? There's already PEP 594 for removals from the stdlib.

urllib is bigger than the modules proposed for deprecation in PEP 594.
Also, I expect that deprecating urllib is more controversial.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UR6BT5S2S4WGEI62MRWHCRAPZNTQXTVT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Jelle Zijlstra
On Sun, 6 Feb 2022 at 9:12, Paul Moore wrote:

> On Sun, 6 Feb 2022 at 16:51, Christian Heimes 
> wrote:
>
> > The urllib package -- and to some degree also the http package -- are a
> > constant source of security bugs. The code is old and the parsers for
> > HTTP and URLs don't handle edge cases well. Python core lacks a true
> > maintainer of the code. To be honest, we have to admit defeat and be up
> > front that urllib is not up to the task for this decade. It was designed
> > and written during a more friendly, less scary time on the internet.
> >
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses the platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
> >
> > The second best option is to reduce the feature set of urllib to core
> > HTTP (no FTP, proxy, HTTP auth) and a partial rewrite with stricter,
> > more standards-conformant parsers for URLs, query strings, and RFC 2822
> > instead of RFC 822 for headers.
>
> I'd likely be fine with either of these two options. I'm not worried
> about supporting "advanced" uses. But having no way of getting a file
> from the internet without relying on 3rd party packages seems like a
> huge gap in functionality for a modern language. And having to use a
> 3rd party library to parse URLs will simply push more people to use
> home-grown regexes rather than something safe and correct. Remember
> that a lot of Python users are not professional software developers,
> but scientists, data analysts, and occasional users, for whom the
> existence of something in the stdlib is the *only* reason they have
> any idea that URLs need specialised parsing in the first place.
>
> And while we all like to say 3rd party modules are great, the reality
> is that they provide a genuine problem for many of these
> non-specialist users - and I say that as a packaging specialist and
> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
> the way that core Python is, much as we're trying to improve that
> situation.
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.
>
PEP 594 is meant to be a set of uncontroversial removals of mostly unused
modules. Removing urllib is obviously not going to be uncontroversial, so
it should be discussed in a separate PEP.


>
> Paul
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HQ5J7BTB5WW77CQIQXX5FQKBOOIADBYR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
On Sun, 6 Feb 2022 at 16:51, Christian Heimes  wrote:

> The urllib package -- and to some degree also the http package -- are a
> constant source of security bugs. The code is old and the parsers for
> HTTP and URLs don't handle edge cases well. Python core lacks a true
> maintainer of the code. To be honest, we have to admit defeat and be up
> front that urllib is not up to the task for this decade. It was designed
> and written during a more friendly, less scary time on the internet.
>
> If I had the power and time, then I would replace urllib with a simpler,
> reduced HTTP client that uses the platform's HTTP library under the hood
> (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> aiohttp are much better suited than urllib.
>
> The second best option is to reduce the feature set of urllib to core
> HTTP (no FTP, proxy, HTTP auth) and a partial rewrite with stricter,
> more standards-conformant parsers for URLs, query strings, and RFC 2822
> instead of RFC 822 for headers.

I'd likely be fine with either of these two options. I'm not worried
about supporting "advanced" uses. But having no way of getting a file
from the internet without relying on 3rd party packages seems like a
huge gap in functionality for a modern language. And having to use a
3rd party library to parse URLs will simply push more people to use
home-grown regexes rather than something safe and correct. Remember
that a lot of Python users are not professional software developers,
but scientists, data analysts, and occasional users, for whom the
existence of something in the stdlib is the *only* reason they have
any idea that URLs need specialised parsing in the first place.
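To make the "home-grown regexes" concern concrete, here is a small sketch comparing a naive pattern (hypothetical, but typical of what people write) with `urllib.parse.urlsplit()`, which understands userinfo, ports, and bracketed IPv6 hosts. The URLs and helper names are made up for illustration:

```python
import re
from urllib.parse import urlsplit

# A hypothetical "quick" pattern of the kind people write when no URL parser
# is at hand: grab everything after the scheme up to the first "/" or ":".
NAIVE_HOST = re.compile(r"^https?://([^/:]+)")


def naive_host(url: str):
    match = NAIVE_HOST.match(url)
    return match.group(1) if match else None


def parsed_host_and_port(url: str):
    parts = urlsplit(url)
    # .hostname drops userinfo and IPv6 brackets and lowercases the host;
    # .port converts the port to an int (and raises ValueError if invalid).
    return parts.hostname, parts.port


if __name__ == "__main__":
    for url in (
        "https://user:pw@Example.com:8443/data/file.csv?version=2#top",
        "http://[::1]:8080/",
    ):
        print(url)
        print("  regex:   ", naive_host(url))
        print("  urlsplit:", parsed_host_and_port(url))
```

For the first URL the regex returns `user` (the userinfo, not the host) and for the IPv6 URL it returns `[`, while `urlsplit()` gives `('example.com', 8443)` and `('::1', 8080)`.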

And while we all like to say 3rd party modules are great, the reality
is that they provide a genuine problem for many of these
non-specialist users - and I say that as a packaging specialist and
pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
the way that core Python is, much as we're trying to improve that
situation.

I've said it previously, but I'll reiterate - IMO this *must* have a
PEP, and that PEP must be clear that the intention is to *remove*
urllib, not simply to "deprecate and then think about it". That could
be making it part of PEP 594, or a separate PEP, but one way or
another it needs a PEP.

Paul
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KT6TGUBTLLETHES2OVVGZWSGYC5JCEKC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
On Sun, 6 Feb 2022 at 14:15, Victor Stinner  wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users that they should consider better
> alternatives like urllib3 or httpx: well-known, better maintained,
> more secure modules, with HTTP/2 support (httpx), etc.

Also, I'm -1 on deprecating as a way of saying we *might* remove the
module, but haven't decided yet. That isn't (IMO) what deprecation is
for, and it doesn't give users a clear message, as maybe they'll be
fine continuing to rely on urllib. The net result would likely to be
for people to simply become more inclined to ignore deprecation
warnings.

Conversely, if the idea is to deprecate, and then in a couple of years
say "well, it's been deprecated for a while now, so let's remove it"
then that seems to me to be a rather cynical way of deflecting
arguments, as we can say now "well, it's only deprecation", in spite
of the fact that the real intention is to remove.
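Paul's concern about people learning to ignore deprecation warnings is easier to weigh with the mechanics in view. Since Python 3.7 (PEP 565), the default filters show a `DeprecationWarning` only when it is triggered directly by code in `__main__`; elsewhere it stays hidden unless a test runner or a `-W` option enables it. The sketch below uses a hypothetical `old_fetch()` helper (not a real urllib function) to show both sides: collecting the warning, and escalating it to an error the way `-W error::DeprecationWarning` would:

```python
import warnings


def old_fetch(url: str) -> str:
    """Hypothetical legacy helper that has been deprecated."""
    warnings.warn(
        "old_fetch() is deprecated; use a maintained HTTP client instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return url  # placeholder body for the sketch


def call_and_collect() -> list:
    # Record warnings instead of printing them, and force DeprecationWarning
    # to be reported even though the default filters would hide it here.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        old_fetch("https://example.org/")
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


def call_as_error() -> bool:
    # Same effect as running with -W error::DeprecationWarning.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            old_fetch("https://example.org/")
        except DeprecationWarning:
            return True
    return False
```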

Paul
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ICHMNBE7PMOHCGXLT4REP2HJZAGSOCHJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Christian Heimes

On 06/02/2022 15.08, Victor Stinner wrote:

> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users that they should consider better
> alternatives like urllib3 or httpx: well-known, better maintained,
> more secure modules, with HTTP/2 support (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstractions to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like the urllib API, which is too complicated
> for common tasks.
>
> --
>
[...]
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?
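Of the four parts listed above, `urllib.robotparser` is the easiest to try offline, because `RobotFileParser.parse()` accepts the lines of a robots.txt file directly. A minimal sketch, with a made-up robots.txt and a hypothetical user agent name:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made
    # (read() would instead download the file from parser.url).
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch("examplebot", "https://example.com/private/report"))
    print(rp.can_fetch("examplebot", "https://example.com/public/index.html"))
    print(rp.crawl_delay("examplebot"))
```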


Thanks for bringing this topic forward, Victor!

Disclaimer: I proposed the removal of urllib today in Python core's 
internal chat.


The urllib package -- and to some degree also the http package -- are a 
constant source of security bugs. The code is old and the parsers for 
HTTP and URLs don't handle edge cases well. Python core lacks a true 
maintainer of the code. To be honest, we have to admit defeat and be up 
front that urllib is not up to the task for this decade. It was designed 
and written during a more friendly, less scary time on the internet.
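One concrete example of a parser edge case turning into a security fix is the 2021 query-string separator change (bpo-42967, listed among the fixed vulnerabilities in Victor's original message). A short sketch, assuming a Python version that includes the fix (3.10+, or one of the security releases that backported it), where `parse_qsl()` accepts a `separator` argument. The query string is made up:

```python
from urllib.parse import parse_qsl

# A made-up query string that uses ";" where "&" would normally appear.
QUERY = "item=1;admin=true"

# With the fix, "&" is the only default separator, so the ";" stays inside
# the value instead of silently creating a second "admin" parameter.
default_pairs = parse_qsl(QUERY)

# Code that really relies on ";" must now ask for it explicitly.
semicolon_pairs = parse_qsl(QUERY, separator=";")

if __name__ == "__main__":
    print(default_pairs)
    print(semicolon_pairs)
```

Before the fix, the first call also split on ";", which let a caching proxy and a backend disagree about which parameters a request carried (the web cache poisoning issue behind bpo-42967).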


If I had the power and time, then I would replace urllib with a simpler, 
reduced HTTP client that uses the platform's HTTP library under the hood 
(WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, 
maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or 
aiohttp are much better suited than urllib.


The second best option is to reduce the feature set of urllib to core 
HTTP (no FTP, proxy, HTTP auth) and a partial rewrite with stricter, 
more standards-conformant parsers for URLs, query strings, and RFC 2822 
instead of RFC 822 for headers.
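For a sense of the handler machinery in question (and of Victor's point about Basic Authentication), here is a hedged sketch of an authenticated GET using only `urllib.request`. It assumes `HTTPPasswordMgrWithPriorAuth` (Python 3.5+) so the credentials are sent up front rather than after a 401 challenge. The local echo server, the handler and function names, the user name, and the password are all made up so the sketch can run without a real service:

```python
import http.server
import threading
import urllib.request


class EchoAuthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: replies with the Authorization header it got."""

    def do_GET(self):
        body = (self.headers.get("Authorization") or "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_with_basic_auth(url: str, user: str, password: str) -> str:
    # 1. A password manager that remembers credentials per URL prefix; the
    #    "WithPriorAuth" variant can send them before any 401 challenge.
    passwords = urllib.request.HTTPPasswordMgrWithPriorAuth()
    passwords.add_password(None, url, user, password, is_authenticated=True)
    # 2. A handler that turns those credentials into an Authorization header.
    auth_handler = urllib.request.HTTPBasicAuthHandler(passwords)
    # 3. An opener chaining the handlers (proxies disabled for this local demo).
    opener = urllib.request.build_opener(auth_handler, urllib.request.ProxyHandler({}))
    # 4. Finally, the request itself.
    with opener.open(url, timeout=5) as response:
        return response.read().decode("ascii")


def demo() -> str:
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), EchoAuthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return fetch_with_basic_auth(f"http://{host}:{port}/", "alice", "s3cret")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Each concern (credential storage, header construction, proxying, the request itself) lives in its own object. That is flexible, but it is also the verbosity being discussed; third-party clients typically reduce it to a single `auth=` argument.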


Christian


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WYVETVHMGRS4CI47GTFY6W7B43YLSJH2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
That was just one example; here are other places in the pip code base where
urllib.request is used for more than the pathname functions. They are all in
vendored code or tests, but removing it would still be disruptive:

https://github.com/pypa/pip/blob/main/tests/lib/local_repos.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/webencodings/mklabels.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/requests/compat.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/distlib/compat.py

In particular, the vendored library "requests", which is the replacement you
suggest, is very dependent on the proxy functions such as "getproxies" that are
currently in urllib.request. More than once I've had to go down the rabbit
hole of seeing where those functions get that info for each platform.
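The proxy helpers mentioned here read their configuration from the environment (and, on macOS and Windows, from system settings as a fallback). A minimal sketch using a fake environment so the result is predictable. `getproxies()` is the documented entry point; `getproxies_environment()` and `proxy_bypass_environment()` are lower-level helpers that exist in `urllib.request` and that requests' compat module appears to import, but as far as I know they are not covered in the module's documentation. The proxy host names are made up:

```python
import os
import urllib.request
from unittest import mock

# A made-up environment: one HTTP proxy plus a no_proxy exclusion list.
FAKE_ENV = {
    "http_proxy": "http://proxy.example:3128",
    "no_proxy": "localhost,.internal.example",
}


def inspect_proxy_settings():
    # Temporarily replace os.environ so the machine's real settings don't leak in.
    with mock.patch.dict(os.environ, FAKE_ENV, clear=True):
        # Scans for <scheme>_proxy variables and returns {scheme: url};
        # the no_proxy value shows up under the key "no".
        proxies = urllib.request.getproxies_environment()
        # Checks a host (optionally with a port) against the no_proxy list,
        # including domain-suffix matches such as ".internal.example".
        skip_local = urllib.request.proxy_bypass_environment("localhost:8080")
        skip_sub = urllib.request.proxy_bypass_environment("build.internal.example")
        skip_pypi = urllib.request.proxy_bypass_environment("pypi.org")
    return proxies, skip_local, skip_sub, skip_pypi


if __name__ == "__main__":
    print(inspect_proxy_settings())
```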

Damian (he/him)


On Sun, Feb 6, 2022 at 11:10 AM Victor Stinner  wrote:

> On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw 
> wrote:
> >
> > Pip vendors requests for network calls:
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
> >
> > But it still depends on functions from urllib.parse and urllib.request
> in many places:
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py
>
> Aha, it doesn't use urllib.request to open a HTTP connection, it only
> uses pathname2url() and url2pathname() functions of urllib.request.
> Maybe we can keep these functions. I'm not sure why they don't belong
> to urllib.parse.
>
> If urllib.parse is widely used, maybe we can keep this module.
>
> Victor
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ACA7AU4W6XB35PA6O4IYBPQSQD3HFLFS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Senthil Kumaran
On Sun, Feb 06, 2022 at 03:08:40PM +0100, Victor Stinner wrote:

> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users that they should consider better
> alternatives like urllib3 or httpx: well-known, better maintained,
> more secure modules, with HTTP/2 support (httpx), etc.
> 
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.

I am not certain if we can deprecate/remove the whole 'urllib' module without
any good plan for replacement of its facilities within the stdlib. There is
heavy usage of urllib.parse in multiple projects (including in urllib3),
and parse is semi-maintained.

> Let's come back to urllib:

> * Its API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security 
> issues!

I agree with all of these.
I think that removing the old cruft might lead to us closing a number
of open issues.

>  The 3 open security issues:

Just because something is marked 'security' doesn't make it actionable.
For instance, the last one asks for urllib to maintain client state to be safe
against a scenario, which it never did.

I don't think it is time to deprecate the urllib module. It will be too
disruptive IMO. So, -1.

Right now, I don't have a solution.
My suggestion is that we close old bugs and remove old code (aka maintain it a
bit, and that falls on me too).
Then we can probably chart out a deprecation / replacement path in a
non-disruptive manner.


-- 
Senthil
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ORQEJXJTZDYYV53MHKXTJ3Q6W72AUSGA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Victor Stinner
On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw  wrote:
>
> Pip vendors requests for network calls: 
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
>
> But it still depends on functions from urllib.parse and urllib.request in
> many places:
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Aha, it doesn't use urllib.request to open a HTTP connection, it only
uses pathname2url() and url2pathname() functions of urllib.request.
Maybe we can keep these functions. I'm not sure why they don't belong
to urllib.parse.
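For reference, the two functions mentioned here convert between local paths and the path part of a `file:` URL. A minimal sketch, loosely in the spirit of what pip's urls.py helper appears to do; the function names are hypothetical:

```python
import os
from urllib.parse import urljoin, urlsplit
from urllib.request import pathname2url, url2pathname


def path_to_file_url(path: str) -> str:
    """Turn a local filesystem path into a file: URL."""
    # pathname2url() percent-encodes the path (a space becomes %20) but does
    # not add the "file:" scheme, so urljoin() supplies it.
    return urljoin("file:", pathname2url(os.path.abspath(path)))


def file_url_to_path(url: str) -> str:
    """Inverse of path_to_file_url() for file: URLs with an empty host."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file: URL: {url!r}")
    # url2pathname() undoes the percent-encoding and, on Windows, maps
    # "/C:/dir" back to a native "C:\dir" style path.
    return url2pathname(parts.path)


if __name__ == "__main__":
    url = path_to_file_url(os.path.join("some dir", "file.txt"))
    print(url)
    print(file_url_to_path(url))
```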

If urllib.parse is widely used, maybe we can keep this module.

Victor
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PDFGPDGESBLSBHVLINCPAFEOHXQWFIRI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: I want to contribute to Python.

2022-02-06 Thread Victor Stinner
On Sun, Feb 6, 2022 at 3:33 PM Ezekiel Adetoro  wrote:
> Hello,
> My name is Ezekiel, and it is my desire to start contributing to Python, be 
> part of the core development of Python. I have forked the CPython and cloned 
> it. What is the next step I need to do?

Welcome Ezekiel! I suggest you start by reading
https://devguide.python.org/ and joining the core-mentorship mailing list,
which is the best place for such questions!
https://mail.python.org/mailman3/lists/core-mentorship.python.org/

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R63GLDZQ72USFPJDOZD73DNRUTWNUDHC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: I want to contribute to Python.

2022-02-06 Thread MRAB

On 2022-02-06 13:18, Ezekiel Adetoro wrote:

> Hello,
> My name is Ezekiel, and it is my desire to start contributing to Python, be 
> part of the core development of Python. I have forked the CPython and cloned 
> it. What is the next step I need to do?


Look on the issue tracker for a bug that you can fix.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HEA7LYZLM5Q6KSURG2PG7PBNKOR37RM7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
Pip vendors requests for network calls:
https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests

But it still depends on functions from urllib.parse and urllib.request in
many places:
https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Damian (he/him)

On Sun, Feb 6, 2022 at 9:36 AM Dong-hee Na  wrote:

> I am not an expert on pip,
> but won't installing pip become a problem once CPython
> removes the urllib module from the stdlib?
>
> Warm regards,
> Dong-hee
>
> On Sun, Feb 6, 2022 at 11:13 PM Victor Stinner wrote:
>
>> Hi,
>>
>> I propose to deprecate the urllib module in Python 3.11. It would emit
>> a DeprecationWarning which warns users that they should consider better
>> alternatives like urllib3 or httpx: well-known, better maintained,
>> more secure modules, with HTTP/2 support (httpx), etc.
>>
>> I don't propose to schedule its removal. Let's discuss the removal in
>> 1 or 2 years.
>>
>> --
>>
>> urllib has many abstractions to support a wide range of protocols with
>> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
>> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
>> Authentication requires 10-20 lines of code, whereas it should be a
>> single line.
>>
>> Users (me included) don't like the urllib API, which is too complicated
>> for common tasks.
>>
>> --
>>
>> Unhappy users created multiple better alternatives to the stdlib urllib
>> module.
>>
>> In 2008, the "urllib3" module was created to provide an API designed
>> to be as simple as possible for the most common HTTP and HTTPS
>> requests. Example:
>>
>> req = http.request('GET', 'http://httpbin.org/robots.txt')
>>
>> In 2011, the "requests" module based on urllib3 was created.
>>
>> In 2013, the "aiohttp" module based on asyncio was created.
>>
>> In 2015, the new "httpx" module was created:
>>
>> req = httpx.get('https://www.example.org/')
>>
>> Not only does httpx have a regular "synchronous" API (blocking function
>> calls), but it also has an asynchronous API!
>>
>> Sadly, while HTTP/3 is being developed, it seems like httpx is the only
>> HTTP client library in this list that currently supports HTTP/2 :-(
>>
>> For HTTP/2, I also found the "httplib2" module.
>>
>> For HTTP/3, I found the "http3" and "aioquic" modules.
>>
>> --
>>
>> Let's come back to urllib:
>>
>> * Its API is too complicated
>> * It doesn't support HTTP/2 nor HTTP/3
>> * It's barely maintained: there are 121 open issues including 3 security
>> issues!
>>
>> The 3 open security issues:
>>
>> * bpo-33661 open since 2018;
>> * bpo-36338 open since 2019;
>> * bpo-45795 open since 2021.
>>
>> Usually, it's bad when you refer to an open security issue by its
>> creation year :-(
>>
>> The urllib module has a long history of security vulnerabilities. List
>> of *fixed* vulnerabilities:
>>
>> * 2011 (bpo-11662):
>> https://python-security.readthedocs.io/vuln/urllib-redirect.html
>> * 2017 (bpo-30119):
>>
>> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
>> * 2017 (bpo-30500):
>>
>> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
>> * 2019 (bpo-35907):
>> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
>> * 2019 (bpo-38826):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
>> * 2021 (bpo-42967):
>>
>> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
>> * 2021 (bpo-43075):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
>> * 2021 (bpo-44022):
>> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>>
>> urllib is a package made of 4 parts:
>>
>> * urllib.request for opening and reading URLs
>> * urllib.error containing the exceptions raised by urllib.request
>> * urllib.parse for parsing URLs
>> * urllib.robotparser for parsing robots.txt files
>>
>> I propose to deprecate all of them. Maybe the deprecation can be
>> different for each sub-module?
>>
>> Victor
>> --
>> Night gathers, and now my watch begins. It shall not end until my death.
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
Speaking from anecdotal experience, "urllib.parse" is a very popular and
heavily depended-on module; I would be shocked if removing it weren't
very disruptive.

In fact, a quick search of the replacement modules you mention shows that they
all rely on it; here is an example from each:
* requests:
https://github.com/psf/requests/blob/99b3b492418d0751ca960178d274f89805095e4c/requests/sessions.py#L121
* aiohttp:
https://github.com/aio-libs/aiohttp/blob/7d78fd01dbe983d119141d7f2775aefd42494f99/aiohttp/formdata.py#L129
* httpx:
https://github.com/encode/httpx/blob/b7dc0c3df68279ce89f016a69a41b27a2346d54d/httpx/_content.py#L144
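The kind of `urllib.parse` usage those links point to is mostly small, well-defined helpers. A hedged sketch of two typical jobs an HTTP client delegates to it: resolving a relative redirect target and encoding query parameters. The function names and URLs are illustrative, not taken from those projects:

```python
from urllib.parse import urlencode, urljoin


def resolve_redirect(current_url: str, location: str) -> str:
    """Resolve a possibly relative Location header against the current URL."""
    # urljoin() broadly follows RFC 3986 reference resolution, so "../x",
    # "/x" and scheme-relative "//host/x" targets are all handled.
    return urljoin(current_url, location)


def build_url(base: str, params: dict) -> str:
    """Append an encoded query string to a base URL that has no query yet."""
    # urlencode() uses quote_plus() by default: spaces become "+", and
    # reserved characters such as "&" are percent-encoded.
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(resolve_redirect("https://example.com/a/b", "../c"))
    print(resolve_redirect("https://example.com/a/b", "//cdn.example.net/x"))
    print(build_url("https://example.com/search", {"q": "a&b c", "page": 2}))
```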

As for "urllib.request", I know that the philosophy of Python being a
"batteries included language" is going away, but having no way to make any
HTTP call without a third-party library definitely has a lot of situations where
it makes Python more difficult to use. Could it not always emit a warning
that this library should not be used in a production environment, in much
the same way that Flask's default web server does?

Damian (he/him)


On Sun, Feb 6, 2022 at 9:16 AM Victor Stinner  wrote:

> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning to warn users that they should consider better
> alternatives like urllib3 or httpx: well-known modules that are better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstractions to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like the urllib API, which is too complicated
> for common tasks.
>
> --
>
> Unhappy users created multiple better alternatives to the stdlib urllib
> module.
>
> In 2008, the "urllib3" module was created to provide an API designed
> to be as simple as possible for the most common HTTP and HTTPS
> requests. Example:
>
>    import urllib3
>    http = urllib3.PoolManager()
>    req = http.request('GET', 'http://httpbin.org/robots.txt')
>
> In 2011, the "requests" module based on urllib3 was created.
>
> In 2013, the "aiohttp" module based on asyncio was created.
>
> In 2019, the "httpx" module was created:
>
>    import httpx
>    req = httpx.get('https://www.example.org/')
>
> Not only does httpx have a regular "synchronous" API (blocking function
> calls), it also has an asynchronous API!
>
> Sadly, while HTTP/3 is being developed, it seems that httpx is the
> only HTTP client library in this list that currently supports HTTP/2 :-(
>
> For HTTP/2, I also found the "httplib2" module.
>
> For HTTP/3, I found the "http3" and "aioquic" modules.
>
> --
>
> Let's come back to urllib:
>
> * Its API is too complicated
> * It supports neither HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues, including 3 security
> issues!
>
> The 3 open security issues:
>
> * bpo-33661, open since 2018;
> * bpo-36338, open since 2019;
> * bpo-45795, open since 2021.
>
> Usually, it's bad when you refer to an open security issue by its
> creation year :-(
>
> The urllib module has a long history of security vulnerabilities. List
> of *fixed* vulnerabilities:
>
> * 2011 (bpo-11662):
> https://python-security.readthedocs.io/vuln/urllib-redirect.html
> * 2017 (bpo-30119):
> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
> * 2017 (bpo-30500):
> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
> * 2019 (bpo-35907):
> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
> * 2019 (bpo-38826):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
> * 2021 (bpo-42967):
> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
> * 2021 (bpo-43075):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
> * 2021 (bpo-44022):
> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
Strong -1 from me.

urllib.request may not be "best practice", but it's still extremely
useful for simple situations, and urllib.parse is useful for basic
handling of URLs. Yes, the more complex aspects of urllib are better
handled by external packages, but that's not sufficient argument for
removing the package altogether. There are many situations where
external dependencies are unsuitable. Also, there's quite a lot of
usage of urllib in the stdlib itself - how would you propose to
replace that?
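As an illustration of the "simple situations" case, fetching a resource with nothing but the stdlib takes only a few lines with `urllib.request.urlopen`. A minimal sketch: `fetch_text` is a hypothetical helper, and the `data:` URL is used only so the example runs without network access (`urlopen` handles `data:` URLs through its default `DataHandler`).

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download url and decode its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset declared in Content-Type, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


try:
    # A data: URL keeps the example offline; an https:// URL works the same way.
    print(fetch_text("data:text/plain;charset=utf-8,hello%20world"))
    # hello world
except URLError as exc:
    print("request failed:", exc.reason)
```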

In addition, pip relies pretty heavily on urllib (parse and request),
and pip has a bootstrapping issue, so using 3rd party libraries is
non-trivial. Also, of pip's existing vendored dependencies,
webencodings, urllib3, requests, pkg_resources, packaging, html5lib,
distlib and cachecontrol all import urllib. So this would be *hugely*
disruptive to the whole packaging ecosystem (which is under-resourced
at the best of times, so this would put a lot of strain on us).

In any case, why is this being proposed as a simple posting on
python-dev? There's already PEP 594 for removals from the stdlib. If
you have a case for removing urllib, I suggest you get it added to PEP
594, so it can be discussed and agreed properly, along with the other
removals (none of which is remotely as controversial as urllib, so
there's absolutely no doubt in my mind that this would need a PEP
however it was proposed).

Paul


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Dong-hee Na
I am not an expert on pip, but won't installing pip become a problem
once CPython removes the urllib module from the stdlib?

Warm regards,
Dong-hee

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] I want to contribute to Python.

2022-02-06 Thread Ezekiel Adetoro
Hello,
My name is Ezekiel, and I would like to start contributing to Python and
become part of its core development. I have forked the CPython repository
and cloned it. What is the next step?
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NI7AVGSVM2ZMATCH5GFIHQS65D43YQ47/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Victor Stinner
Hi,

I propose to deprecate the urllib module in Python 3.11. It would emit
a DeprecationWarning to warn users that they should consider better
alternatives like urllib3 or httpx: well-known modules that are better
maintained, more secure, support HTTP/2 (httpx), etc.

I don't propose to schedule its removal. Let's discuss the removal in
1 or 2 years.
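Mechanically, a deprecation like this usually boils down to a `warnings.warn(..., DeprecationWarning, stacklevel=2)` call. The sketch below is only an assumption about how it could look, not the proposed patch; `_warn_deprecated_module` is a hypothetical helper and the message text is illustrative.

```python
import warnings


def _warn_deprecated_module(name: str, alternatives: str) -> None:
    """Emit a DeprecationWarning for module `name` (hypothetical helper).

    In a real module the warnings.warn() call would sit at top level, so it
    runs once, the first time the module is imported.
    """
    warnings.warn(
        f"{name} is deprecated; consider {alternatives} instead",
        DeprecationWarning,
        stacklevel=2,  # point at the caller rather than at this helper
    )


# DeprecationWarning is ignored by default unless triggered by code in
# __main__, so record it explicitly to show what a user would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _warn_deprecated_module("urllib.request", "urllib3 or httpx")

print(caught[0].category.__name__, caught[0].message)
```

Because of the default filters, many users would only see such a warning when running tests or with `-W default`, which is part of why deprecations in widely used modules tend to stay in place for several releases.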

--

urllib has many abstractions to support a wide range of protocols with
"handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
authentication, HTTP Cookie, etc. A simple HTTP request using Basic
Authentication requires 10-20 lines of code, whereas it should be a
single line.
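For comparison, this is roughly what Basic Authentication looks like using only the handler machinery of `urllib.request`. A minimal sketch: the URL and credentials are placeholders, `build_basic_auth_opener` is a hypothetical helper, and the request is only prepared here, not sent.

```python
import urllib.request


def build_basic_auth_opener(url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """Build an opener that answers Basic Auth challenges for url (hypothetical helper)."""
    # 1. A password manager maps (realm, URL prefix) to credentials;
    #    realm=None acts as a fallback for any realm at this URL.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)
    # 2. A handler that consults the manager when a 401 challenge arrives.
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # 3. An opener chaining that handler with the default handlers.
    return urllib.request.build_opener(auth_handler)


opener = build_basic_auth_opener("https://example.org/private/", "alice", "s3cret")
# Sending the request would then be:
#     with opener.open("https://example.org/private/data", timeout=10) as resp:
#         body = resp.read()
```

By contrast, requests accepts the credentials directly, e.g. `requests.get(url, auth=("alice", "s3cret"))`.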

Users (me included) don't like the urllib API, which is too complicated
for common tasks.

--

Unhappy users created multiple better alternatives to the stdlib urllib module.

In 2008, the "urllib3" module was created to provide an API designed
to be as simple as possible for the most common HTTP and HTTPS
requests. Example:

   import urllib3
   http = urllib3.PoolManager()
   req = http.request('GET', 'http://httpbin.org/robots.txt')

In 2011, the "requests" module based on urllib3 was created.

In 2013, the "aiohttp" module based on asyncio was created.

In 2019, the "httpx" module was created:

   import httpx
   req = httpx.get('https://www.example.org/')

Not only does httpx have a regular "synchronous" API (blocking function
calls), it also has an asynchronous API!

Sadly, while HTTP/3 is being developed, it seems that httpx is the
only HTTP client library in this list that currently supports HTTP/2 :-(

For HTTP/2, I also found the "httplib2" module.

For HTTP/3, I found the "http3" and "aioquic" modules.

--

Let's come back to urllib:

* Its API is too complicated
* It supports neither HTTP/2 nor HTTP/3
* It's barely maintained: there are 121 open issues, including 3 security issues!

The 3 open security issues:

* bpo-33661, open since 2018;
* bpo-36338, open since 2019;
* bpo-45795, open since 2021.

Usually, it's bad when you refer to an open security issue by its
creation year :-(

The urllib module has a long history of security vulnerabilities. List
of *fixed* vulnerabilities:

* 2011 (bpo-11662):
https://python-security.readthedocs.io/vuln/urllib-redirect.html
* 2017 (bpo-30119):
https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
* 2017 (bpo-30500):
https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
* 2019 (bpo-35907):
https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
* 2019 (bpo-38826):
https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
* 2021 (bpo-42967):
https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
* 2021 (bpo-43075):
https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
* 2021 (bpo-44022):
https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
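bpo-42967 is a good example of how subtle these are: `urllib.parse.parse_qsl()` (and `parse_qs()`) used to accept both `&` and `;` as separators, so a cache or proxy and the application behind it could split the same query string differently (web cache poisoning). The fix made `&` the only default separator and added a `separator` parameter. A short sketch, assuming a Python version that includes the fix (3.10+, or the early-2021 security releases of the older branches):

```python
from urllib.parse import parse_qsl

# Default: only "&" separates pairs, so "1;b=2" stays one value.
print(parse_qsl("a=1;b=2&c=3"))
# [('a', '1;b=2'), ('c', '3')]

# An application that really wants ";" must opt in explicitly.
print(parse_qsl("a=1;b=2", separator=";"))
# [('a', '1'), ('b', '2')]
```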

urllib is a package made of 4 parts:

* urllib.request for opening and reading URLs
* urllib.error containing the exceptions raised by urllib.request
* urllib.parse for parsing URLs
* urllib.robotparser for parsing robots.txt files
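Of these four parts, `urllib.robotparser` is the smallest; typical use looks roughly like this. A minimal sketch: the rules, the user agent `ExampleBot` and the URLs are made up, and `parse()` is fed the lines directly so nothing is downloaded (`set_url()` followed by `read()` would fetch a real robots.txt).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, split into lines as parse() expects.
rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # also records the "last checked" time

for path in ("/index.html", "/private/report.html"):
    url = "https://example.org" + path
    print(path, parser.can_fetch("ExampleBot", url))
print("crawl delay:", parser.crawl_delay("ExampleBot"))
```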

I propose to deprecate all of them. Maybe the deprecation can be
different for each sub-module?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
Code of Conduct: http://python.org/psf/codeofconduct/