Hi,

I propose to deprecate the urllib module in Python 3.11. It would emit
a DeprecationWarning to steer users towards better alternatives like
urllib3 or httpx: well-known modules that are better maintained, more
secure, and (in the case of httpx) support HTTP/2.
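
For readers unfamiliar with the mechanism, here is a minimal sketch of how
a function can emit a DeprecationWarning. The function name and the message
text are hypothetical, chosen only for illustration; this is not code from
urllib and not necessarily how the deprecation would be implemented.

```python
import warnings


def deprecated_urlopen(url):
    """Hypothetical stand-in for a deprecated function; not real urllib code."""
    warnings.warn(
        "urllib is deprecated; consider urllib3 or httpx",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return f"would fetch {url}"


# By default, DeprecationWarning is only shown when triggered by code in
# __main__, so record it explicitly to make it visible here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_urlopen("https://www.example.org/")

print(caught[0].category.__name__, "-", caught[0].message)
```

Users can surface such warnings in their own test runs with
"python -W error::DeprecationWarning".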

I don't propose to schedule its removal. Let's discuss the removal in
1 or 2 years.

--

urllib has many abstractions to support a wide range of protocols with
"handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
authentication, HTTP Cookie, etc. A simple HTTP request using Basic
Authentication requires 10-20 lines of code, whereas it should be a
single line.
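
To illustrate the point, here is a sketch of Basic Authentication with
urllib.request handlers. The URL, user name and password are made-up
placeholders, and the request itself is wrapped in a function that is not
called, since it needs network access.

```python
import urllib.request

# Hypothetical endpoint and credentials, used only for illustration.
URL = "https://www.example.org/protected"
USER, PASSWORD = "alice", "secret"

# 1. A password manager maps URLs to credentials.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, URL, USER, PASSWORD)

# 2. A handler answers HTTP 401 challenges using that manager.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# 3. An opener chains the handler with the default ones.
opener = urllib.request.build_opener(auth_handler)


def fetch(url=URL):
    """Perform the request; needs network access, so it is not called here."""
    with opener.open(url, timeout=10) as response:
        return response.read()
```

For comparison, requests and httpx accept an auth=(user, password) argument
directly on their get() functions.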

Users (me included) don't like the urllib API, which is too complicated
for common tasks.

--

Unhappy users created multiple better alternatives to the stdlib urllib module.

In 2008, the "urllib3" module was created to provide an API designed
to be as simple as possible for the most common HTTP and HTTPS
requests. Example:

    http = urllib3.PoolManager()
    req = http.request('GET', 'http://httpbin.org/robots.txt')

In 2011, the "requests" module based on urllib3 was created.

In 2013, the "aiohttp" module based on asyncio was created.

In 2015, the new "httpx" module was created:

    req = httpx.get('https://www.example.org/')

Not only does httpx have a regular "synchronous" API (blocking function
calls), it also has an asynchronous API!

Sadly, while HTTP/3 is already being developed, httpx seems to be the
only HTTP client library in this list that currently supports HTTP/2 :-(

For HTTP/2, I also found the "httplib2" module.

For HTTP/3, I found the "http3" and "aioquic" modules.

--

Let's come back to urllib:

* Its API is too complicated
* It supports neither HTTP/2 nor HTTP/3
* It's barely maintained: there are 121 open issues including 3 security issues!

The 3 open security issues:

* bpo-33661, open since 2018;
* bpo-36338, open since 2019;
* bpo-45795, open since 2021.

Usually, it's bad when you refer to an open security issue by its
creation year :-(

The urllib module has a long history of security vulnerabilities. List
of *fixed* vulnerabilities:

* 2011 (bpo-11662):
https://python-security.readthedocs.io/vuln/urllib-redirect.html
* 2017 (bpo-30119):
https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
* 2017 (bpo-30500):
https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
* 2019 (bpo-35907):
https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
* 2019 (bpo-38826):
https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
* 2021 (bpo-42967):
https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
* 2021 (bpo-43075):
https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
* 2021 (bpo-44022):
https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html

urllib is a package made of 4 parts:

* urllib.request for opening and reading URLs
* urllib.error containing the exceptions raised by urllib.request
* urllib.parse for parsing URLs
* urllib.robotparser for parsing robots.txt files
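
As a rough illustration of what each of the four parts does, here is a
small offline sketch. The URLs, the "mybot" user agent and the robots.txt
rules are invented for the example; nothing is sent over the network.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL and decode its query string.
parts = urllib.parse.urlsplit("https://www.example.org/search?q=python&page=2")
query = urllib.parse.parse_qs(parts.query)

# urllib.robotparser: evaluate rules supplied as lines (no download needed).
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
allowed = robots.can_fetch("mybot", "https://www.example.org/index.html")
blocked = robots.can_fetch("mybot", "https://www.example.org/private/data")

# urllib.request: build a request object without sending it.
request = urllib.request.Request("https://www.example.org/", method="HEAD")

# urllib.error: HTTPError is a subclass of URLError, so catching URLError
# also catches HTTP error responses.
is_subclass = issubclass(urllib.error.HTTPError, urllib.error.URLError)
```

Note that urllib.parse and urllib.robotparser work without any network
access, which is one reason the sub-modules may deserve separate treatment.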

I propose to deprecate all of them. Maybe the deprecation can be
different for each sub-module?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
Code of Conduct: http://python.org/psf/codeofconduct/
