On 06/02/2022 15.08, Victor Stinner wrote:
Hi,

I propose to deprecate the urllib module in Python 3.11. It would emit
a DeprecationWarning which warn users, so users should consider better
alternatives like urllib3 or httpx: well known modules, better
maintained, more secure, support HTTP/2 (httpx), etc.

I don't propose to schedule its removal. Let's discuss the removal in
1 or 2 years.

--

urllib has many abstraction to support a wide range of protocols with
"handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
authentication, HTTP Cookie, etc. A simple HTTP request using Basic
Authentication requires 10-20 lines of code, whereas it should be a
single line.

Users (me included) don't like urllib API which was too complicated
for common tasks.

--


[...]


urllib is a package made of 4 parts:

* urllib.request for opening and reading URLs
* urllib.error containing the exceptions raised by urllib.request
* urllib.parse for parsing URLs
* urllib.robotparser for parsing robots.txt files

I propose to deprecate all of them. Maybe the deprecation can be
different for each sub-module?

Thanks for bringing this topic forward, Victor!

Disclaimer: I proposed the removal of urllib today in Python core's internal chat.

The urllib package -- and to some degree also the http package -- are constant source of security bugs. The code is old and the parsers for HTTP and URLs don't handle edge cases well. Python core lacks a true maintainer of the code. To be honest, we have to admit defeat and be up front that urllib is not up to the task for this decade. It was designed written during a more friendly, less scary time on the internet.

If I had the power and time, then I would replace urllib with a simpler, reduced HTTP client that uses platform's HTTP library under the hood (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or aiohttp are much better suited than urllib.

The second best option is to reduce the feature set of urllib to core HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter, more standard conform parsers for urls, query strings, and RFC 2822 instead of RFC 822 for headers.

Christian


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WYVETVHMGRS4CI47GTFY6W7B43YLSJH2/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to