See also https://github.com/requests/requests/issues/4315

I tried the new `-X importtime` option on `import requests`.
Full output is here:
https://gist.github.com/methane/96d58a29e57e5be97769897462ee1c7e
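
For reference, the importtime output goes to stderr, so I collected it with
something like:

    python3 -X importtime -c "import requests" 2> import.log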

Currently, it takes about 110 ms, and the major part comes from the Python
stdlib.  The following are the roots of the slow stdlib subtrees.

import time: self [us] | cumulative | imported package
import time:      1374 |      14038 |       logging
import time:      2636 |       4255 |       socket
import time:      2902 |      11004 |                   ssl
import time:      1162 |      16694 |           http.client
import time:       656 |       5331 |     cgi
import time:      7338 |       7867 |         http.cookiejar
import time:      2930 |       2930 |         http.cookies


*1. logging*

logging is slow because it is imported at an early stage, and it pulls in
many common but relatively slow modules (collections, functools, enum, re).

In particular, the traceback module is slow because of linecache:

import time:      1419 |       5016 |             tokenize
import time:       200 |       5910 |           linecache
import time:       347 |       8869 |         traceback

I think it's worthwhile to import linecache lazily.
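
A minimal sketch of what I mean (the helper name here is hypothetical; the
real change would go into traceback.py's formatting path):

    # Sketch: defer the linecache import so that tokenize/linecache are
    # only loaded when a traceback is actually formatted.
    def format_source_line(filename, lineno):
        import linecache  # deferred, instead of a module-level import
        return linecache.getline(filename, lineno).strip()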

*2. socket*

import time:       807 |       1221 |         selectors
import time:      2636 |       4255 |       socket

socket imports selectors for socket.socket.sendfile(), and the selectors
module uses ABCs.  That's why selectors is a bit slow.

The socket module also creates four enums at import time.  That's why
importing socket takes more than 2.5 ms excluding subimports.

*3. ssl*

import time:      2007 |       2007 |                     ipaddress
import time:      2386 |       2386 |                     textwrap
import time:      2723 |       2723 |                     _ssl
...
import time:       306 |        988 |                     base64
import time:      2902 |      11004 |                   ssl

I have already created a pull request to remove the textwrap dependency
from ssl:
https://github.com/python/cpython/pull/3849

The ipaddress and _ssl modules are a bit slow too, but I don't know whether
we can improve them.

ssl itself takes 2.9 ms, because it defines six enums.


*4. http.client*

import time:      1376 |       2448 |                   email.header
...
import time:      1469 |       7791 |                   email.utils
import time:       408 |      10646 |                 email._policybase
import time:       939 |      12210 |               email.feedparser
import time:       322 |      12720 |             email.parser
...
import time:       599 |       1361 |             email.message
import time:      1162 |      16694 |           http.client

email.parser has a very large import tree, but I don't know how to break
it up.

*5. cgi*

import time:      1083 |       1083 |         html.entities
import time:       560 |       1643 |       html
...
import time:       656 |       2609 |         shutil
import time:       424 |       3033 |       tempfile
import time:       656 |       5331 |     cgi

The cgi module uses tempfile to save uploaded files, but requests imports
cgi only for `cgi.parse_header()`, so tempfile is never used.  Maybe it's
worthwhile to import tempfile lazily.
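
For context, this one small helper is all requests needs from cgi:

    import cgi

    # requests only calls this to split a Content-Type style header:
    cgi.parse_header('text/html; charset=utf-8')
    # -> ('text/html', {'charset': 'utf-8'})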

FYI, cgi also depends on the very slow email.parser; it doesn't show up in
this tree only because http.client is imported before cgi.  Even though it
isn't a problem for requests, it can affect real CGI applications, and
startup time is, of course, very important for them too.


*6. http.cookiejar and http.cookies*

They are slow because they call `re.compile()` many times at import time.


*Ideas*

There are some places where we can break up a large import tree with the
"import in function" hack.
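
For example, a rough sketch of the idea applied to cgi; as far as I can
see, FieldStorage.make_file() is the only place that really needs tempfile:

    # cgi.py (sketch): drop the module-level "import tempfile" and pay
    # for it only when an uploaded file is actually written to disk.
    def make_file(self):
        import tempfile  # deferred import
        return tempfile.TemporaryFile("wb+")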

ABC is slow, and it's widely used with almost no real need.  (Who needs the
selectors classes to be ABCs?)
We can't remove the ABC dependency because of backward compatibility,
but I hope ABC will be implemented in C by Python 3.7.
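
To give a rough idea of the overhead, here is a quick, unscientific timeit
sketch comparing a plain class definition with an equivalent abstract
class; absolute numbers will of course vary by machine:

    import timeit

    plain = '''
    class S:
        def register(self, fileobj):
            pass
    '''

    abstract = '''
    class S(metaclass=ABCMeta):
        @abstractmethod
        def register(self, fileobj):
            pass
    '''

    n = 10000
    print("plain class:", timeit.timeit(plain, number=n))
    print("ABC class:  ", timeit.timeit(
        abstract, setup="from abc import ABCMeta, abstractmethod", number=n))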

Enum is slow, maybe slower than most people think.
I don't know exactly why, but I suspect it's because its namespace dict is
implemented in Python.
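
A similar rough sketch for enum, comparing a plain class holding int
constants with an IntEnum (again, only the ratio is interesting):

    import timeit

    plain = '''
    class Color:
        RED = 1
        GREEN = 2
    '''

    int_enum = '''
    class Color(enum.IntEnum):
        RED = 1
        GREEN = 2
    '''

    n = 10000
    print("plain class:", timeit.timeit(plain, number=n))
    print("IntEnum:    ", timeit.timeit(int_enum, setup="import enum", number=n))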

Anyway, I think we can have a C implementation of IntEnum and IntFlag, just
as PyStructSequence is to namedtuple.
It doesn't need to be 100% compatible with the current enum; in particular,
there is no need to use a metaclass.

Another major source of slowness is compiling regular expressions.
I think we can increase the cache size of `re.compile` and use on-demand,
cached compiling (e.g. `re.match()`) instead of "compile at import time" in
many modules.
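
Roughly what I mean; the pattern and function names below are just for
illustration:

    import re

    # Common pattern today: compiled when the module is imported,
    # whether or not it is ever used.
    _HEADER_RE = re.compile(r'([^:\s]+):\s*(.*)')

    def split_header_eager(line):
        return _HEADER_RE.match(line)

    # On-demand alternative: re.match() compiles on first use and keeps
    # the compiled pattern in re's internal cache afterwards.
    def split_header_lazy(line):
        return re.match(r'([^:\s]+):\s*(.*)', line)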

PEP 562 -- module __getattr__ -- would help a lot too.
It would make it possible to split up the collections and string modules.
(The string module is often used only for constants like
string.ascii_letters, but string.Template causes an import-time
re.compile().)
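
For example, with PEP 562 the string module could look roughly like this;
the _string_template submodule is hypothetical, just to show the shape:

    # string.py (sketch, assuming PEP 562): cheap constants stay eager,
    # Template (and its import-time re.compile) moves behind __getattr__.
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = ascii_lowercase.upper()
    ascii_letters = ascii_lowercase + ascii_uppercase

    def __getattr__(name):
        if name == 'Template':
            from _string_template import Template  # hypothetical submodule
            return Template
        raise AttributeError("module 'string' has no attribute %r" % name)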


Regards,
-- 
Inada Naoki <songofaca...@gmail.com>