New submission from Barry A. Warsaw:
It's a very common pattern to see the following at module scope:
    import re
    cre_a = re.compile('some pattern')
    cre_b = re.compile('other pattern')
and so on. This can cost you at startup time, because all of those regular
expressions are compiled at import time, even if they are never used in practice
(e.g. because whatever condition would exercise a given compiled regex never
actually occurs).
It occurred to me that if re.compile() deferred compiling the regexp until its
first use, you could speed up startup. But by how much, and at what cost?
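A minimal sketch of the deferral idea follows. It assumes a hypothetical `DeferredPattern` proxy class and is not the code from the branch: the proxy stores the pattern and flags, and only calls `re.compile()` the first time an attribute such as `search` or `match` is looked up.

```python
import re


class DeferredPattern:
    """Hypothetical lazy proxy: stores the pattern and compiles it on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # filled in on first real use

    def _get(self):
        # Compile once, then reuse the compiled object on later calls.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself
        # (match, search, sub, ...), so every real use goes through _get().
        return getattr(self._get(), name)


cre_a = DeferredPattern(r'\d+')          # nothing compiled at "import time"
print(cre_a._compiled is None)           # True
print(cre_a.search('abc 42').group())    # compiles now; prints 42
print(cre_a._compiled is not None)       # True
```

Using `__getattr__` keeps the proxy small, because it is only consulted for names the proxy does not define itself.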
So I ran a small experiment (pull request to be submitted), using the `perf`
module to time `pip --help`. I was able to cut the number of compiles from 28
to 9, and the mean startup time from 245 ms to 213 ms:
% python -m perf compare_to ../base.json ../defer.json
Mean +- std dev: [base] 245 ms +- 19 ms -> [defer] 213 ms +- 21 ms: 1.15x faster (-13%)
Running `pip install tox` reduces the number of compiles from 231 to 75:
(cpython 3.7) 231 0.06945133209228516
(3.7 w/defer) 75 0.03140091896057129
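As a quick sanity check, the reported ratios can be recomputed from the figures above. The report does not state the unit of the second column in the `pip install tox` run, so the sketch below only compares its values as a ratio.

```python
# Figures copied from the measurements above.
base_ms, defer_ms = 245, 213
print(f"{base_ms / defer_ms:.2f}x faster")               # 1.15x faster
print(f"{(defer_ms - base_ms) / base_ms:.0%}")           # -13%

base_compiles, defer_compiles = 231, 75
base_col, defer_col = 0.06945133209228516, 0.03140091896057129
print(f"{1 - defer_compiles / base_compiles:.0%} fewer compiles")  # 68% fewer compiles
print(f"{base_col / defer_col:.2f}x ratio of second column")       # 2.21x ratio of second column
```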
So what's the cost? Backward compatibility. With this change, `re.compile()` no
longer returns a compiled regular expression object but a "deferred" proxy
instead, and the actual compilation happens when the proxy is first used. This
can break compatibility because any exception that compile() would raise (e.g.
re.error for an invalid pattern) is deferred until that first use. This happens
a fair bit in the test suite, but I'm not sure it's all that common in practice.
In any case, I've also added a re.IMMEDIATE flag (re.N, for "now") to force
immediate compilation.
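The compatibility trade-off can be illustrated with a small sketch. The `LazyPattern` class and its `immediate` keyword are hypothetical stand-ins for the proxy and the re.IMMEDIATE flag described above, not the branch's actual API. With deferral, an invalid pattern only raises `re.error` at first use; with `immediate=True` it raises at construction, as plain `re.compile()` does.

```python
import re


class LazyPattern:
    """Hypothetical sketch: compile on first use unless immediate=True."""

    def __init__(self, pattern, flags=0, immediate=False):
        self._args = (pattern, flags)
        # immediate=True mimics the eager behaviour of plain re.compile().
        self._compiled = re.compile(pattern, flags) if immediate else None

    def __getattr__(self, name):
        if self._compiled is None:
            # An invalid pattern raises re.error here, at first use,
            # rather than where the object was created.
            self._compiled = re.compile(*self._args)
        return getattr(self._compiled, name)


bad = LazyPattern('(')            # no error yet: compilation is deferred
try:
    bad.match('x')
except re.error as exc:
    print('deferred error:', exc)

try:
    LazyPattern('(', immediate=True)
except re.error as exc:
    print('immediate error:', exc)
```

Code that wraps `re.compile()` in a `try`/`except re.error` block is exactly the kind of caller that the deferred version would silently change.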
I also changed the compilation cache to use an actual functools.lru_cache. This
way, when the maxcache limit is reached, the entire cache no longer gets blown
away; only the least recently used entries are evicted.
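The following sketch illustrates the eviction behaviour. It is not the patch itself: `cached_compile` is a hypothetical wrapper, and `maxsize=2` is an arbitrary small limit chosen so that eviction is visible. When the cache is full, `functools.lru_cache` discards only the least recently used entry.

```python
import functools
import re


@functools.lru_cache(maxsize=2)  # tiny, arbitrary limit so eviction is visible
def cached_compile(pattern, flags=0):
    # When full, the LRU cache evicts only the least recently used entry,
    # rather than clearing every cached pattern at once.
    return re.compile(pattern, flags)


cached_compile('a')   # miss
cached_compile('b')   # miss
cached_compile('a')   # hit: 'a' becomes the most recently used entry
cached_compile('c')   # miss: cache is full, so only 'b' is evicted
cached_compile('a')   # hit: 'a' survived the eviction
print(cached_compile.cache_info())
# CacheInfo(hits=2, misses=3, maxsize=2, currsize=2)
```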
So, whether or not this is a good idea, I'm opening this issue and pushing the
branch for further discussion.
----------
assignee: barry
components: Library (Lib)
messages: 302995
nosy: barry
priority: normal
severity: normal
status: open
title: Defer compiling regular expressions
type: performance
versions: Python 3.7
_______________________________________
Python tracker
<https://bugs.python.org/issue31580>
_______________________________________