Ezio Melotti added the comment:

I tried to remove a few unused regexes and inline some of the others (the re module has its own caching anyway, and they don't seem to be documented), but it didn't get much faster (see attached patch).
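To illustrate what "inlining" a regex means here, the following is a minimal sketch, not code from the patch. The helper names and the whitespace pattern are hypothetical and do not come from the email package. It assumes the usual CPython behaviour where the module-level re functions keep recently compiled patterns in an internal cache.

```python
import re

# Hypothetical example: neither the names nor the pattern come from the
# email package or from the attached patch.

# Before: the pattern is compiled at import time, whether or not it is used.
_WHITESPACE_RUN = re.compile(r'[ \t]+')


def collapse_eager(text):
    """Collapse runs of spaces/tabs using the precompiled module-level regex."""
    return _WHITESPACE_RUN.sub(' ', text)


# After: the pattern is inlined at the call site. It is compiled on the first
# call and later calls are normally served from the re module's own cache.
def collapse_inlined(text):
    """Collapse runs of spaces/tabs, compiling the pattern lazily via re.sub."""
    return re.sub(r'[ \t]+', ' ', text)


# Both variants give the same result; only the time of compilation differs.
print(collapse_eager('a \t  b') == collapse_inlined('a \t  b'))
```

The trade-off is that the inlined form pays a small cache lookup on every call, while the import itself no longer pays for compiling a pattern that may never be used.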
I then put the second list of email imports from the previous message in a file and ran it with cProfile. These are the results:

=== Without patch ===

$ time ./python -m issue11454_imp2
[69308 refs]

real    0m0.337s
user    0m0.312s
sys     0m0.020s

$ ./python -m cProfile -s time issue11454_imp2.py
         15130 function calls (14543 primitive calls) in 0.191 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       26    0.029    0.001    0.029    0.001 {built-in method loads}
     1248    0.015    0.000    0.018    0.000 sre_parse.py:184(__next)
        3    0.010    0.003    0.015    0.005 sre_compile.py:301(_optimize_unicode)
    48/17    0.009    0.000    0.037    0.002 sre_parse.py:418(_parse)
     30/1    0.008    0.000    0.191    0.191 {built-in method exec}
       82    0.007    0.000    0.024    0.000 {built-in method __build_class__}
       25    0.006    0.000    0.024    0.001 sre_compile.py:207(_optimize_charset)
        8    0.005    0.001    0.005    0.001 {built-in method load_dynamic}
     1122    0.005    0.000    0.022    0.000 sre_parse.py:209(get)
      177    0.005    0.000    0.005    0.000 {built-in method stat}
      107    0.005    0.000    0.012    0.000 <frozen importlib._bootstrap>:1350(find_loader)
2944/2919    0.004    0.000    0.004    0.000 {built-in method len}
    69/15    0.003    0.000    0.028    0.002 sre_compile.py:32(_compile)
        9    0.003    0.000    0.003    0.000 sre_compile.py:258(_mk_bitmap)
       94    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:74(_path_join)

=== With patch ===

$ time ./python -m issue11454_imp2
[69117 refs]

real    0m0.319s
user    0m0.304s
sys     0m0.012s

$ ./python -m cProfile -s time issue11454_imp2.py
         11281 function calls (10762 primitive calls) in 0.162 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.022    0.001    0.022    0.001 {built-in method loads}
        3    0.011    0.004    0.015    0.005 sre_compile.py:301(_optimize_unicode)
      708    0.008    0.000    0.010    0.000 sre_parse.py:184(__next)
     30/1    0.008    0.000    0.238    0.238 {built-in method exec}
       82    0.007    0.000    0.023    0.000 {built-in method __build_class__}
      187    0.005    0.000    0.005    0.000 {built-in method stat}
        8    0.005    0.001    0.005    0.001 {built-in method load_dynamic}
      107    0.005    0.000    0.012    0.000 <frozen importlib._bootstrap>:1350(find_loader)
     29/8    0.005    0.000    0.020    0.002 sre_parse.py:418(_parse)
       11    0.004    0.000    0.020    0.002 sre_compile.py:207(_optimize_charset)
      643    0.003    0.000    0.012    0.000 sre_parse.py:209(get)
        5    0.003    0.001    0.003    0.001 {built-in method dumps}
       94    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:74(_path_join)
      257    0.002    0.000    0.002    0.000 quoprimime.py:56(<genexpr>)
       26    0.002    0.000    0.116    0.004 <frozen importlib._bootstrap>:938(get_code)
1689/1676    0.002    0.000    0.002    0.000 {built-in method len}
       31    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:1034(get_data)
      256    0.002    0.000    0.002    0.000 {method 'setdefault' of 'dict' objects}
      119    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:86(_path_split)
       35    0.002    0.000    0.019    0.001 <frozen importlib._bootstrap>:1468(_find_module)
       34    0.002    0.000    0.015    0.000 <frozen importlib._bootstrap>:1278(_get_loader)
     39/6    0.002    0.000    0.023    0.004 sre_compile.py:32(_compile)
     26/3    0.001    0.000    0.235    0.078 <frozen importlib._bootstrap>:853(_load_module)

The time spent in sre_compile.py:301(_optimize_unicode) most likely comes from email.utils._has_surrogates (there's a further speedup when it's commented out):

    _has_surrogates = re.compile('([^\ud800-\udbff]|\A)[\udc00-\udfff]([^\udc00-\udfff]|\Z)').search

This is used in a number of places, so it can't be inlined. I wanted to optimize it but I'm not sure what it's supposed to do.
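One way to check in isolation how much of the compile cost comes from this single pattern is to profile one uncached compilation of it. This is a rough sketch, not part of the patch: `compile_uncached` and `profile_compile` are hypothetical helper names, and the function and module names in the resulting report will likely differ from the listings above on other Python versions.

```python
import cProfile
import io
import pstats
import re

# Pattern copied from the message; '\\A' and '\\Z' are written with doubled
# backslashes only to avoid invalid-escape warnings on newer Pythons.
PATTERN = '([^\ud800-\udbff]|\\A)[\udc00-\udfff]([^\udc00-\udfff]|\\Z)'


def compile_uncached(pattern):
    """Compile `pattern` after emptying the re cache, so the work is really done."""
    re.purge()
    return re.compile(pattern)


def profile_compile(pattern, rows=10):
    """Return (compiled pattern, cProfile report text) for one uncached compile."""
    profiler = cProfile.Profile()
    profiler.enable()
    compiled = compile_uncached(pattern)
    profiler.disable()
    stream = io.StringIO()
    # Sort by internal time, like `-s time` on the command line.
    pstats.Stats(profiler, stream=stream).sort_stats('tottime').print_stats(rows)
    return compiled, stream.getvalue()


compiled, report = profile_compile(PATTERN)
print(report)
```

A single compile is very short, so the absolute numbers are noisy; the point is only to see which internal functions dominate for this pattern.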
It matches lone low surrogates, but not lone high ones, and matches some invalid sequences, but not others:

>>> _has_surrogates('\ud800')  # lone high
>>> _has_surrogates('\udc00')  # lone low
<_sre.SRE_Match object at 0x9ae00e8>
>>> _has_surrogates('\ud800\udc00')  # valid pair (high+low)
>>> _has_surrogates('\ud800\ud800\udc00')  # invalid sequence (lone high, valid high+low)
>>> _has_surrogates('\udc00\ud800\ud800\udc00')  # invalid sequence (lone low, lone high, valid high+low)
<_sre.SRE_Match object at 0x9ae0028>

FWIW this was introduced in email.message in 1a041f364916 and then moved to email.utils in 9388c671d52d.

----------
keywords: +patch
nosy: +ezio.melotti
Added file: http://bugs.python.org/file27201/issue11454.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11454>
_______________________________________