Ezio Melotti added the comment:

I tried removing a few unused regexes and inlining some of the others (the re 
module has its own cache anyway, and these regexes don't seem to be 
documented), but it didn't get much faster (see the attached patch).
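
As a rough illustration of the caching point (a sketch only: both patterns and
function names below are hypothetical and are not regexes touched by the
patch), a module-level re.compile() pays the compilation cost at import time,
while an inline re.search() defers it to the first call and then relies on the
cache kept by the re module:

```python
import re

# Hypothetical patterns, used only for illustration.
EAGER_PATTERN = r'(\w+)=(\d+)'
INLINE_PATTERN = r'(\w+):(\d+)'

# Eager style: compiled when the module is imported, whether or not
# the regex is ever used afterwards.
_eager_re = re.compile(EAGER_PATTERN)


def parse_eager(text):
    """Search with the precompiled module-level regex."""
    match = _eager_re.search(text)
    return match.groups() if match else None


def parse_inline(text):
    """Search with an inlined pattern.

    The pattern is compiled on the first call; later calls with the
    same pattern string are normally served from the re module's cache.
    """
    match = re.search(INLINE_PATTERN, text)
    return match.groups() if match else None
```

The inline form trades a small per-call cache lookup for a cheaper import,
which only pays off if the regex is rarely needed.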

I then put the second list of email imports from the previous message in a 
file and ran it with cProfile; these are the results:

=== Without patch ===

$ time ./python -m issue11454_imp2
[69308 refs]

real    0m0.337s
user    0m0.312s
sys     0m0.020s

$ ./python -m cProfile -s time issue11454_imp2.py
         15130 function calls (14543 primitive calls) in 0.191 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       26    0.029    0.001    0.029    0.001 {built-in method loads}
     1248    0.015    0.000    0.018    0.000 sre_parse.py:184(__next)
        3    0.010    0.003    0.015    0.005 sre_compile.py:301(_optimize_unicode)
    48/17    0.009    0.000    0.037    0.002 sre_parse.py:418(_parse)
     30/1    0.008    0.000    0.191    0.191 {built-in method exec}
       82    0.007    0.000    0.024    0.000 {built-in method __build_class__}
       25    0.006    0.000    0.024    0.001 sre_compile.py:207(_optimize_charset)
        8    0.005    0.001    0.005    0.001 {built-in method load_dynamic}
     1122    0.005    0.000    0.022    0.000 sre_parse.py:209(get)
      177    0.005    0.000    0.005    0.000 {built-in method stat}
      107    0.005    0.000    0.012    0.000 <frozen importlib._bootstrap>:1350(find_loader)
2944/2919    0.004    0.000    0.004    0.000 {built-in method len}
    69/15    0.003    0.000    0.028    0.002 sre_compile.py:32(_compile)
        9    0.003    0.000    0.003    0.000 sre_compile.py:258(_mk_bitmap)
       94    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:74(_path_join)


=== With patch ===

$ time ./python -m issue11454_imp2
[69117 refs]

real    0m0.319s
user    0m0.304s
sys     0m0.012s

$ ./python -m cProfile -s time issue11454_imp2.py
         11281 function calls (10762 primitive calls) in 0.162 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.022    0.001    0.022    0.001 {built-in method loads}
        3    0.011    0.004    0.015    0.005 sre_compile.py:301(_optimize_unicode)
      708    0.008    0.000    0.010    0.000 sre_parse.py:184(__next)
     30/1    0.008    0.000    0.238    0.238 {built-in method exec}
       82    0.007    0.000    0.023    0.000 {built-in method __build_class__}
      187    0.005    0.000    0.005    0.000 {built-in method stat}
        8    0.005    0.001    0.005    0.001 {built-in method load_dynamic}
      107    0.005    0.000    0.012    0.000 <frozen importlib._bootstrap>:1350(find_loader)
     29/8    0.005    0.000    0.020    0.002 sre_parse.py:418(_parse)
       11    0.004    0.000    0.020    0.002 sre_compile.py:207(_optimize_charset)
      643    0.003    0.000    0.012    0.000 sre_parse.py:209(get)
        5    0.003    0.001    0.003    0.001 {built-in method dumps}
       94    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:74(_path_join)
      257    0.002    0.000    0.002    0.000 quoprimime.py:56(<genexpr>)
       26    0.002    0.000    0.116    0.004 <frozen importlib._bootstrap>:938(get_code)
1689/1676    0.002    0.000    0.002    0.000 {built-in method len}
       31    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:1034(get_data)
      256    0.002    0.000    0.002    0.000 {method 'setdefault' of 'dict' objects}
      119    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:86(_path_split)
       35    0.002    0.000    0.019    0.001 <frozen importlib._bootstrap>:1468(_find_module)
       34    0.002    0.000    0.015    0.000 <frozen importlib._bootstrap>:1278(_get_loader)
     39/6    0.002    0.000    0.023    0.004 sre_compile.py:32(_compile)
     26/3    0.001    0.000    0.235    0.078 <frozen importlib._bootstrap>:853(_load_module)
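
For anyone who wants to collect a similar listing without a separate script
file, here is a minimal sketch using the standard cProfile and pstats modules.
The helper name profile_statement and the choice of 'import email.utils' as
the statement are assumptions made for illustration; they are not the contents
of issue11454_imp2.py. Note that a module already present in sys.modules will
show almost no import cost, so this is best run in a fresh interpreter.

```python
import cProfile
import io
import pstats


def profile_statement(stmt, limit=15):
    """Run stmt under cProfile and return a report sorted by internal time.

    Hypothetical helper: roughly what `python -m cProfile -s time` prints,
    but captured as a string and truncated to `limit` rows.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        exec(stmt, {})  # run the statement in a throwaway namespace
    finally:
        profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('time').print_stats(limit)
    return out.getvalue()


if __name__ == '__main__':
    print(profile_statement('import email.utils'))
```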


The time spent in sre_compile.py:301(_optimize_unicode) most likely comes from 
email.utils._has_surrogates (there's a further speedup when it's commented 
out):
    _has_surrogates = re.compile(
        '([^\ud800-\udbff]|\A)[\udc00-\udfff]([^\udc00-\udfff]|\Z)').search

This is used in a number of places, so it can't be inlined.  I wanted to 
optimize it but I'm not sure what it's supposed to do.  It matches lone low 
surrogates, but not lone high ones, and matches some invalid sequences, but not 
others:
>>> _has_surrogates('\ud800')  # lone high
>>> _has_surrogates('\udc00')  # lone low
<_sre.SRE_Match object at 0x9ae00e8>
>>> _has_surrogates('\ud800\udc00')  # valid pair (high+low)
>>> _has_surrogates('\ud800\ud800\udc00')  # invalid sequence (lone high, valid high+low)
>>> _has_surrogates('\udc00\ud800\ud800\udc00')  # invalid sequence (lone low, lone high, valid high+low)
<_sre.SRE_Match object at 0x9ae0028>
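
The session above can be reproduced with a small standalone script. This is
only a sketch: the pattern is the one quoted from email.utils, with \A and \Z
written as \\A and \\Z so the string literal contains no invalid escapes; the
CASES table and the check_cases helper are hypothetical names introduced here.

```python
import re

# Pattern quoted above; \\A and \\Z reach the regex engine as \A and \Z.
_has_surrogates = re.compile(
    '([^\ud800-\udbff]|\\A)[\udc00-\udfff]([^\udc00-\udfff]|\\Z)').search

# (label, input) pairs mirroring the interactive session.
CASES = [
    ('lone high', '\ud800'),
    ('lone low', '\udc00'),
    ('valid pair', '\ud800\udc00'),
    ('lone high + valid pair', '\ud800\ud800\udc00'),
    ('lone low + lone high + valid pair', '\udc00\ud800\ud800\udc00'),
]


def check_cases():
    """Return {label: True if the regex matched, else False}."""
    return {label: _has_surrogates(text) is not None
            for label, text in CASES}


if __name__ == '__main__':
    # Only labels and booleans are printed; the surrogate strings
    # themselves cannot be encoded to most terminal encodings.
    for label, matched in check_cases().items():
        print('%-36s %s' % (label, matched))
```

Only the inputs where a low surrogate is not preceded by a high surrogate
match, which is consistent with the output shown in the session.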

FWIW this was introduced in email.message in 1a041f364916 and then moved to 
email.utils in 9388c671d52d.

----------
keywords: +patch
nosy: +ezio.melotti
Added file: http://bugs.python.org/file27201/issue11454.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11454>
_______________________________________