Before deferring re.compile, can we make it faster? I profiled `import string` and small optimization can make it 2x faster! (but it's not backward compatible)
Before optimize: import time: self [us] | cumulative | imported package import time: 2339 | 9623 | string string module took about 2.3 ms to import. I found: * RegexFlag.__and__ and __new__ is called very often. * _optimize_charset is slow, because re.UNICODE | re.IGNORECASE diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py index 144620c6d1..7c662247d4 100644 --- a/Lib/sre_compile.py +++ b/Lib/sre_compile.py @@ -582,7 +582,7 @@ def isstring(obj): def _code(p, flags): - flags = p.pattern.flags | flags + flags = int(p.pattern.flags) | int(flags) code = [] # compile info block diff --git a/Lib/string.py b/Lib/string.py index b46e60c38f..fedd92246d 100644 --- a/Lib/string.py +++ b/Lib/string.py @@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass): delimiter = '$' idpattern = r'[_a-z][_a-z0-9]*' braceidpattern = None - flags = _re.IGNORECASE + flags = _re.IGNORECASE | _re.ASCII def __init__(self, template): self.template = template patched: import time: 1191 | 8479 | string Of course, this patch is not backward compatible. [a-z] doesn't match with 'ı' or 'ſ' anymore. But who cares? (in sre_compile.py) # LATIN SMALL LETTER I, LATIN SMALL LETTER DOTLESS I (0x69, 0x131), # iı # LATIN SMALL LETTER S, LATIN SMALL LETTER LONG S (0x73, 0x17f), # sſ There are some other `re.I(GNORECASE)` options in stdlib. I'll check them. More optimization can be done with implementing sre_parse and sre_compile in C. But I have no time for it in this year. Regards, -- Inada Naoki <songofaca...@gmail.com>
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com