New submission from Serhiy Storchaka:
sre_parse.parse_template uses string concatenation to accumulate characters.
def literal(literal, p=p, pappend=a):
if p and p[-1][0] is LITERAL:
p[-1] = LITERAL, p[-1][1] + literal
else:
pappend((LITERAL, literal))
This operation have quadratic calculation complexity for long replacement
strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl =
'x'*100000" "parse_template(repl, '')"
1 loops, best of 1: 3.38 sec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl =
'x'*200000" "parse_template(repl, '')"
1 loops, best of 1: 18.2 sec per loop
The proposed patch change amortized complexity to be linear. It also speeds up
parsing shorter strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl =
'x'*100000" "parse_template(repl, '')"
1 loops, best of 1: 915 msec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl =
'x'*200000" "parse_template(repl, '')"
1 loops, best of 1: 1.79 sec per loop
----------
components: Library (Lib), Regular Expressions
files: re_parse_template.patch
keywords: patch
messages: 201032
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Quadratic complexity in the parsing of re replacement string
type: performance
versions: Python 2.7, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file32315/re_parse_template.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19365>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com