Tim Peters <[email protected]> added the comment:
Well, the problem in the regexp is this part: "\d+,? ?". You're not
_requiring_ that strings of digits be separated by a comma or blank, you're
only _allowing_ them to be so separated. A solid string of digits is matched
by this, and so the enclosing + requires the engine, when backtracking, to
consider every possible way of breaking the solid string of digits into one or
more solid strings of digits too. If there are n digits in the digit string,
there are 2**(n-1) ways to do so.
Overall the regexp can never match because "NULL" never gets matched. So all
possibilities will be tried. We can compute how many attempts will be made
like so:
prod = 1
for c in re.findall("\d+|NULL", statement):
if c == "NULL":
break
prod *= 2**(len(c) - 1)
print(format(prod, ","))
Attempting test_01
256
Attempting test_02
4,096
Attempting test_03
65,536
Attempting test_04
1,048,576
Attempting test_05
2,097,152
Attempting test_06
4,194,304
Attempting test_07
8,388,608
Attempting test_08
16,777,216
Attempting test_09
33,554,432
Attempting test_10
67,108,864
Attempting test_11
16,777,216
Attempting test_12
536,870,912
Attempting test_13
17,179,869,184
Attempting test_14
549,755,813,888
Note, e.g., this "predicts" test_08 will take about the same time as test_11
(both require 16,777,216 attempts). Which is what you actually saw happen.
And that's why you gave up on test_14: it will require over half a trillion
failing attempts before it ends, which will take roughly 30 times longer than
it failed to match test_13.
If we were building a regexp _tester_, we'd spin off a new process to run the
regexp, and kill the process if it took "too long". But that's not we're doing
- library functions do what you tell them to do ;-)
In this case, it's easily repaired. For example, replace
"\d+,? ?"
by
"\d+(?=[ ,)]),? ?"
This uses a lookahead assertion to _require_ that a digit string is followed by
a blank, comma, or right parenthesis. Which prevents exponential-time
backtracking in failing cases.
----------
nosy: +tim.peters
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue31759>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com