Tim Peters <t...@python.org> added the comment:

Well, the problem in the regexp is this part:  "\d+,? ?".  You're not 
_requiring_ that strings of digits be separated by a comma or blank, you're 
only _allowing_ them to be so separated.  A solid string of digits is matched 
by this, and so the enclosing + requires the engine, when backtracking, to 
consider every possible way of breaking the solid string of digits into one or 
more solid strings of digits too.  If there are n digits in the digit string, 
there are 2**(n-1) ways to do so.

Overall the regexp can never match because "NULL" never gets matched.  So all 
possibilities will be tried.  We can compute how many attempts will be made 
like so:

    prod = 1
    for c in re.findall("\d+|NULL", statement):
        if c == "NULL":
            break
        prod *= 2**(len(c) - 1)
    print(format(prod, ","))


Attempting test_01
256
Attempting test_02
4,096
Attempting test_03
65,536
Attempting test_04
1,048,576
Attempting test_05
2,097,152
Attempting test_06
4,194,304
Attempting test_07
8,388,608
Attempting test_08
16,777,216
Attempting test_09
33,554,432
Attempting test_10
67,108,864
Attempting test_11
16,777,216
Attempting test_12
536,870,912
Attempting test_13
17,179,869,184
Attempting test_14
549,755,813,888

Note, e.g., this "predicts" test_08 will take about the same time as test_11 
(both require 16,777,216 attempts).  Which is what you actually saw happen.

And that's why you gave up on test_14:  it will require over half a trillion 
failing attempts before it ends, which will take roughly 30 times longer than 
it failed to match test_13.

If we were building a regexp _tester_, we'd spin off a new process to run the 
regexp, and kill the process if it took "too long".  But that's not we're doing 
- library functions do what you tell them to do ;-)

In this case, it's easily repaired.  For example, replace

    "\d+,? ?"

 by 

    "\d+(?=[ ,)]),? ?"

This uses a lookahead assertion to _require_ that a digit string is followed by 
a blank, comma, or right parenthesis.  Which prevents exponential-time 
backtracking in failing cases.

----------
nosy: +tim.peters

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31759>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to