[issue46065] re.findall takes forever and never ends

Gareth Rees Sun, 19 Dec 2021 08:10:58 -0800


Gareth Rees <[email protected]> added the comment:


The way to avoid this behaviour is to rule out the match attempts that you 
know are going to fail. As Serhiy described above, if the search fails 
starting at the first character of the string, the engine moves forward and 
tries again starting at the second character. But that new attempt starts in 
the middle of the same run of characters and must fail for the same reason, so 
you can force the regular expression engine to reject it immediately.

Here's an illustration in a simpler setting, where we are looking for all 
runs of one or more 'a' followed by a 'b'. The text contains no 'b', so every 
attempt fails:

    >>> import re
    >>> from timeit import timeit
    >>> text = 'a' * 100000
    >>> timeit(lambda:re.findall(r'a+b', text), number=1)
    6.643531181000014
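
The slow timing is consistent with quadratic work: each of the 100000 start 
positions scans the rest of the run of 'a' before failing to find 'b'. The 
following is only a simplified model of that scanning, not a description of 
what the re module does internally, and the function name scan_steps is 
made up for this sketch:

```python
def scan_steps(n):
    """Simplified model (an assumption, not the real engine): searching
    'a' * n for a+b, the attempt at start position i examines the
    remaining n - i characters before it fails."""
    return sum(n - start for start in range(n))


# Ten times the input costs roughly a hundred times the work.
for n in (10, 100, 1000):
    print(n, scan_steps(n))
```

In this model the total is n * (n + 1) / 2, which is why a modest increase in 
input length makes the search appear to hang.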

We know that any successful match must be preceded by a character other than 
'a' (or by the beginning of the string), so we can reject most of the doomed 
attempts by requiring that context and capturing the match in a group:

    >>> timeit(lambda:re.findall(r'(?:^|[^a])(a+b)', text), number=1)
    0.003743481000014981
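
One caveat that the comment does not raise: the [^a] alternative consumes the 
preceding character, so a match that directly follows another match can be 
missed, because the character before it already belongs to the previous match. 
A negative lookbehind tests the same condition without consuming anything. The 
lookbehind variant below is a suggested alternative, not the pattern from the 
comment:

```python
import re

naive = re.compile(r'a+b')
# Pattern from the comment above: consumes the preceding character.
consuming = re.compile(r'(?:^|[^a])(a+b)')
# Assumption: a lookbehind alternative that consumes nothing before the match.
lookbehind = re.compile(r'(?<!a)a+b')

sample = 'aabaab'  # two matches back to back
print(naive.findall(sample))       # ['aab', 'aab']
print(consuming.findall(sample))   # ['aab']
print(lookbehind.findall(sample))  # ['aab', 'aab']
```

The lookbehind still rejects every attempt that starts inside a run of 'a' 
after a single character check, so it should keep the speed-up while returning 
the same matches as the naive pattern.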

In your case, a successful match must be preceded by [^a-zA-Z0-9_.+-] (or the 
beginning of the string).
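
As a rough sketch of how that might look: the pattern below is a hypothetical 
stand-in, since the pattern from the original report is not quoted in this 
comment. Only the character class [a-zA-Z0-9_.+-] comes from the text above; 
the domain part and the name find_addresses are assumptions for illustration.

```python
import re

# Character class taken from the comment above.
LOCAL = r'[a-zA-Z0-9_.+-]'

# Hypothetical address pattern: a run of LOCAL characters, '@', and a
# deliberately simple domain. The leading (?:^|[^...]) group rejects any
# attempt that starts in the middle of a run of LOCAL characters.
GUARDED = re.compile(
    r'(?:^|[^a-zA-Z0-9_.+-])(' + LOCAL + r'+@[a-zA-Z0-9-]+\.[a-zA-Z]+)'
)


def find_addresses(text):
    """Return the captured addresses (group 1) found in text."""
    return GUARDED.findall(text)


print(find_addresses('mail alice@example.com or bob+x@test.org'))
# ['alice@example.com', 'bob+x@test.org']
```

Because the pattern has exactly one capturing group, findall returns the 
contents of that group rather than the whole match, so the extra leading 
character does not appear in the results.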

----------
nosy: [email protected]

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue46065>
_______________________________________