Tim Peters <[email protected]> added the comment:
Without re.IGNORECASE, the leading
^[_a-z0-9-]+
can't even match the first character (`val` starts with uppercase Z), so it
fails instantly.
With re.IGNORECASE, it's not "stuck", but is taking a verrrrrry long time to
try an enormous number of (ultimately doomed) possibilities due to the way the
regexp is written.
This is due to using nested quantifiers for no apparent reason. For any
character class C,
(C+)*
matches the same set of strings as
C*
but the former way can _try_ to match in an exponential (in the length of the
string) number of ways. So replace
([\.'_a-z0-9-]+)*
and
([\.a-z0-9-]+)*
with
[\.'_a-z0-9-]*
and
[\.a-z0-9-]*
and it fails to match `val` quickly (even with re.IGNORECASE).
For more on this (which applies to many regexp implementations, not just
Python's), here's a start:
https://www.mathworks.com/matlabcentral/answers/95953-why-can-nested-quantifiers-in-regexp-can-cause-inefficient-failures-in-matlab-6-5-r13
The "Mastering Regular Expressions" book referenced in that answer is an
excellent book-length treatment of this (and related) topic(s).
----------
nosy: +tim.peters
resolution: -> not a bug
stage: -> resolved
status: open -> closed
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35932>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com