[issue40879] Strange regex cycle

Tim Peters Fri, 05 Jun 2020 18:22:19 -0700


Tim Peters <t...@python.org> added the comment:


The repr truncates the pattern string, for display, if it's "too long". The 
only visual clue about that, though, is that the display is missing the pattern 
string's closing quote, as in the output you showed here. If you look at 
url_pat.pattern, though, you'll see that nothing has been lost.
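
A minimal sketch of that check, assuming CPython's current repr behavior for compiled patterns. The 500-character pattern is a made-up stand-in, not the reporter's url_pat:

```python
import re

# Hypothetical long pattern, used only to trigger the truncated display.
long_pat = re.compile("a" * 500)

shown = repr(long_pat)

# The repr is cut short for display; on CPython the closing quote is dropped.
print(shown[:30], "...", shown[-10:])
print("repr length:   ", len(shown))

# The .pattern attribute still holds the complete string.
print("pattern length:", len(long_pat.pattern))
```

If the repr looks cut off, comparing len(repr(p)) with len(p.pattern) is a quick way to confirm that only the display was shortened.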

I'm not sure why it does that.  As I vaguely recall, some years ago there was a 
crusade to limit maximum repr sizes because long output was considered to be "a 
security issue" (e.g., DoS attacks via tricking logging/auditing facilities 
into writing giant strings when recording reprs).

In any case, that's all there is to that part.

For the rest, it's exceedingly unlikely that there's actually an infinite loop. 
Instead there's a messy regexp with multiple nested quantifiers, which are 
notorious for exhibiting exponential-time behavior, especially in 
non-matching cases. They can be rewritten to have linear-time behavior instead, 
but it's an effort I personally have no interest in pursuing here. See Jeffrey 
Friedl's "Mastering Regular Expressions" book for detailed explanations.
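
As a rough illustration of the effect (a toy pattern chosen for this sketch, not the regexp from the report), the following times a nested-quantifier pattern against a single-quantifier pattern that accepts the same strings. The names nested, flat, and time_match are made up here, and the input sizes are kept small so the script finishes quickly:

```python
import re
import time

# Toy stand-ins, not the reporter's pattern.
nested = re.compile(r"(a+)+$")  # a quantifier nested inside another quantifier
flat = re.compile(r"a+$")       # same accepted strings, one quantifier


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one anchored match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


timings = []
for n in (14, 16, 18, 20):
    # The trailing "b" guarantees failure, which forces the backtracking
    # engine to try every way of splitting the run of "a"s.
    text = "a" * n + "b"
    _, slow = time_match(nested, text)
    _, fast = time_match(flat, text)
    timings.append((n, slow, fast))
    print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

On a backtracking engine such as Python's re, the nested timing would be expected to grow by a roughly constant factor per added character, while the flat one stays small. A run that looks like a hang is often this kind of growth on a long non-matching input.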

The reason I have no interest: it's almost always a losing idea to try to parse 
any aspect of HTML with regexps. Use an HTML parser instead (or for URLs 
specifically, see urllib.parse).
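
A small sketch of that alternative using only the standard library: html.parser to pull href values out of markup, then urllib.parse to split one of them. The LinkCollector class and the sample HTML are invented for this example:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample input for illustration.
sample_html = (
    '<p>See <a href="https://bugs.python.org/issue40879?x=1#top">'
    "the issue</a>.</p>"
)

collector = LinkCollector()
collector.feed(sample_html)

# urlsplit breaks the URL into named components without any regexp.
parts = urlsplit(collector.links[0])
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```

Neither step backtracks over the input the way a nested-quantifier regexp can, and the parser copes with attribute quoting and ordering that a hand-written pattern tends to miss.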

----------
nosy: +tim.peters

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40879>
_______________________________________
