[issue46627] Regex hangs indefinitely

2022-02-03 Thread Tim Peters


Tim Peters  added the comment:

Introducing some kind of optional timeout is too involved to just drop in 
without significant discussion and design effort first. If you want to pursue 
this, please bring it up on the python-ideas mailing list.

My first take: it wouldn't really help, because nobody would use it until after 
it was too late ;-) So, in the absence of someone jumping on it right away, I'm 
closing this one too as "won't fix". If the idea gains traction, we can, of 
course, reopen this.

BTW, Jeffrey Friedl's book "Mastering Regular Expressions" teaches how to write 
bulletproof regexps. It's not always obvious at first glance, but it's an art 
that can be mastered.

--
nosy: +tim.peters
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


J.B. Langston  added the comment:

Sorry, on rereading your message I guess you were referring to the extra +, not 
the [^]]. The extra + after the ) was not intentional, and after removing it, 
the regex no longer hangs.

I still think it would be nice to have a timeout setting on the regex so it 
can't hang up an entire process.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


J.B. Langston  added the comment:

Yes, it is supposed to match everything up to the closing ] in this substring: 

[BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')]

Quoting from the re docs:

To match a literal ']' inside a set, precede it with a backslash, or place it 
at the beginning of the set. For example, both [()[\]{}] and []()[{}] will both 
match a parenthesis.

The docs don't specifically state the case of a negated set using ^, but I have 
used this construction many times and never had a problem with it.

Furthermore, it is not what caused the regex to hang.  That was caused by 
"(?P[^,]+)," and changing it to "(?P.+?)," fixed 
the problem.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46627] Regex hangs indefinitely

2022-02-03 Thread Matthew Barnett


Matthew Barnett  added the comment:

That pattern has:

(?P[^]]+)+

Is that intentional? It looks wrong to me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


New submission from J.B. Langston :

The following code will cause Python's regex engine to hang apparently 
indefinitely: 

import re
message = "Flushed to 
[BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')]
 (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P[^]]+)+\] \((?P[^ 
]+) sstables, (?P[^)]+)\), biggest (?P[^,]+), 
smallest (?P[^ ]+)( \((?P\d+)ms\))?")
regex.match(message)

This may be a case of exponential backtracking similar to #35915 or #30973. 
Both of these issues have been closed as Wont Fix, and I suspect my issue is 
similar. The use of commas for decimal points in the input string was not 
anticipated but happened due to localization of the logs that the message came 
from.  The regex works properly when the decimal point is a period.

I will try to rewrite my regex to address this specific issue, but it's hard to 
anticipate every possible input and craft a bulletproof regex, so something 
like this kind of thing can be used for a denial of service attack (intentional 
or not). In this case the regex was used in an automated import process and 
caused the process to back up for many hours before someone noticed.  Maybe a 
solution could be to add a timeout option to the regex engine so it will give 
up and throw an exception if the regex executes for longer than the configured 
timeout.

--
components: Regular Expressions
messages: 412450
nosy: ezio.melotti, jblangston, mrabarnett
priority: normal
severity: normal
status: open
title: Regex hangs indefinitely
type: behavior
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com