[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


J.B. Langston  added the comment:

Sorry, on rereading your message I guess you were referring to the extra +, not 
the [^]]. The extra + after the ) was not intentional, and after removing it, 
the regex no longer hangs.
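
For illustration, here is a minimal sketch of the pattern with the extra + removed. The group names are hypothetical (the archived pattern no longer shows the original names), and make_message is a helper invented for this example. On the comma-formatted input the match fails quickly instead of hanging, and on the period-formatted input it succeeds:

```python
import re

# Group names are hypothetical; the archived pattern lost its original names.
PATTERN = re.compile(
    r"Flushed to \[(?P<readers>[^]]+)\] "  # single +, no outer repetition
    r"\((?P<count>[^ ]+) sstables, (?P<total>[^)]+)\), "
    r"biggest (?P<biggest>[^,]+), "
    r"smallest (?P<smallest>[^ ]+)"
    r"( \((?P<millis>\d+)ms\))?"
)

PATH = (
    "/data/cassandra/data/log/"
    "logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db"
)


def make_message(size):
    """Build the log line from the report with the given size string (hypothetical helper)."""
    return (
        f"Flushed to [BigTableReader(path='{PATH}')] "
        f"(1 sstables, {size}), biggest {size}, smallest {size}"
    )


# Comma as decimal separator: [^,]+ stops at "8", so the match fails, but fast.
comma_result = PATTERN.match(make_message("8,650MiB"))

# Period as decimal separator: the pattern matches as intended.
period_result = PATTERN.match(make_message("8.650MiB"))
```

With only one quantifier on the first group, a failure later in the pattern leaves the engine a linear number of ways to re-split the bracketed text rather than an exponential one.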

I still think it would be nice to have a timeout setting on the regex so it 
can't hang up an entire process.

--

___
Python tracker 
<https://bugs.python.org/issue46627>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


J.B. Langston  added the comment:

Yes, it is supposed to match everything up to the closing ] in this substring: 

[BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')]

Quoting from the re docs:

To match a literal ']' inside a set, precede it with a backslash, or place it 
at the beginning of the set. For example, both [()[\]{}] and []()[{}] will both 
match a parenthesis.

The docs don't specifically address the case of a negated set using ^, but I 
have used this construction many times and never had a problem with it.

Furthermore, it is not what caused the regex to hang.  That was caused by 
"(?P[^,]+)," and changing it to "(?P.+?)," fixed 
the problem.
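
As a rough sketch of that change, reduced to the relevant fragment of the pattern (the group name "biggest" is hypothetical, since the archived pattern does not show the original name):

```python
import re

text = "(1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"

# "biggest" is a hypothetical group name used only for this sketch.
no_comma = re.compile(r"biggest (?P<biggest>[^,]+), smallest")
lazy_any = re.compile(r"biggest (?P<biggest>.+?), smallest")

# [^,]+ can only consume "8", and ", smallest" does not follow it: no match.
before = no_comma.search(text)

# .+? grows one character at a time until ", smallest" follows: match.
after = lazy_any.search(text)
if after:
    print(after.group("biggest"))
```

The lazy version accepts the localized number because it no longer treats the comma as a field delimiter on its own; it relies on the literal ", smallest" that follows.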

--




[issue46627] Regex hangs indefinitely

2022-02-03 Thread J.B. Langston


New submission from J.B. Langston :

The following code will cause Python's regex engine to hang apparently 
indefinitely: 

import re
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P[^]]+)+\] \((?P[^ ]+) sstables, (?P[^)]+)\), biggest (?P[^,]+), smallest (?P[^ ]+)( \((?P\d+)ms\))?")
regex.match(message)

This may be a case of exponential backtracking similar to #35915 or #30973. 
Both of these issues have been closed as Won't Fix, and I suspect my issue is 
similar. The use of commas for decimal points in the input string was not 
anticipated but happened due to localization of the logs that the message came 
from.  The regex works properly when the decimal point is a period.

I will try to rewrite my regex to address this specific issue, but it's hard to 
anticipate every possible input and craft a bulletproof regex, so this kind of 
pattern can be used for a denial of service attack (intentional or not). In 
this case the regex was used in an automated import process and caused the 
process to back up for many hours before someone noticed.  Maybe a solution 
could be to add a timeout option to the regex engine so it will give up and 
throw an exception if the regex executes for longer than the configured 
timeout.
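
The re module in the reported version has no such option, so as a stopgap the match can be run in a child process that is killed when it overruns. The following is only a sketch under stated assumptions: match_with_timeout and _worker are hypothetical names, the "fork" start method restricts it to POSIX systems, and the result is returned as a groupdict because match objects cannot be pickled.

```python
import multiprocessing as mp
import re


def _worker(conn, pattern, text):
    # Runs in the child process: send back the named groups, or None on no match.
    m = re.match(pattern, text)
    conn.send(m.groupdict() if m else None)
    conn.close()


def match_with_timeout(pattern, text, timeout):
    """Return re.match(...).groupdict(), or None on no match.

    Raises TimeoutError if the match takes longer than `timeout` seconds.
    Hypothetical helper, not part of the standard library.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX system with "fork" available
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(child_conn, pattern, text))
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if parent_conn.poll(timeout):
            return parent_conn.recv()
        raise TimeoutError(f"regex did not finish within {timeout} s")
    finally:
        if proc.is_alive():
            proc.kill()  # stop a runaway match
        proc.join()
        parent_conn.close()
```

A call such as match_with_timeout(r"(a+)+$", "a" * 40 + "b", 0.5) raises TimeoutError instead of blocking the importer. Starting a process per match is expensive, so this suits a batch job like the one described better than a hot loop.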

--
components: Regular Expressions
messages: 412450
nosy: ezio.melotti, jblangston, mrabarnett
priority: normal
severity: normal
status: open
title: Regex hangs indefinitely
type: behavior
versions: Python 3.8
