New submission from J.B. Langston:
The following code will cause Python's regex engine to hang apparently
indefinitely:
import re
# The group names were lost from the archived message; g1..g6 below are
# placeholders, not the original names.
message = (
    "Flushed to [BigTableReader(path='/data/cassandra/data/log/"
    "logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] "
    "(1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
)
regex = re.compile(
    r"Flushed to \[(?P<g1>[^]]+)+\] \((?P<g2>[^ ]+) sstables, "
    r"(?P<g3>[^)]+)\), biggest (?P<g4>[^,]+), "
    r"smallest (?P<g5>[^ ]+)( \((?P<g6>\d+)ms\))?"
)
regex.match(message)  # hangs here
This may be a case of exponential backtracking similar to #35915 or #30973.
Both of these issues have been closed as Won't Fix, and I suspect my issue is
similar. The use of commas for decimal points in the input string was not
anticipated but happened due to localization of the logs that the message came
from. The regex works properly when the decimal point is a period.
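
For illustration, here is a rough sketch of how the pattern might be rewritten.
It assumes the nested quantifier on the first group (a "+" applied to a group
that already ends in "+") is what causes the backtracking, and it uses unnamed
groups because the names are not needed to show the effect. The names strict
and tolerant are made up for this example. The tolerant variant is only one
possible guess at how to accept a comma inside a size.

```python
import re

message = (
    "Flushed to [BigTableReader(path='/data/cassandra/data/log/"
    "logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] "
    "(1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
)

# Same structure as the reported pattern, but without the outer "+" on the
# first group, so there is only one way to split the text before "]".
strict = re.compile(
    r"Flushed to \[([^]]+)\] \(([^ ]+) sstables, ([^)]+)\), "
    r"biggest ([^,]+), smallest ([^ ]+)( \((\d+)ms\))?"
)

# Assumed variant: the "biggest" size is "anything up to the next space",
# so a comma inside the number no longer breaks that field.
tolerant = re.compile(
    r"Flushed to \[([^]]+)\] \(([^ ]+) sstables, ([^)]+)\), "
    r"biggest ([^ ]+), smallest ([^ ]+)( \((\d+)ms\))?"
)

no_match = strict.match(message)   # still no match, but it fails quickly
found = tolerant.match(message)    # matches the comma-formatted sizes
sizes = (found.group(3), found.group(4), found.group(5))
```

With the outer "+" removed, the failed match on the comma input should return
None right away instead of trying every way of splitting the path.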
I will try to rewrite my regex to address this specific issue, but it's hard to
anticipate every possible input and craft a bulletproof regex, so a pattern
like this can be used for a denial of service attack (intentional
or not). In this case the regex was used in an automated import process and
caused the process to back up for many hours before someone noticed. Maybe a
solution could be to add a timeout option to the regex engine so it will give
up and throw an exception if the regex executes for longer than the configured
timeout.
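
To make the timeout idea concrete: the re module has no timeout option, so
the sketch below is only a workaround under stated assumptions, not a proposed
implementation. It runs re.match in a child interpreter and kills it if it
takes too long. The helper name match_with_timeout and the JSON hand-off are
invented for this example, and starting a process per match is far too slow
for anything but a safety net.

```python
import json
import subprocess
import sys

# Script run by the child interpreter: read [pattern, text] as JSON from
# stdin, run re.match, and write the captured groups (or null) to stdout.
_CHILD = (
    "import json, re, sys\n"
    "pattern, text = json.load(sys.stdin)\n"
    "m = re.match(pattern, text)\n"
    "json.dump(None if m is None else list(m.groups()), sys.stdout)\n"
)


def match_with_timeout(pattern, text, timeout):
    """Hypothetical helper: re.match in a child process with a time limit.

    Returns the list of groups, or None if there is no match.
    Raises TimeoutError if the child does not finish within `timeout` seconds.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=json.dumps([pattern, text]),
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child on expiry
            check=True,
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"regex did not finish within {timeout}s") from None
    return json.loads(done.stdout)
```

A caller in an import job could catch TimeoutError, log the offending line,
and move on rather than backing up behind a single message.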
--
components: Regular Expressions
messages: 412450
nosy: ezio.melotti, jblangston, mrabarnett
priority: normal
severity: normal
status: open
title: Regex hangs indefinitely
type: behavior
versions: Python 3.8
___
Python tracker
<https://bugs.python.org/issue46627>
___