[issue44699] Simple regex appears to take exponential time in length of input

2021-07-21 Thread János Brezniczky

János Brezniczky  added the comment:

Hello! Thanks for the reply - you are probably right, and I accept that it's 
normal. However, there may be numerous frameworks getting built on the concept 
of collaboratively contributed regexes.

(Python likes prototyping, prototyping likes portable programming constructs, 
regular expressions are portable, for this and other factors and as past 
practice suggests, these will be a temptingly plausible choice as a basis for 
frameworks, etc.)

So I do accept it's normal (I haven't been deeply into regexes for cca 10 
years, but I vaguely recall what's complex depends a lot on the type of engine, 
maybe A or B? pass on that one).

However people make mistakes that attackers seek...

(Not to lose that point, I am otherwise okay to let it go as "not a bug" if 
you're certain! I guess something should be done though.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44699] Simple regex appears to take exponential time in length of input

2021-07-21 Thread Matthew Barnett


Matthew Barnett  added the comment:

It's called "catastrophic backtracking". Think of the number of ways it could 
match, say, 4 characters: 4, 3+1, 2+2, 2+1+1, 1+3, 1+2+1, 1+1+2, 1+1+1+1. Now 
try 5 characters...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44699] Simple regex appears to take exponential time in length of input

2021-07-21 Thread János Brezniczky

János Brezniczky  added the comment:

I'd also raise for consideration the introduction a (default?) timeout on 
regexes, similarly to how such a feature seems available in .NET. 

Given the DOS vector vs. occasionally non-trivially complex expressions, this 
could draw developer attention to this security aspect and stimulate the 
evolution of a more secure ecosystem.

https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=net-5.0

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44699] Simple regex appears to take exponential time in length of input

2021-07-21 Thread János Brezniczky

János Brezniczky  added the comment:

Might be related to https://bugs.python.org/issue43075, as in would trigger 
ReDOSability had my syntax highlighting-related changes (which eventually broke 
golden master tests, fortunately :) ) hit the public.

(Meaning my reduced - possibly minimal - corner case stems from a practical 
scenario as well.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44699] Simple regex appears to take exponential time in length of input

2021-07-21 Thread János Brezniczky

New submission from János Brezniczky :

I don't know if it's normal, but it's impractical.
I seem to have come by an expression consuming o(c^n) CPU cycles with c around 
2.

Regex:
\*([^*]+)*\*

Resulted in times (in seconds) of 

0.17   (* is there anybody out?)
0.34 1
0.69 12
1.36 123
2.73  1234
5.44  12345
11.1  123456  (* is there anybody123456 out?)


Please see the source for more details/repro.

--
components: Regular Expressions
files: regex_test.py
messages: 397948
nosy: brezniczky, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Simple regex appears to take exponential time in length of input
type: performance
versions: Python 3.8
Added file: https://bugs.python.org/file50169/regex_test.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com