[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Tim Peters
[Steven D'Aprano] > I've been interested in the existence of SNOBOL string scanning for > a long time, but I know very little about it. > > How does it differ from regexes, and why have programming languages > pretty much standardised on regexes rather than other forms of string > matching? What

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Tim Peters
[Tim] >> In SNOBOL, as I recall, it could be spelled >> >> ARB "spam" FENCE [Chris] > Ah, so that's a bit more complicated than the "no-backtracking" > parsing style of REXX and scanf. Oh, a lot more complex. In SNOBOL, arbitrary computation can be performed at any point in pattern

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Chris Angelico
On Tue, 15 Feb 2022 at 13:57, Tim Peters wrote: > In SNOBOL, as I recall, it could be spelled > > ARB "spam" FENCE > > Those are all pattern objects, and infix whitespace is a binary > pattern catenation operator. > > ARB is a builtin pattern that matches the empty string at first, and >

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Tim Peters
[Tim] >>> That leaves the happy 5% who write "[^X]*X", which >>> finally says what they intended from the start. [Steven] >> Doesn't that only work if X is literally a single character? Right. It was an example, not a meta-example. Even for a _single character_, "match up to the next, but never
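A small sketch of the single-character point in the exchange above, using the stdlib re module. The sample text and the variable names are made up for illustration, and the lookahead form at the end is one commonly used alternative for multi-character delimiters, not something proposed in the thread.

```python
import re

text = "xaab"  # hypothetical sample input

# Single-character delimiter: [^b] can never consume a 'b', so the match
# ends at the first 'b' it reaches.
single = re.match(r"[^b]*b", text)

# The same spelling does not carry over to the two-character delimiter "ab":
# [^ab] rejects every 'a' and every 'b', not just the sequence "ab", so the
# lone 'a' at index 1 blocks the match.
naive_multi = re.match(r"[^ab]*ab", text)

# One alternative (an assumption, not from the thread): consume one character
# at a time, but only while the delimiter does not start at that position.
tempered = re.match(r"(?:(?!ab).)*ab", text)

print(single.group())
print(naive_multi)
print(tempered.group())
```

The negated character class works only because a one-character delimiter and "a character that is not the delimiter" are both single-character tests; for a longer delimiter the test has to look ahead.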

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Chris Angelico
On Tue, 15 Feb 2022 at 11:47, Steven D'Aprano wrote: > > > Another 20% will write ".*?X", with scant understanding that may > > extend beyond _just_ "the next" X in some cases. > > But this surprises me. Do you have an example? Nongreedy means it'll prefer the next X, but it has to be open to
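To illustrate the non-greedy behaviour discussed above, here is a minimal sketch with the stdlib re module. The sample text and the trailing "Y" in the patterns are invented for the example; they stand in for whatever the rest of a real pattern requires after the X.

```python
import re

text = "aXbXY"  # hypothetical sample input

# Non-greedy: ".*?" first tries to stop at the nearest X, but "Y" does not
# follow it, so the engine backtracks and lets ".*?" grow past that X.
lazy = re.match(r".*?XY", text)

# Negated class: "[^X]*" can never consume an X, so when the nearest X is not
# followed by "Y" the match fails instead of skipping ahead.
strict = re.match(r"[^X]*XY", text)

print(lazy.group())
print(strict)
```

Non-greedy is a preference about which alternative to try first, not a limit on how far the match may extend.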

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Steven D'Aprano
On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote: > An interesting lesson nobody wants to learn: the original major > string-processing language, SNOBOL, had powerful pattern matching but > no regexps. Griswold's more modern successor language, Icon, found no > reason to change that.

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Steven D'Aprano
On Mon, Feb 14, 2022 at 03:58:49PM -0600, Nick Timkovich wrote: > While definitely not as bad and not as likely as SQL injection, I think the > possibility of regex DoS is totally missing in the stdlib re docs. Should > there be something added there about if you need to put user input into an >

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Tim Peters
""" Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. - Jamie Zawinski """ Even more true of regexps than of floating point, and even of daemon threads ;-) regex is a terrific module, incorporating many features that newer

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Nick Timkovich
> > A regex that's vulnerable to pathological behavior is a DoS attack waiting > to happen. Especially when used for parsing log data (which might contain > untrusted data). If possible, we should make it harder for people to shoot > themselves in the feet. > While definitely not as bad and not

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Bruce Leban
On Mon, Feb 14, 2022 at 9:55 AM J.B. Langston wrote: > ... more generally I think it would be good to have a timeout option that > could be configurable when compiling the regex so that if the regex didn't > complete within the specified timeframe, it would abort and throw an > exception. > >

[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Jonathan Slenders
For what it's worth, the "regex" library on PyPI (not "re") supports timeouts: https://pypi.org/project/regex/ On Mon, Feb 14, 2022, 6:54 PM J.B. Langston wrote: > Hello, > > I had opened this bug because I had a bad regex in my code that was > causing python to hang in the regex evaluation: >
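As a rough sketch of what that looks like in practice: the third-party regex module accepts a timeout keyword argument (in seconds) on its matching functions and raises TimeoutError when the limit is exceeded. The helper name search_with_timeout and the one-second default are hypothetical, and the fallback to the stdlib re module (which has no timeout) is only there so the sketch still runs where regex is not installed.

```python
import re

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # assumption: fall back to stdlib re, WITHOUT any timeout


def search_with_timeout(pattern, text, seconds=1.0):
    """Hypothetical helper: search with a time budget where one is available.

    Returns a match object or None. With the regex module installed, a
    search that exceeds the budget raises TimeoutError.
    """
    if regex is None:
        # No timeout support in stdlib re; the search runs unbounded.
        return re.search(pattern, text)
    return regex.search(pattern, text, timeout=seconds)


match = search_with_timeout(r"sp[a-z]m", "eggs and spam")
print(match.group())
```

Callers that handle untrusted patterns or input would wrap the call in try/except TimeoutError and decide what to do with the abandoned search.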

[Python-ideas] Regex timeouts

2022-02-14 Thread J.B. Langston
Hello, I had opened this bug because I had a bad regex in my code that was causing python to hang in the regex evaluation: https://bugs.python.org/issue46627. I have fixed the problem with that specific regex, but more generally I think it would be good to have a timeout option that could be