[Steven D'Aprano]
> I've been interested in the existence of SNOBOL string scanning for
> a long time, but I know very little about it.
>
> How does it differ from regexes, and why have programming languages
> pretty much standardised on regexes rather than other forms of string
> matching?
What
[Tim]
>> In SNOBOL, as I recall, it could be spelled
>>
>> ARB "spam" FENCE
[Chris]
> Ah, so that's a bit more complicated than the "no-backtracking"
> parsing style of REXX and scanf.
Oh, a lot more complex. In SNOBOL, arbitrary computation can be
performed at any point in pattern
On Tue, 15 Feb 2022 at 13:57, Tim Peters wrote:
> In SNOBOL, as I recall, it could be spelled
>
> ARB "spam" FENCE
>
> Those are all pattern objects, and infix whitespace is a binary
> pattern catenation operator.
>
> ARB is a builtin pattern that matches the empty string at first, and
>
[Tim]
>>> That leaves the happy 5% who write "[^X]*X", which
>>> finally says what they intended from the start.
[Steven]
>> Doesn't that only work if X is literally a single character?
Right. It was an example, not a meta-example. Even for a _single
character_, "match up to the next, but never
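For a multi-character delimiter, one commonly used workaround (not something proposed in this thread) is a negative lookahead repeated one character at a time. The sketch below uses "spam" as a stand-in delimiter and a made-up input string; both are assumptions chosen only for illustration.

```python
import re

# Hypothetical input: the first "spam" is NOT followed by "!", the second is.
text = "x spam y spam!"

# Non-greedy dot: free to run past the first "spam" to make "!" match.
lazy = re.match(r"(.*?)spam!", text)

# Lookahead-guarded dot: each consumed character must not begin "spam",
# so the prefix can never contain the delimiter.
guarded = re.match(r"((?:(?!spam).)*)spam!", text)

print(repr(lazy.group(1)))  # 'x spam y '
print(guarded)              # None
```

The guarded form fails outright rather than sliding on to a later "spam", which is the multi-character analogue of "[^X]*X".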
On Tue, 15 Feb 2022 at 11:47, Steven D'Aprano wrote:
>
> > Another 20% will write ".*?X", with scant understanding that may
> > extend beyond _just_ "the next" X in some cases.
>
> But this surprises me. Do you have an example?
Nongreedy means it'll prefer the next X, but it has to be open to
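A small sketch of that effect with the stdlib re module. The input string and the trailing "Y" are made up for illustration: whatever follows X in the pattern is what forces the non-greedy part to extend.

```python
import re

# Hypothetical input: the first X is followed by "b", the second by "Y".
text = "aXbXY"

# ".*?X" prefers the first X, but "Y" fails there, so it moves on to the next X.
lazy = re.match(r"(.*?)XY", text)

# "[^X]*X" can never cross an X, so it fails instead of extending.
strict = re.match(r"([^X]*)XY", text)

print(repr(lazy.group(1)))  # 'aXb'
print(strict)               # None
```

So the non-greedy prefix ends up containing an X, while the negated character class does not allow that.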
On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote:
> An interesting lesson nobody wants to learn: the original major
> string-processing language, SNOBOL, had powerful pattern matching but
> no regexps. Griswold's more modern successor language, Icon, found no
> reason to change that.
On Mon, Feb 14, 2022 at 03:58:49PM -0600, Nick Timkovich wrote:
> While definitely not as bad and not as likely as SQL injection, I think the
> possibility of regex DoS is totally missing in the stdlib re docs. Should
> there be something added there about if you need to put user input into an
>
"""
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
- Jamie Zawinski
"""
Even more true of regexps than of floating point, and even of daemon threads ;-)
regex is a terrific module, incorporating many features that newer
>
> A regex that's vulnerable to pathological behavior is a DoS attack waiting
> to happen. Especially when used for parsing log data (which might contain
> untrusted data). If possible, we should make it harder for people to shoot
> themselves in the feet.
>
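To make the "pathological behavior" concrete, here is a minimal sketch using a textbook nested-quantifier pattern. The pattern and inputs are illustrative assumptions, not the regex from the bug report. The input sizes are kept small so the script finishes quickly; timings vary by machine, so none are shown.

```python
import re
import time

# Nested quantifiers: many ways to split a run of "a"s between inner and outer "+".
pattern = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Try to match n 'a's followed by a 'b', which can never succeed."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

for n in (14, 16, 18, 20):
    result, elapsed = time_failed_match(n)
    # With a backtracking engine, elapsed time is expected to grow
    # rapidly (roughly doubling per extra character).
    print(n, result, f"{elapsed:.4f}s")
```

Every call returns None; only the time taken to reach that answer changes.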
While definitely not as bad and not
On Mon, Feb 14, 2022 at 9:55 AM J.B. Langston wrote:
> ... more generally I think it would be good to have a timeout option that
> could be configurable when compiling the regex so that if the regex didn't
> complete within the specified timeframe, it would abort and throw an
> exception.
>
>
For what it's worth, the "regex" library on PyPI (not "re") supports
timeouts:
https://pypi.org/project/regex/
On Mon, Feb 14, 2022, 6:54 PM J.B. Langston wrote:
> Hello,
>
> I had opened this bug because I had a bad regex in my code that was
> causing python to hang in the regex evaluation:
>
Hello,
I had opened this bug because I had a bad regex in my code that was causing
python to hang in the regex evaluation: https://bugs.python.org/issue46627.
I have fixed the problem with that specific regex, but more generally I
think it would be good to have a timeout option that could be