[Tim, on trying to match only the next instance of "spam"]
> ...
> It's actually far easier if assertions are used, and I'm too old to
> bother trying to repair the non-assertion mess:
>
> ([^s]|s(?!pam))*spam
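
For anyone who wants to poke at that pattern, here's a minimal sketch. The sample text and variable names are made up for illustration; only the regexp itself comes from the quote. It runs on any Python 3, since it needs nothing beyond a negative lookahead.

```python
import re

# The quoted pattern: eat characters that cannot start "spam", then match "spam".
next_spam = re.compile(r"([^s]|s(?!pam))*spam")

text = "eggs spam and more spam!"  # hypothetical sample text
first = next_spam.match(text).group()
print(first)  # eggs spam

# Now demand a "!" right after the spam. The assertion version cannot slide
# forward to the second "spam", while the plain lazy version happily does.
strict = re.match(r"([^s]|s(?!pam))*spam!", text)
lazy = re.match(r".*?spam!", text)
print(strict)        # None
print(lazy.group())  # eggs spam and more spam!
```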

Since then, Serhiy rehabilitated an old patch to add "atomic groups"
and "possessive quantifiers" to CPython's `re`, and merged it today.
So in 3.11, you'll be able to do the far easier:

    (?>.*?spam)

instead. (?>...) delimits an "atomic group", and `...` can be any
regexp. The group is non-capturing. `...` is matched in the normal
way, but with a twist: _after_ it matches (if it ever does), it's
done. The first match it finds is also the last match it will try. If,
after succeeding, the overall match fails to the right of the atomic
group, backing up into the group fails at once - no other alternatives
for `...` are tried.

So, in the example, .*?spam finds the closest (if any) following
instance of "spam", and that's all. It won't ever go on to try to
match a later instance of "spam".

Which is probably what most people had in mind all along when they wrote plain

    .*?spam

but were instead delighted by mysterious cases of exponential-time
backtracking ;-)

"Possessive quantifiers" are syntactic sugar for particular simple
instances of atomic groups. Where R is a regexp,

    R++      is short for  (?>R+)
    R*+      is short for  (?>R*)
    R?+      is short for  (?>R?)
    R{m,n}+  is short for  (?>R{m,n})

In all cases, they take the longest match possible for R starting
right here, right now, without any consideration for whether that may
or may not cause the rest of the pattern to fail, and then they hold
on to that longest-possible match no matter what.  No backtracking
after the first success.

You should feel free to use these in 3.11. There are few modern regexp
engines that don't support them, and they were the most conspicuous of
the "missing features" in Python's `re`.