Re: [Python-Dev] re performance

Armin Rigo Sat, 28 Jan 2017 03:47:55 -0800

Hi Sven,

On 26 January 2017 at 22:13, Sven R. Kunze <srku...@mail.de> wrote:
> I recently refreshed regular expressions theoretical basics *indulging in
> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html


Theoretical regular expressions and what Python/Perl/etc. call regular
expressions are a bit different.  You can read more about it at
https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times
.

Discussions about why they are different often focus on
backreferences, which are a rarely used feature.  Let me add two other points.

The theoretical kind of regexp is about giving a "yes/no" answer,
whereas the concrete "re" or "regexp" modules give a match object,
which lets you ask for the subgroups' locations, for example.  Strange
as it may seem, I am not aware of a way to do that using the
linear-time approach of the theory---if it answers "yes", then you
have no way of knowing *where* the subgroups matched.
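
To make the difference concrete, here is a small sketch of what the
match object gives you beyond a plain "yes/no".  The pattern and the
input string are made up purely for illustration:

```python
import re

# Hypothetical pattern and input, chosen only for this illustration.
m = re.match(r'(a+)(b+)', 'aaabb')

matched = m is not None      # the "yes/no" answer of the theory
group1_span = m.span(1)      # (start, end) of the first subgroup
group2_span = m.span(2)      # (start, end) of the second subgroup

print(matched, group1_span, group2_span)
```

The two spans are exactly the information that a pure accept/reject
automaton does not hand back.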

Another issue is that the theoretical engine has no notion of
greedy/non-greedy matching.  Basically, you walk over the source
characters one by one and it answers "yes" or "no" after each of them.  This is
different from a typical backtracking implementation.  In Python:

>>> import re
>>> re.match(r'a*', 'aaa')
>>> re.match(r'a*?', 'aaa')

The first call matches three characters and the second matches zero.
The two patterns are however indistinguishable for the theoretical
engine, because they accept exactly the same set of strings.
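
As a rough check of both halves of that statement, one can compare the
spans returned by re.match() with the plain accept/reject answer of
re.fullmatch().  The test strings below are arbitrary examples:

```python
import re

# Greedy vs. non-greedy: same input, different match lengths.
greedy_span = re.match(r'a*', 'aaa').span()
lazy_span = re.match(r'a*?', 'aaa').span()

# As a yes/no question over the whole string, the two patterns agree.
# The sample strings are arbitrary and only serve as an illustration.
samples = ['', 'a', 'aaa', 'aab', 'b']
greedy_accepts = [re.fullmatch(r'a*', s) is not None for s in samples]
lazy_accepts = [re.fullmatch(r'a*?', s) is not None for s in samples]

print(greedy_span, lazy_span)
print(greedy_accepts == lazy_accepts)
```

This only samples a few strings, so it is an illustration rather than
a proof that the two patterns describe the same language.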


A bientôt,

Armin.
