Hi Sven,

On 26 January 2017 at 22:13, Sven R. Kunze <srku...@mail.de> wrote:
> I recently refreshed regular expressions theoretical basics *indulging in
> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html

Theoretical regular expressions and what Python/Perl/etc. call regular expressions are a bit different. You can read more about it at https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times .

Discussions about why they are different often focus on backreferences, which is a rarely used feature. Let me add two other points.

The theoretical kind of regexp is about giving a "yes/no" answer, whereas the concrete "re" or "regexp" modules give you a match object, which lets you ask for the subgroups' locations, for example. Strange as it may seem, I am not aware of a way to do that using the linear-time approach of the theory---if it answers "yes", then you have no way of knowing *where* the subgroups matched.

Another issue is that the theoretical engine has no notion of greedy/non-greedy matching. Basically, you walk over the source string one character at a time, and it answers "yes" or "no" after each character. This is different from a typical backtracking implementation. In Python:

>>> import re
>>> re.match(r'a*', 'aaa')
>>> re.match(r'a*?', 'aaa')

In Python, the first matches all three characters and the second matches zero. For the theoretical engine, however, the two versions are indistinguishable.

See you soon,

Armin.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
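The remarks about the theoretical engine (a yes/no answer after each character, and no distinction between `a*` and `a*?`) can be illustrated with a minimal set-of-states automaton simulation, in the spirit of the approach in the swtch.com article. This is only a sketch under stated assumptions: the automaton is hand-built for the single pattern `a*` (the names `TRANSITIONS`, `START`, `ACCEPTING` and `simulate` are hypothetical and not part of any library), and it is compared against Python's real `re.match(...).span()`.

```python
import re

# Hypothetical hand-built automaton for the pattern a* (and equally for a*?).
# State 0 is both the start state and an accepting state; it loops on 'a'.
# A theoretical automaton has no notion of greedy vs. non-greedy, so both
# patterns correspond to this same transition table.
TRANSITIONS = {(0, "a"): {0}}
START = {0}
ACCEPTING = {0}


def simulate(text):
    """Walk over text one character at a time, tracking the set of live
    states, and record after each prefix whether it is accepted.
    Runs in O(len(text) * number_of_states) time, with no backtracking."""
    states = set(START)
    answers = [bool(states & ACCEPTING)]  # answer for the empty prefix
    for ch in text:
        # Advance every live state on the current character.
        states = {t for s in states for t in TRANSITIONS.get((s, ch), ())}
        answers.append(bool(states & ACCEPTING))
    return answers


print(simulate("aaa"))                 # [True, True, True, True]
print(simulate("aab"))                 # [True, True, True, False]
print(re.match(r"a*", "aaa").span())   # (0, 3): greedy
print(re.match(r"a*?", "aaa").span())  # (0, 0): non-greedy
```

The simulation only ever reports "yes" or "no" per prefix; it keeps no record of which path through the automaton led there, which is why it cannot say where subgroups matched, and why greediness has nothing to attach to.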