2017-01-26 22:13 GMT+01:00 Sven R. Kunze <srku...@mail.de>: > Hi folks, > > I recently refreshed regular expressions theoretical basics *indulging in > reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html > > However, reaching the chart in the lower third of the article, I saw Python > 2.4 measured against a naive Thompson matching implementation. And I was > surprised about how bad it performed compared to an unoptimized version of > an older than dirt algorithm. > > So, I decided to give it a try with Python2.7 and Python3.5. Eh, voilà, 100% > cpu and no results so far. Nothing has changed at all since 2007. > >>>> import re >>>> re.match(r'a?'*30 + r'a'*30, 'a'*30) > .... (still waiting) > > Quoting from the article: " Some might argue that this test is unfair to the > backtracking implementations, since it focuses on an uncommon corner case. > This argument misses the point: given a choice between an implementation > with a predictable, consistent, fast running time on all inputs or one that > usually runs quickly but can take years of CPU time (or more) on some > inputs, the decision should be easy." > > Victor, as the head of Python performance department, and Matthew, as the > maintainer of the new regex module, what is your stance on this particular > issue? > > From my perspective, I can say, that regular expressions might worth > optimizing especially for web applications (url matching usually uses > regexes) but also for other applications where I've seen many tight loops > using regexes as well. So, I am probing interest on this topic here. > > Best, > Sven >
Hi, I can't speak about the details of mrab's implementation, but using regex, I get the resulting match instantly: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> import regex >>> regex.match(r'a?'*30 + r'a'*30, 'a'*30) <regex.Match object; span=(0, 30), match='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'> >>> (I personally prefer to use regex for other advantages, than this artificial case, but it certainly doesn't hurt to have better performance here too:) regards, vbr _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com