For some time I'm interested in regular expressions and Finite State Machine. Recently, I saw that Python uses "Secret Labs' Regular Expression Engine", which very often works too slow. Its pesymistic time complexity is O(2^n), although other solutions, with time complexity O(n*m) ( O(n*m^2), m is the length of the regular expression and n is the length of the text, introduction to problem: http://swtch.com/~rsc/regexp/regexp1.html )
For example: $ cat test.py import re import sys if sys.argv[1:]: n = int(sys.argv[1]) regexp = 'a?'*n + 'a'*n print 'regexp=', regexp str = 'a'*n + 'b' + 'a'*n print 'str=', str print re.findall(regexp, str) $ time python test.py 20 regexp= a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa str= aaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaa ['aaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaa'] real 0m0.592s user 0m0.508s sys 0m0.028s $ time python test.py 21 regexp= a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaa str= aaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaa ['aaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaa'] real 0m1.018s user 0m1.000s sys 0m0.004s $ time python test.py 22 regexp= a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaa str= aaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaa ['aaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaa'] real 0m2.062s user 0m1.992s sys 0m0.028s $ time python test.py 23 regexp= a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaa str= aaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaa ['aaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaa'] real 0m4.159s user 0m4.092s sys 0m0.024s Unfortuntely not for all regular expressions quick implementation may be used (e.g. when it uses backreferences), but for major part of regular expressions this implementation works much faster. In addition to Google Summer of Code I would willingly write a faster implementation of regular expressions in Python. Naturally, the implementation would be 100% compatible with the existing regexp engine. What do you think about my proposition? -- Bartlomiej Wolowiec _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com