On Thursday 21 April 2005 09:01 am, codecraig wrote:
> I am interested in regular expressions and how Perl and Python
> compare. Particularly, I am interested in performance (i.e. speed),
> memory usage, flexibility, completeness (i.e. supports simple and
> complex regex operations...basically is RegEx a strong module/library
> in Python?)

Understand that I have used regexes very little in Perl (I took a class, that's about it). However, I have translated a couple of Perl modules into Python.

I find that Perl programmers use the rather opaque "regex style" much too often, so I usually replace several regexes with simple string searches, e.g.:

    original program:    uses a regex to match /.*foo.*/
    Python translation:  just use s.find('foo') != -1 (or 'foo' in s)

That's not really for performance reasons (though it probably is faster?), but because it makes it clearer what you're trying to do.

OTOH, some of the regexes will be "real" regexes, in which case Python's way of expressing regexes as strings makes things a whole lot clearer, e.g.:

    junk = r'.*'
    word = r'\b\w+\b'
    domain = r'(%s\.)*%s' % (word, word)
    re_mail = re.compile(junk + word + '@' + domain + junk)

Although, of course, you can just write:

    re_mail = re.compile(r'.*\b\w+\b@(\b\w+\b\.)*\b\w+\b.*')

Which is shorter, but frankly, I had a hard time just keeping it straight to type it here. I think the first version was actually faster to write, even if it takes up more space. Also, I screwed up the first time I wrote the regex for a word (forgot the '+'), so having it factored out like this made it faster to fix the mistake.

Which is still a dumb example, but you can see what I mean about making the code easier to read and refactor. AFAIK, Perl does not make this particularly easy.

Python regexes probably allow almost, but not quite all, of what Perl regexes do (I think the current Python regex language is pretty much identical to the one in Perl 5, but some newer features are in the most cutting-edge Perl release, IIRC).

For all but the simplest jobs, Python regexes should be compiled, as I do above. In fact, I never bother with using them uncompiled: I think the regex gets compiled when used anyway, even if you don't do it explicitly, and an explicitly compiled regex can be stored and reused.
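
To make the two points above concrete, here is a minimal sketch: the factored pattern from this post, compiled once and reused, next to a plain substring test standing in for /.*foo.*/. The helper names has_mail and has_foo and the sample address are made up for illustration, and the pattern is only the toy one from this post, not a real e-mail validator.

```python
import re

# Building blocks, named as in the post.
junk = r'.*'
word = r'\b\w+\b'
domain = r'(%s\.)*%s' % (word, word)

# Compile once; the compiled object can be stored and reused.
re_mail = re.compile(junk + word + '@' + domain + junk)


def has_mail(line):
    """Hypothetical helper: True if the line contains something address-like."""
    return re_mail.match(line) is not None


def has_foo(s):
    """Hypothetical helper: plain substring test instead of /.*foo.*/."""
    return 'foo' in s


if __name__ == '__main__':
    for line in ['write to terry@example.com today', 'no address here']:
        print(line, '->', has_mail(line))
    print(has_foo('a foo b'), has_foo('bar'))
```

Because each piece has a name, a mistake in one of them (say, a missing '+' in word) is fixed in one place, and the composed pattern picks up the fix.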

Although it wouldn't surprise me to learn that Perl's regex engine is slightly more optimized (seeing as it is used so much), I wouldn't want to bet on it. I doubt you'll notice any difference even if one exists, and the speedup from eliminating regexes where they don't belong would probably wipe it out anyway.

> Anyone have any information on this? Any numbers, benchmarks?

No benchmarks, sorry. I don't care enough about the speed.

But I do feel that Python regexes are both clearer and more flexible. They encourage code re-use and self-documentation. And they can pretty much do whatever their Perl equivalents can.

OTOH, Python programmers do not love them the way Perl programmers do, so they are used less. This is not least because Python has a lot of very powerful higher-level string manipulation tools.

> Thanks so much. I know this is a python user group...but try to be as
> un-biased as you can.

Can't claim to be unbiased, sorry. ;-)

Cheers,
Terry

--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks  http://www.anansispaceworks.com