On Apr 12, 3:40 am, Steven D'Aprano <st...@remove-this- cybersource.com.au> wrote: > On Sat, 11 Apr 2009 08:40:03 -0700, John Machin wrote: > >> To my mind, this is a bug in the RE engine. Is there any reason to not > >> treat it as a bug? > > > IMHO it's not a bug -- s/hang/takes a long time to compute/ > > > Just look at it: 2 + operators and 3 * operators ... It's one of those > > "come back after lunch" REs. > > Well, it's been running now for about two and a half hours, that's a > rather long lunch. And despite MRAB's assertion, it *cannot* be > interrupted by ctrl-C. That means that to all intents and purposes, the > interpreter has locked up for the duration of the calculation, which may > be days or weeks for all I know.
If you don't know, experiment! import re, time _re_comments = re.compile(r'^(([^\\]+|\\.|"([^"\\]+|\\.)*")*)#.*$') for end in ("# yadda", ""): for start in ("", '"#"'): for size in xrange(25): line = start + "x" * size + end print line t0 = time.clock() result = _re_comments_nc.sub(r"\1", line) t1 = time.clock() print "%s (%.6f secs)" % (result, t1 - t0) Observations: 1. Runs at light speed in first two cases (genuine comemnt to be removed) 2. Takes approx O(2**N) in the 3rd case (no # in sight) 3. Takes even longer to produce a *WRONG* result in the 4th case (# in a string literal) 4. The RE assumes that only " can quote a string literal; what about ' ''' and """? Cluesticks: 1. if "#" not in line: dont_need_no_stinking_regexes() 2. try timing on short pieces of input 3. Test the RE, get it correct -- http://mail.python.org/mailman/listinfo/python-list