On Mon, 10 Mar 2008 00:42:47 +0000, mh wrote:

> I've got a bit of code in a function like this:
>
>     s=re.sub(r'\n','\n'+spaces,s)
>     s=re.sub(r'^',spaces,s)
>     s=re.sub(r' *\n','\n',s)
>     s=re.sub(r' *$','',s)
>     s=re.sub(r'\n*$','',s)
>
> Is there any chance that these will be cached somewhere, and save me the
> trouble of having to declare some global re's if I don't want to have
> them recompiled on each function invocation?

At the interactive interpreter, type "help(re)" [enter]. A page or two
down, you will see:

    purge()
        Clear the regular expression cache

and looking at the source code I see many calls to _compile(), which
starts off with:

    def _compile(*key):
        # internal: compile pattern
        cachekey = (type(key[0]),) + key
        p = _cache.get(cachekey)
        if p is not None:
            return p

So yes, the re module caches its regular expressions.

Having said that, at least four out of the five examples you give are
good examples of when you SHOULDN'T use regexes.

    re.sub(r'\n','\n'+spaces,s)

is better written as s.replace('\n', '\n'+spaces). Don't believe me?
Check this out:

    >>> s = 'hello\nworld'
    >>> spaces = " "
    >>> from timeit import Timer
    >>> Timer("re.sub('\\n', '\\n'+spaces, s)",
    ... "import re;from __main__ import s, spaces").timeit()
    7.4031901359558105
    >>> Timer("s.replace('\\n', '\\n'+spaces)",
    ... "import re;from __main__ import s, spaces").timeit()
    1.6208670139312744

The regex is nearly five times slower than the simple string
replacement.

Similarly:

    re.sub(r'^',spaces,s)

is better written as spaces+s, which is nearly eleven times faster.

Also:

    re.sub(r' *$','',s)
    re.sub(r'\n*$','',s)

are just slow ways of writing s.rstrip(' ') and s.rstrip('\n').

--
Steven
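
To make the suggestions above concrete, here is a minimal sketch that puts the five original substitutions next to a version using string methods, with only the r' *\n' step left as a regex. The function names indent_with_regex and indent_with_str_methods are made up for illustration and are not from the original code. One caveat: without re.MULTILINE, '$' also matches just before a single trailing newline, so r' *$' and rstrip(' ') are not interchangeable in isolation; they agree here only because the preceding step has already removed spaces in front of every newline.

```python
import re


def indent_with_regex(s, spaces):
    # The five substitutions from the question, unchanged.
    s = re.sub(r'\n', '\n' + spaces, s)
    s = re.sub(r'^', spaces, s)
    s = re.sub(r' *\n', '\n', s)
    s = re.sub(r' *$', '', s)
    s = re.sub(r'\n*$', '', s)
    return s


def indent_with_str_methods(s, spaces):
    # Hypothetical rewrite: string methods wherever the pattern is a literal.
    s = s.replace('\n', '\n' + spaces)   # indent every line after the first
    s = spaces + s                       # indent the first line
    s = re.sub(r' *\n', '\n', s)         # strip spaces before each newline
    s = s.rstrip(' ')                    # strip trailing spaces
    s = s.rstrip('\n')                   # strip trailing newlines
    return s


if __name__ == '__main__':
    text = 'hello\nworld\n'
    print(repr(indent_with_str_methods(text, '  ')))  # '  hello\n  world'
```

If the remaining regex ever shows up in a profile, it can still be compiled once at module level with re.compile(), but given the cache described above that is rarely necessary.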