On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner" <[EMAIL PROTECTED]> wrote: > Paddy <[EMAIL PROTECTED]>: > > > > > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: > >> Henning_Thornblad wrote: > >> > What can be the cause of the large difference between re.search and > >> > grep? > > >> grep uses a smarter algorithm ;) > > >> > This script takes about 5 min to run on my computer: > >> > #!/usr/bin/env python > >> > import re > > >> > row="" > >> > for a in range(156000): > >> > row+="a" > >> > print re.search('[^ "=]*/',row) > > >> > While doing a simple grep: > >> > grep '[^ "=]*/' input (input contains 156.000 a in > >> > one row) > >> > doesn't even take a second. > > >> > Is this a bug in python? > > >> You could call this a performance bug, but it's not common enough in real > >> code to get the necessary brain cycles from the core developers. > >> So you can either write a patch yourself or use a workaround. > > >> re.search('[^ "=]*/', row) if "/" in row else None > > >> might be good enough. > > >> Peter > > > It is not a smarter algorithm that is used in grep. Python RE's have > > more capabilities than grep RE's which need a slower, more complex > > algorithm. > > FWIW, grep itself can confirm this statement. The following command roughly > takes as long as Python's re.search: > > # grep -P '[^ "=]*/' input > > -P tells grep to use real perl-compatible regular expressions.
This confirms that a particular engine might not be optimized for it, but it's not necessarily a reflection that the engine is more complex. I'm not sure of the specifics and maybe that is the case, but it could also be a case of a different codebase which is optimized differently. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list