On Fri, Jul 23, 2010 at 8:16 PM, Hrvoje Niksic <hrvoje.nik...@avl.com> wrote:
> The performance trade-off should make regex slower with sufficiently small
> compiled regex cache, when a lot of time is wasted on compilation.  But as
> the cache gets larger (and, for fairness, of the same size in both
> implementations), regex should outperform re.  Georg, would you care to
> measure if there is a difference in performance with an even larger cache?

That's not quite accurate. The figure that matters is "matches per
compilation", and that is ultimately governed by the nature of the
application once the cache is sufficiently large (e.g. an application
that compiles 10 different regular expressions, but also only matches
each one against a single string is going to be slowed down
significantly by a switch to regex no matter how big the compilation
cache may be).

The total runtime for a particular re matching exercise is "(average
compilation time)*(number of times compiled) + (average match
time)*(number of matches performed)"

We know that the new regex module speeds up matching at the cost of
slowing down compilation. Whether that is a net increase for the
application as a whole depends on 4 things:

X: the average speed change in compilation for the application
Y: the average speed change in matching for the application
A: the proportion of time the application currently spends compiling
regular expressions
B: the proportion of time the application currently spends matching
regular expressions

The overall proportional speed change for re operations in a given
application is then just X*A + Y*B. For regex to be an improvement for
that application, the resulting figure needs to be less than 1.

For example, suppose X=2, Y=0.5 (i.e. matching is twice as fast on
average, but compilation also takes twice as long), then the overall
speed change would be 2*A + 0.5*B. For that to be a wash, the original
application needs to spend 1/3 of its expression matching time
compiling the expressions and 2/3 actually matching them (the
application with the new engine would then spend 2/3 of its time
compiling and only 1/3 matching).

That's going to be a tough assessment to make in practice, but I think
Georg's pygments test suite example shows that there is real world
code out there that already spends a lot of time on compilation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to