[issue16061] performance regression in string replace for 3.3

STINNER Victor Wed, 10 Oct 2012 13:38:47 -0700

STINNER Victor added the comment:

> The code is now using the heavily optimized findchar() function.


I compared the performance of the two methods: dummy loop vs find. Results with a 
string of 100,000 characters:

 * Replace 100% (rewrite all characters): find is 12.5x slower than a loop
 * Replace 50%: find is 3.3x slower
 * Replace only 2 characters (0.002%): find is 10.4x faster

In practice, I bet that the most common case is to replace only a few 
characters. Replacing all characters is a rare use case.

Apply the attached "unicode.patch" to Python 3.4 and run the following 
commands to reproduce my benchmark:

python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"
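
The same three cases can also be driven from a script with the standard 
timeit module. This is only a sketch: the names CASES and time_case are 
made up for illustration, the repeat counts are arbitrary, and the absolute 
timings will depend on the build and the machine.

```python
import timeit

SIZE = 100000

# Setup statements mirroring the three command lines above.
CASES = {
    "replace 100%": "a='a'; b='b'; text=a*%d" % SIZE,
    "replace 50%": "a='a'; b='b'; text=(a+' ')*(%d//2)" % SIZE,
    "replace 2 chars": "a='a'; b='b'; text=a+' '*%d+a" % SIZE,
}


def time_case(setup, number=100, repeat=3):
    """Return the best time per call, in seconds, of text.replace(a, b)."""
    timer = timeit.Timer("text.replace(a, b)", setup=setup)
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for name, setup in CASES.items():
        print("%-16s %.1f usec per call" % (name, time_case(setup) * 1e6))
```

Run it once with an unpatched interpreter and once with the patched one, 
then compare the figures case by case.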

--

An option is to use the find method, and then switch to the dummy loop method 
if there are too many characters to replace. I don't know if it's necessary to 
develop such a complex algorithm. It would be better to have a benchmark 
extracted from a real-world application like a template engine.
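
To make the idea concrete, here is a rough pure-Python sketch of the control 
flow, limited to replacing one character with another. It is not the code 
from unicode.patch, which works in C. The function names and the 10% 
threshold are assumptions chosen for illustration, and the Python version 
says nothing about the speed of the C implementation.

```python
def replace_loop(text, old, new):
    """Dummy loop: inspect every character once."""
    return "".join(new if ch == old else ch for ch in text)


def replace_find(text, old, new, max_hits=None):
    """Jump from match to match with str.find().

    Return None as soon as more than max_hits matches have been seen,
    so that the caller can fall back to another method.
    """
    parts = []
    start = 0
    hits = 0
    while True:
        pos = text.find(old, start)
        if pos == -1:
            break
        hits += 1
        if max_hits is not None and hits > max_hits:
            return None
        parts.append(text[start:pos])
        parts.append(new)
        start = pos + 1
    parts.append(text[start:])
    return "".join(parts)


def replace_hybrid(text, old, new, ratio=0.1):
    """Start with find; switch to the loop if matches are too dense.

    ratio is a hypothetical, untuned threshold: the fraction of the
    string that may match before the find method is abandoned.
    """
    if len(old) != 1 or len(new) != 1:
        raise ValueError("this sketch only handles single characters")
    result = replace_find(text, old, new, max_hits=int(len(text) * ratio))
    if result is None:
        result = replace_loop(text, old, new)
    return result
```

Whatever path is taken, the result should equal text.replace(old, new). 
Only the amount of work per character differs, and a real threshold would 
have to be measured rather than guessed.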

----------
keywords: +patch
Added file: http://bugs.python.org/file27521/unicode.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16061>
_______________________________________