[issue16061] performance regression in string replace for 3.3

Serhiy Storchaka Mon, 31 Dec 2012 03:21:32 -0800

Serhiy Storchaka added the comment:

> str_replace_1char.patch: why not implementing replace_1char_inplace() in
> stringlib, with one version per character type (UCS1, UCS2, UCS4)?


Because there is no benefit in doing so. The three versions (UCS1, UCS2, and 
UCS4) share no common code. Each kind of string gets the implementation that 
suits it best. For UCS1 it uses the fast memchr() (findchar() has some 
overhead here), for UCS2 it uses findchar(), and for UCS4 it uses a simple 
loop, because findchar() would be too inefficient here.

> I prefer unicode_2.patch algorithm because it's simpler: only one loop (vs
> two loops for str_replace_1char.patch, with a threshold of 10 different
> characters).

Yes, the UCS1 implementation in str_replace_1char.patch is more complicated, 
but it is faster for more input strings. memchr() is more efficient than a 
simple loop when the characters to be replaced are rare. But when they occur 
often, a simple loop is more efficient. The "attempts" counter determines how 
many characters are checked before using memchr(). This speeds up replacement 
in strings with frequent matches, but slightly slows down replacement in 
strings with rare matches. 10 is a compromise. str_replace_1char.patch speeds 
up not only the case when *each* character is replaced, but also the cases 
when 1/2, 1/3, 1/5, ... of the characters are replaced.
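
To illustrate the idea, here is a minimal sketch of such a hybrid loop for 
one-byte strings. It is not the code from str_replace_1char.patch: the 
function name replace_1char_sketch(), the ATTEMPTS macro, and the exact loop 
structure are assumptions made for this example only. It jumps to a match 
with memchr(), then checks the following characters with a plain loop, and 
goes back to memchr() only after ATTEMPTS consecutive misses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name for the threshold; 10 is the compromise value
   mentioned in the text. */
#define ATTEMPTS 10

/* Replace every byte equal to `from` with `to` in buf[0..len), in place.
   Illustrative sketch only, not the patch's implementation. */
void
replace_1char_sketch(unsigned char *buf, size_t len,
                     unsigned char from, unsigned char to)
{
    unsigned char *p = buf;
    unsigned char *end = buf + len;

    while (p < end) {
        int attempts;

        /* Matches seem rare here: let memchr() skip ahead quickly. */
        p = memchr(p, from, (size_t)(end - p));
        if (p == NULL)
            break;
        *p++ = to;

        /* Matches may be dense here: check characters one by one and
           give up after ATTEMPTS consecutive misses. */
        attempts = ATTEMPTS;
        while (p < end && attempts > 0) {
            if (*p == from) {
                *p = to;
                attempts = ATTEMPTS;  /* a hit resets the counter */
            }
            else {
                attempts--;
            }
            p++;
        }
    }
}
```

With dense matches the inner loop never runs out of attempts, so memchr() is 
called only once. With sparse matches each replacement costs at most ATTEMPTS 
extra comparisons before memchr() takes over again.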

> Why did you change your algorithm? Is str_replace_1char.patch algorithm
> more efficient than unicode_2.patch algorithm? Is the speedup really
> interesting?

You can run the benchmarks and compare the results. str_replace_1char.patch 
does not provide the best performance in every case, but it gives the most 
stable results across a wide variety of strings, and has no regressions 
compared with 3.2.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue16061>
_______________________________________