Hi,

I wrote a quick hack to expose _PyUnicodeWriter as _string.UnicodeWriter:
http://www.haypocalc.com/tmp/string_unicode_writer.patch

I also wrote a (micro-)benchmark:
http://www.haypocalc.com/tmp/bench_join.py

(The benchmark uses only ASCII strings; it would be interesting to test latin1, BMP and non-BMP characters too.)

UnicodeWriter (using the "writer += str" API) is the fastest method in most cases, except for data = ['a'*10**4] * 10**2 (in this case, it's 8x slower!). I guess that the overhead comes from the overallocation, which then requires shrinking the buffer (shrinking may copy the whole string). The overallocation factor could be adapted depending on the size.

If computing the final length is cheap (e.g. if it's always the same), it's always faster to use UnicodeWriter with a preallocated buffer. The "UnicodeWriter +=; preallocate" test uses a precomputed length (ok, it's cheating!).

I also implemented a UnicodeWriter.append method to measure the overhead of a method lookup: it's expensive :-)

--

Platform: Linux-3.6.10-2.fc16.x86_64-x86_64-with-fedora-16-Verne
Python unicode implementation: PEP 393
Date: 2013-02-14 01:00:06
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=659ef9d360ae+ tag=tip branch=default date="2013-02-13 15:25 +0000"
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python version: 3.4.0a0 (default:659ef9d360ae+, Feb 14 2013, 00:35:19) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]
Bits: int=32, long=64, long long=64, pointer=64

[ data = ['a'] * 10**2 ]

4.21 us: UnicodeWriter +=; preallocate
4.86 us (+15%): UnicodeWriter append; lookup attr once
4.99 us (+18%): UnicodeWriter +=
6.35 us (+51%): str += str
6.45 us (+53%): io.StringIO; lookup attr once
7.02 us (+67%): "".join(list)
7.46 us (+77%): UnicodeWriter append
8.77 us (+108%): io.StringIO

[ data = ['abc'] * 10**4 ]

356 us: UnicodeWriter append; lookup attr once
375 us (+5%): UnicodeWriter +=; preallocate
376 us (+6%): UnicodeWriter +=
495 us (+39%): io.StringIO; lookup attr once
614 us (+73%): "".join(list)
629 us (+77%): UnicodeWriter append
716 us (+101%): str += str
737 us (+107%): io.StringIO

[ data = ['a'*10**4] * 10**1 ]

3.67 us: str += str
3.76 us: UnicodeWriter +=; preallocate
3.95 us (+8%): UnicodeWriter +=
4.01 us (+9%): UnicodeWriter append; lookup attr once
4.06 us (+11%): "".join(list)
4.24 us (+15%): UnicodeWriter append
4.59 us (+25%): io.StringIO; lookup attr once
4.77 us (+30%): io.StringIO

[ data = ['a'*10**4] * 10**2 ]

41.2 us: UnicodeWriter +=; preallocate
43.8 us (+6%): str += str
45.4 us (+10%): "".join(list)
45.9 us (+11%): io.StringIO; lookup attr once
48.3 us (+17%): io.StringIO
370 us (+797%): UnicodeWriter +=
370 us (+798%): UnicodeWriter append; lookup attr once
377 us (+816%): UnicodeWriter append

[ data = ['a'*10**4] * 10**4 ]

38.9 ms: UnicodeWriter +=; preallocate
39 ms: "".join(list)
39.1 ms: io.StringIO; lookup attr once
39.4 ms: UnicodeWriter append; lookup attr once
39.5 ms: io.StringIO
39.6 ms: UnicodeWriter +=
40.1 ms: str += str
40.1 ms: UnicodeWriter append

Victor

2013/2/13 Antoine Pitrou <solip...@pitrou.net>:
> On Wed, 13 Feb 2013 09:02:07 +0100,
> Victor Stinner <victor.stin...@gmail.com> wrote:
>> I added a _PyUnicodeWriter internal API to optimize str%args and
>> str.format(args). It uses a buffer which is overallocated, so it's
>> basically like CPython str += str optimization. I still don't know how
>> efficient it is on Windows, since realloc() is slow on Windows (at
>> least on old Windows versions).
>>
>> We should add an official and public API to concatenate strings.
>
> There's io.StringIO already.
>
> Regards
>
> Antoine.
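
For anyone who wants to try the comparison without applying the patch, here is a minimal sketch of the stock strategies timed above: str += str, "".join(list), and io.StringIO with and without a single attribute lookup. It is not the content of bench_join.py. The function names, the bench() helper and the repeat counts are assumptions made up for illustration, and the UnicodeWriter variants are left out because they need the patched interpreter.

```python
import io
import timeit


def concat_plus(data):
    # str += str; the quoted message notes CPython has an optimization for this
    result = ""
    for item in data:
        result += item
    return result


def concat_join(data):
    # "".join(list)
    return "".join(data)


def concat_stringio(data):
    # io.StringIO, looking up buf.write on every iteration
    buf = io.StringIO()
    for item in data:
        buf.write(item)
    return buf.getvalue()


def concat_stringio_lookup_once(data):
    # io.StringIO; lookup attr once
    buf = io.StringIO()
    write = buf.write  # bound method fetched once, outside the loop
    for item in data:
        write(item)
    return buf.getvalue()


# Hypothetical registry of the strategies to time.
STRATEGIES = [
    concat_plus,
    concat_join,
    concat_stringio,
    concat_stringio_lookup_once,
]


def bench(data, number=20, repeat=3):
    """Return {strategy name: best seconds per call} for one data set."""
    timings = {}
    for func in STRATEGIES:
        runs = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
        timings[func.__name__] = min(runs) / number
    return timings


if __name__ == "__main__":
    cases = [
        ("['a'] * 10**2", ["a"] * 10**2),
        ("['abc'] * 10**4", ["abc"] * 10**4),
    ]
    for label, data in cases:
        print("[ data = %s ]" % label)
        results = sorted(bench(data).items(), key=lambda kv: kv[1])
        for name, seconds in results:
            print("%10.2f us: %s" % (seconds * 1e6, name))
```

Taking the minimum over several repeats is one common way to reduce noise in micro-benchmarks. Absolute numbers will differ from the ones posted above depending on the machine and Python build.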

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com