Hi,

Since there are ongoing discussions about allocation algorithms for various built-in types, I thought I'd mention there is a patch for turning unicode objects into variable-sized objects (rather than having them use a separately-allocated character buffer). The aim is to make allocation of those objects lighter, and to relieve cache and memory pressure a bit.
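To make the difference concrete, here is a rough C sketch of the two layouts. This is simplified and not the actual PyUnicodeObject definition; the struct and function names are made up for illustration:

    #include <stddef.h>
    #include <stdlib.h>

    typedef unsigned short unichar;  /* stand-in for Py_UNICODE */

    /* Current scheme: fixed-size object, characters in a second allocation. */
    typedef struct {
        /* ... object header (refcount, type pointer) would go here ... */
        size_t length;
        unichar *str;               /* separately-allocated buffer */
    } unicode_twoalloc;

    /* PyVarObject-style scheme: one allocation, characters stored inline. */
    typedef struct {
        /* ... object header ... */
        size_t length;              /* plays the role of ob_size */
        unichar str[1];             /* characters follow the header directly */
    } unicode_inline;

    static unicode_twoalloc *new_twoalloc(size_t n)
    {
        unicode_twoalloc *u = malloc(sizeof(*u));       /* allocation #1 */
        if (u == NULL)
            return NULL;
        u->str = malloc((n + 1) * sizeof(unichar));     /* allocation #2 */
        if (u->str == NULL) {
            free(u);
            return NULL;
        }
        u->length = n;
        return u;
    }

    static unicode_inline *new_inline(size_t n)
    {
        /* a single allocation covering header + characters */
        unicode_inline *u = malloc(offsetof(unicode_inline, str)
                                   + (n + 1) * sizeof(unichar));
        if (u == NULL)
            return NULL;
        u->length = n;
        return u;
    }

With the inline layout, creating a string costs one heap allocation instead of two, and the characters end up adjacent to the object header, which is where the cache and memory pressure win comes from.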
The patch is at http://bugs.python.org/issue1943

Marc-André Lemburg expressed skepticism, on the grounds that it makes subclassing unicode objects in C extensions more difficult.

And here are some microbenchmarks of the patch:

Splitting a small string:
./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
-> Unpatched py3k:     26.4 usec per loop
-> PyVarObject patch:  20.2 usec per loop

Splitting a medium-sized string:
./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()"
-> Unpatched py3k:     458 usec per loop
-> PyVarObject patch:  316 usec per loop

Splitting a long string:
./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()"
-> Unpatched py3k:     31.3 msec per loop
-> PyVarObject patch:  17.8 msec per loop

Even if the patch is rejected, I think it is important to remember that the implementation characteristics of the unicode type will be crucial for Py3k performance :-)

Regards

Antoine.
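P.S. To illustrate the concern about C-level subclassing, here is a toy sketch (again, not real extension code, and the names are invented) of why a variable-sized layout complicates extending the type from C:

    #include <stddef.h>

    /* With a fixed-size base layout, a C extension can subclass by
       embedding the base struct and appending its own fields: */
    typedef struct {
        /* ... object header ... */
        size_t length;
        void *str;                  /* pointer to a separate buffer */
    } fixed_base;

    typedef struct {
        fixed_base base;            /* sizeof(fixed_base) is known and fixed */
        int extra_field;            /* subclass data sits right after it */
    } fixed_subclass;

    /* With a PyVarObject-style base, the character data already occupies
       the tail of the same allocation, so there is no fixed offset after
       the base struct at which to place extra_field; the subclass has to
       arrange for its data some other way, which is the extra difficulty
       for C extensions. */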