Hi,

Since there are ongoing discussions about allocation algorithms for various built-in types, I thought I'd mention there is a patch for turning unicode objects into variable-sized objects (rather than having them use a separately-allocated character buffer). The aim is to make allocation of those objects lighter, and to relieve cache and memory pressure a bit.
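To make the difference concrete, here is a rough C sketch of the two layouts. This is simplified and not the actual PyUnicodeObject definition; the struct and function names are made up for illustration:

    #include <stddef.h>
    #include <stdlib.h>

    typedef unsigned short unichar;  /* stand-in for Py_UNICODE */

    /* Current scheme: fixed-size object, characters in a second allocation. */
    typedef struct {
        /* ... object header (refcount, type pointer) would go here ... */
        size_t length;
        unichar *str;               /* separately-allocated buffer */
    } unicode_twoalloc;

    /* PyVarObject-style scheme: one allocation, characters stored inline. */
    typedef struct {
        /* ... object header ... */
        size_t length;              /* plays the role of ob_size */
        unichar str[1];             /* characters follow the header directly */
    } unicode_inline;

    static unicode_twoalloc *new_twoalloc(size_t n)
    {
        unicode_twoalloc *u = malloc(sizeof(*u));       /* allocation #1 */
        if (u == NULL)
            return NULL;
        u->str = malloc((n + 1) * sizeof(unichar));     /* allocation #2 */
        if (u->str == NULL) {
            free(u);
            return NULL;
        }
        u->length = n;
        return u;
    }

    static unicode_inline *new_inline(size_t n)
    {
        /* a single allocation covering header + characters */
        unicode_inline *u = malloc(offsetof(unicode_inline, str)
                                   + (n + 1) * sizeof(unichar));
        if (u == NULL)
            return NULL;
        u->length = n;
        return u;
    }

With the inline layout, creating a string costs one heap allocation instead of two, and the characters end up adjacent to the object header, which is where the cache and memory pressure win comes from.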
The patch is at http://bugs.python.org/issue1943

Marc-André Lemburg expressed skepticism, on the grounds that it makes subclassing unicode objects in C extensions more difficult.

And here are some microbenchmarks of the patch:

Splitting a small string:
./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
-> Unpatched py3k:     26.4 usec per loop
-> PyVarObject patch:  20.2 usec per loop

Splitting a medium-sized string:
./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()"
-> Unpatched py3k:     458 usec per loop
-> PyVarObject patch:  316 usec per loop

Splitting a long string:
./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()"
-> Unpatched py3k:     31.3 msec per loop
-> PyVarObject patch:  17.8 msec per loop

Even if the patch is rejected, I think it is important to remember that the implementation characteristics of the unicode type will be crucial for Py3k performance :-)

Regards

Antoine.
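P.S. To illustrate the concern about C-level subclassing, here is a toy sketch (again, not real extension code, and the names are invented) of why a variable-sized layout complicates extending the type from C:

    #include <stddef.h>

    /* With a fixed-size base layout, a C extension can subclass by
       embedding the base struct and appending its own fields: */
    typedef struct {
        /* ... object header ... */
        size_t length;
        void *str;                  /* pointer to a separate buffer */
    } fixed_base;

    typedef struct {
        fixed_base base;            /* sizeof(fixed_base) is known and fixed */
        int extra_field;            /* subclass data sits right after it */
    } fixed_subclass;

    /* With a PyVarObject-style base, the character data already occupies
       the tail of the same allocation, so there is no fixed offset after
       the base struct at which to place extra_field; the subclass has to
       arrange for its data some other way, which is the extra difficulty
       for C extensions. */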