On Feb 10, 2008 4:53 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Since there are discussions going on on the topic of allocation algorithms for
> various built-in types, I thought I'd mention there's a patch for turning
> unicode objects into variable-sized objects (rather than using a
> separately-allocated buffer). The aim is to make allocation of those objects
> lighter, and relieve cache and memory pressure a bit.
>
> http://bugs.python.org/issue1943
>
> Marc-André Lemburg expressed skepticism, based on the fact that it made
> subclassing unicode objects as part of C extensions more difficult.

Has anybody ever tried that? The same would apply to PyString and I've never
heard this complaint. I think that given the relative importance of fast
strings in Py3k vs. the convenience of subclassing PyUnicode, the latter may
have to suffer.

> And here is a microbenchmark of the thing:
>
> Splitting a small string:
> ./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
> -> Unpatched py3k: 26.4 usec per loop
> -> PyVarObject patch: 20.2 usec per loop
>
> Splitting a medium-sized string:
> ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()"
> -> Unpatched py3k: 458 usec per loop
> -> PyVarObject patch: 316 usec per loop
>
> Splitting a long string:
> ./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()"
> -> Unpatched py3k: 31.3 msec per loop
> -> PyVarObject patch: 17.8 msec per loop
>
> Even if the patch is rejected, I think it is important to remember that
> implementation characteristics of the unicode type will be crucial for Py3k
> performance :-)

Right. I haven't had enough time to review this (or any other patch),
but the idea is very appealing.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
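
For readers who want to put the quoted numbers side by side, here is a minimal sketch that turns the three timings from the microbenchmark above into speedup ratios, plus a small helper for timing `s.split()` through the `timeit` module instead of the command line. The names `QUOTED_TIMINGS`, `speedup` and `time_split` are made up for this illustration and are not part of the patch or of the original post; the synthetic sample string is likewise an assumption, since INTBENCH is a file from the poster's own checkout.

```python
import timeit

# Timings copied from the quoted microbenchmark: (unpatched py3k, PyVarObject patch).
# The first two rows are in usec and the last in msec; units cancel in the ratio.
QUOTED_TIMINGS = {
    "small (INTBENCH)": (26.4, 20.2),
    "medium (LICENSE)": (458.0, 316.0),
    "long (Misc/HISTORY)": (31.3, 17.8),
}


def speedup(unpatched, patched):
    """Return how many times faster the patched build ran."""
    return unpatched / patched


def time_split(text, number=1000, repeat=5):
    """Return the best observed time, in seconds, of a single text.split() call."""
    timer = timeit.Timer("s.split()", globals={"s": text})
    # Take the minimum over several runs, as the timeit command line does.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for label, (before, after) in QUOTED_TIMINGS.items():
        print(f"{label}: {speedup(before, after):.2f}x")

    # Hypothetical stand-in for the benchmark input files.
    sample = "lorem ipsum dolor sit amet " * 200
    print(f"split() on synthetic sample: {time_split(sample) * 1e6:.1f} usec per loop")
```

The ratios only restate the figures already given in the thread; the timing helper measures whatever interpreter runs it, so its output is not comparable with the 2008 numbers.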