Various notes:
 * PyUnicode_READ() is slower than reading a Py_UNICODE array.
 * Some decoders unroll the main loop to process 4 or 8 bytes (on 32-
or 64-bit CPUs) at each step; a sketch of the idea follows below.
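To make the unrolling point concrete, here is a minimal sketch of a
word-at-a-time ASCII fast path of the kind such decoders use. The function
name and the exact word size are illustrative, not taken from CPython's
actual decoder code:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical sketch: test one machine word (8 bytes on a 64-bit CPU)
     * per iteration instead of one byte. Returns the number of leading
     * pure-ASCII bytes in the input. */
    static size_t
    count_ascii_prefix(const unsigned char *s, size_t len)
    {
        size_t i = 0;
        const uint64_t high_bits = UINT64_C(0x8080808080808080);

        /* Word-at-a-time loop: any set high bit means a non-ASCII byte. */
        while (i + sizeof(uint64_t) <= len) {
            uint64_t chunk;
            memcpy(&chunk, s + i, sizeof(chunk));  /* avoids alignment issues */
            if (chunk & high_bits)
                break;
            i += sizeof(uint64_t);
        }
        /* Byte-at-a-time tail, including the byte that stopped the fast loop. */
        while (i < len && s[i] < 0x80)
            i++;
        return i;
    }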

I would be interested to hear about other tricks for optimizing Unicode
strings in Python, or to hear from you if you would like to work on this
topic.

Beyond creation, the most frequent approach is to specialize loops for
all three possible widths, allowing the compiler to hard-code the element
size. This brings performance back to the speed of accessing a
Py_UNICODE array (or makes it faster, for 1-byte strings).
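As an illustration only (the function and the counting task are made up,
not an actual CPython routine), such a specialization might look like this:
switch once on the PEP 393 kind, then run a loop whose body indexes a
concrete Py_UCS1/Py_UCS2/Py_UCS4 array instead of going through
PyUnicode_READ():

    #include <Python.h>

    /* Count occurrences of a code point, with one loop per string width. */
    static Py_ssize_t
    count_char(PyObject *str, Py_UCS4 ch)
    {
        Py_ssize_t i, n, count = 0;

        if (PyUnicode_READY(str) < 0)
            return -1;
        n = PyUnicode_GET_LENGTH(str);

        switch (PyUnicode_KIND(str)) {
        case PyUnicode_1BYTE_KIND: {
            const Py_UCS1 *data = PyUnicode_1BYTE_DATA(str);
            for (i = 0; i < n; i++)
                if (data[i] == ch)
                    count++;
            break;
        }
        case PyUnicode_2BYTE_KIND: {
            const Py_UCS2 *data = PyUnicode_2BYTE_DATA(str);
            for (i = 0; i < n; i++)
                if (data[i] == ch)
                    count++;
            break;
        }
        default: {
            const Py_UCS4 *data = PyUnicode_4BYTE_DATA(str);
            for (i = 0; i < n; i++)
                if (data[i] == ch)
                    count++;
            break;
        }
        }
        return count;
    }

The compiler sees a fixed element size in each case, so the 1-byte loop in
particular can be vectorized or strength-reduced just like a plain char
array scan.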

A possible micro-optimization is to use pointer arithmetic instead
of indexing. However, I would expect that compilers already convert
a counting loop into pointer arithmetic if the index is only ever used
for array access.
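For illustration, the two loop forms being compared might look like the
following (the search task and names are made up); with optimization
enabled, most compilers emit essentially the same code for both:

    #include <stdint.h>
    #include <stddef.h>

    /* Indexed form: the counter i exists only to address the array. */
    static ptrdiff_t
    find_indexed(const uint16_t *buf, size_t n, uint16_t ch)
    {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == ch)
                return (ptrdiff_t)i;
        return -1;
    }

    /* Pointer form: the same search with explicit pointer arithmetic,
     * which is what an optimizer typically turns the indexed loop into. */
    static ptrdiff_t
    find_pointer(const uint16_t *buf, size_t n, uint16_t ch)
    {
        const uint16_t *p = buf, *end = buf + n;
        for (; p != end; p++)
            if (*p == ch)
                return p - buf;
        return -1;
    }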

A source of slowdown appears to be widening copy operations. I wonder
whether microprocessors are able to do this faster than what the compiler
generates from a naive copying loop.
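The naive loop in question is roughly the following (element types chosen
for illustration, e.g. promoting a 1-byte buffer to 2 bytes when a string's
maximum character grows past U+00FF); the open question is whether SIMD
widening instructions beat what the compiler emits for it:

    #include <stdint.h>
    #include <stddef.h>

    /* Naive widening copy: each element is zero-extended individually. */
    static void
    widen_1to2(uint16_t *dst, const uint8_t *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }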

Another potential area for further optimization is better pass-through of
PyObject*. Some APIs still use char* or Py_UNICODE* when the caller actually
holds a PyObject*, and the callee ultimately recreates an object out of the
pointers being passed.
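A hypothetical example of the pattern (the helper names are invented for
illustration, not existing APIs): the caller flattens its PyObject* to
char* only for the callee to decode it back into a string object, whereas
a PyObject*-taking variant avoids the round trip entirely.

    #include <Python.h>

    /* char*-based API: the callee must recreate a string object from the
     * raw pointer it is given. */
    static PyObject *
    make_tag_from_chars(const char *name)
    {
        return PyUnicode_FromFormat("<%s>", name);   /* re-decodes name */
    }

    /* A caller that already holds a PyObject* first has to flatten it... */
    static PyObject *
    caller_old_style(PyObject *name)
    {
        const char *s = PyUnicode_AsUTF8(name);      /* encode to char* */
        if (s == NULL)
            return NULL;
        return make_tag_from_chars(s);               /* ...only to decode again */
    }

    /* Passing the PyObject* through skips the encode/decode round trip. */
    static PyObject *
    make_tag_from_object(PyObject *name)
    {
        return PyUnicode_FromFormat("<%U>", name);
    }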

Some people (hi Larry) still think that using a rope representation for
string concatenation might improve things, see #1569040.

Regards,
Martin

