James Y Knight, 27.01.2011 21:26:
> On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> The Py_UNICODE type is still supported but deprecated. It is always
>>> defined as a typedef for wchar_t, so the wstr representation can
>>> double as Py_UNICODE representation.
>>
>> It's too bad this isn't initialised by default, though. Py_UNICODE is
>> the only representation that can be used efficiently from C code and
>> Cython relies on it for fast text processing. This proposal will
>> therefore likely have a pretty negative performance impact on
>> extensions written in Cython as the compiler could no longer expect
>> this representation to be available instantaneously.
>
> But the whole point of the exercise is so that it doesn't have to store
> a 4-byte-per-char representation when a 1-byte-per-char rep would do.

I am well aware of that. But I'm arguing that the current, simpler internal
representation has had its advantages for CPython as a platform.

> If Cython wants to work most efficiently with this proposal, it should
> learn to deal with the three possible raw representations.

I agree. After all, CPython is lucky to have it available. It wouldn't be
the first time that we duplicate looping code based on the input type.

However, as with the looping code, it will also complicate all indexing code
at runtime, as it always needs to test which of the representations is
current before it can read a character. Currently, all of this is a
compile-time decision. This will necessarily have a performance impact.
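
To make the difference concrete, here is a rough C sketch of what I mean. It
is not code from the proposal or from Cython: flex_str, flex_read, flex_count
and the KIND_* constants are made-up names that only stand in for "a string
with one of three raw representations". Looping code can test the
representation once and then run a specialised loop, whereas indexing code
has to repeat the test on every read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three raw representations (1, 2 or 4 bytes
   per character). None of these names come from the proposal itself. */
typedef enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 } str_kind;

typedef struct {
    str_kind kind;      /* width of one stored character, in bytes */
    size_t length;      /* number of characters, not bytes */
    const void *data;   /* raw buffer holding length * kind bytes */
} flex_str;

/* Indexing: the representation is tested on every single read. */
uint32_t flex_read(const flex_str *s, size_t i)
{
    assert(i < s->length);
    switch (s->kind) {
    case KIND_1BYTE: return ((const uint8_t *)s->data)[i];
    case KIND_2BYTE: return ((const uint16_t *)s->data)[i];
    default:         return ((const uint32_t *)s->data)[i];
    }
}

/* Looping: test once, then run a loop duplicated per character width. */
size_t flex_count(const flex_str *s, uint32_t ch)
{
    size_t i, n = 0;
    switch (s->kind) {
    case KIND_1BYTE: {
        const uint8_t *p = (const uint8_t *)s->data;
        for (i = 0; i < s->length; i++) n += (p[i] == ch);
        break;
    }
    case KIND_2BYTE: {
        const uint16_t *p = (const uint16_t *)s->data;
        for (i = 0; i < s->length; i++) n += (p[i] == ch);
        break;
    }
    default: {
        const uint32_t *p = (const uint32_t *)s->data;
        for (i = 0; i < s->length; i++) n += (p[i] == ch);
        break;
    }
    }
    return n;
}
```

With a single fixed-width Py_UNICODE buffer, flex_read() collapses to a plain
array access whose element size is known when the extension is compiled. The
switch in the sketch is the extra per-access work I am worried about.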
Stefan
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev