"Martin v. Löwis", 28.01.2011 22:49:
And indeed, when Cython is updated to 3.3, it shouldn't access the UTF-8
representation for such a loop. Instead, it should access the str
representation.
Sure.
Regarding Cython specifically, the above will still be *possible* under
the proposal, given that the memory layout of the strings will still
represent the Unicode code points. It will just be trickier to implement
in Cython's type system as there is no longer a (user visible) C type
representation for those code units.
There is: Py_UCS4 remains available.
Thanks for that pointer. I had always thought that all "*UCS4*" names were
platform specific and had completely missed that type. It's a lot nicer
than Py_UNICODE because it allows users to fold surrogate pairs back into
the character value.
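
To illustrate what "folding surrogate pairs back" means, here is a minimal sketch in plain Python rather than Cython or C. The function name fold_surrogates is made up for this example and is not part of any CPython or Cython API; it only shows the arithmetic that a 32-bit character type makes room for.

```python
def fold_surrogates(units):
    """Fold UTF-16 surrogate pairs in a sequence of 16-bit code units
    into full code point values (hypothetical helper, illustration only)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the high and low surrogate into one code point.
            high_bits = (unit - 0xD800) << 10
            low_bits = units[i + 1] - 0xDC00
            code_points.append(0x10000 + high_bits + low_bits)
            i += 2
        else:
            # BMP code unit or unpaired surrogate: pass through unchanged.
            code_points.append(unit)
            i += 1
    return code_points
```

An unpaired surrogate is passed through as-is here; whether that is the right policy is a design choice this sketch does not try to settle.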
It's completely missing from the docs, BTW. Google doesn't give me a single
mention on all of docs.python.org, even though the type has existed at least
since Python 2.3, Cython's oldest supported runtime, and likely long before.
If I had known about that type earlier, I could have ended up making that
the native Unicode character type in Cython instead of bothering with
Py_UNICODE. But this can still be changed, I think. Since type inference was
available before native Py_UNICODE support, it's unlikely that users will
have Py_UNICODE written in their code explicitly. So I can make the switch
under the hood.
Just to explain, a native CPython C type is much better than an arbitrary
integer type, because it allows Cython to apply specific coercion rules
from and to Python object types. Just as Py_UNICODE does currently, Py_UCS4
would obviously coerce from and to a 1-character Unicode string, but it could
additionally handle surrogate pair splitting and combining automatically on
current 16-bit Unicode builds, so that you'd get a Unicode string with two
code units (a surrogate pair) on coercion to Python.
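
The splitting direction can be sketched roughly as follows, again in plain Python and only as an illustration of the arithmetic. The name split_code_point is an assumption made for this example; it is not how Cython implements the coercion, and no such function exists in CPython.

```python
def split_code_point(cp):
    """Return the 16-bit code unit(s) that represent code point `cp`
    on a 16-bit Unicode build (hypothetical helper, illustration only)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range: %r" % (cp,))
    if cp < 0x10000:
        # BMP character: a single code unit is enough.
        return (cp,)
    # Non-BMP character: split into a high and a low surrogate.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return (high, low)
```

For a non-BMP value the result has two entries, which is the "string of length two" case described above; for anything in the BMP it has one.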
While I'm somewhat confident that I'll
find a way to fix this in Cython, my point is just that this adds a
certain level of complexity to C code using the new memory layout that
simply wasn't there before.
Understood. However, I think it is easier than you think it is.
Let's see about the implications once there is an implementation.
Stefan