Re: [Python-Dev] PEP 393: Flexible String Representation

Stefan Behnel Fri, 28 Jan 2011 22:36:14 -0800

"Martin v. Löwis", 28.01.2011 22:49:

And indeed, when Cython is updated to 3.3, it shouldn't access the UTF-8
representation for such a loop. Instead, it should access the str
representation


Sure.

Regarding Cython specifically, the above will still be *possible* under
the proposal, given that the memory layout of the strings will still
represent the Unicode code points. It will just be trickier to implement
in Cython's type system as there is no longer a (user visible) C type
representation for those code units.


There is: Py_UCS4 remains available.

Thanks for that pointer. I had always thought that all "*UCS4*" names were
platform specific and had completely missed that type. It's a lot nicer
than Py_UNICODE because it allows users to fold surrogate pairs back into
the character value.
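
To make "folding surrogate pairs back into the character value" concrete, here is a minimal Python sketch of the standard UTF-16 arithmetic. The helper name fold_surrogates is hypothetical, chosen only for illustration. It is not a CPython or Cython API.

```python
def fold_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration; not part of CPython or Cython.
    """
    # A valid pair is a high surrogate followed by a low surrogate.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

The result always lies above the BMP, so it needs a 32-bit type such as Py_UCS4 and would not fit into a 16-bit Py_UNICODE.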

It's completely missing from the docs, BTW. Google doesn't give me a single
mention for all of docs.python.org, even though it existed at least since
(and likely long before) Cython's oldest supported runtime Python 2.3.

If I had known about that type earlier, I could have ended up making that
the native Unicode character type in Cython instead of bothering with
Py_UNICODE. But this can still be changed I think. Since type inference was
available before native Py_UNICODE support, it's unlikely that users will
have Py_UNICODE written in their code explicitly. So I can make the switch
under the hood.

Just to explain, a native CPython C type is much better than an arbitrary
integer type, because it allows Cython to apply specific coercion rules
from and to Python object types. As currently Py_UNICODE, Py_UCS4 would
obviously coerce from and to a 1 character Unicode string, but it could
additionally handle surrogate pair splitting and combining automatically on
current 16-bit Unicode builds so that you'd get a Unicode string with two
code points on coercion to Python.
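
As a rough sketch of the splitting direction of that coercion, the following Python function shows how a 32-bit character value could be turned into the one or two 16-bit units that a 16-bit Unicode build stores. The name split_code_point is hypothetical, and this only illustrates the arithmetic, not how Cython would actually implement the coercion.

```python
def split_code_point(cp):
    """Return the UTF-16 code units for one code point as a tuple.

    Hypothetical helper for illustration; not part of CPython or Cython.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp < 0x10000:
        # BMP characters fit into a single 16-bit unit.
        return (cp,)
    # Non-BMP characters are split into a high and a low surrogate,
    # each carrying 10 bits of the offset value.
    offset = cp - 0x10000
    return (0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF))
```

For a non-BMP value the tuple has two entries, which corresponds to the two-element string described above for 16-bit builds.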

While I'm somewhat confident that I'll
find a way to fix this in Cython, my point is just that this adds a
certain level of complexity to C code using the new memory layout that
simply wasn't there before.


Understood. However, I think it is easier than you think it is.


Let's see about the implications once there is an implementation.

Stefan
