Re: [Python-Dev] PEP 393 review

Stefan Behnel Thu, 25 Aug 2011 14:32:26 -0700

Stefan Behnel, 25.08.2011 20:47:

"Martin v. Löwis", 24.08.2011 20:15:

- issues to be considered (unclarities, bugs, limitations, ...)


A problem of the current implementation is the need for calling
PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
insufficient memory). Basically, this means that even something as trivial
as trying to get the length of a Unicode string can now result in an error.

Oh, and the same applies to PyUnicode_AS_UNICODE() now. I doubt that there
is *any* code out there that expects this macro to ever return NULL. This
means that the current implementation has actually broken the old API. Just
allocate an "80% of your memory" long string using the new API and then
call PyUnicode_AS_UNICODE() on it to see what I mean.

Sadly, a quick look at a couple of recent commits in the pep-393 branch
suggested that it is not even always obvious to you as the authors which
macros can be called safely and which cannot. I immediately spotted a bug
in one of the updated core functions (unicode_repr, IIRC) where
PyUnicode_GET_LENGTH() is called without a previous call to
PyUnicode_FAST_READY().

I find it anything but obvious that calling PyUnicode_DATA() and
PyUnicode_KIND() is safe as long as the return value is being checked for
errors, but calling PyUnicode_GET_LENGTH() is not safe unless there was a
previous call to PyUnicode_Ready().

I just noticed this when rewriting Cython's helper function that searches a
unicode string for a (Py_UCS4) character. Previously, the entire function
was safe, could never produce an error and therefore always returned a
boolean result. In the new world, the caller of this function must check
and propagate errors. This may not be a major issue in most cases, but it
can have a non-trivial impact on user code, depending on how deep in a call
chain this happens and on how much control the user has over the call chain
(think of a C callback, for example).
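
To illustrate the change in calling convention, here is a minimal sketch in
plain C. It deliberately does not use the CPython API: lazy_str,
lazy_str_ready() and lazy_str_contains() are hypothetical names that only
model a string whose canonical buffer is built on demand, with a "ready"
step that can fail. The point is that the search helper can no longer
return a plain boolean and needs a third result for the error case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model (not a CPython type): a string that starts out with
 * only an old-style buffer and gets its canonical buffer on first use. */
typedef struct {
    size_t length;          /* number of code points */
    const uint32_t *legacy; /* old-style buffer supplied by the creator */
    uint32_t *data;         /* canonical buffer, NULL until made ready */
} lazy_str;

/* Build the canonical buffer if it is missing.
 * Returns 0 on success and -1 if the buffer cannot be allocated. */
int lazy_str_ready(lazy_str *s)
{
    assert(s != NULL);
    if (s->data != NULL)
        return 0;                        /* already ready, cannot fail */

    size_t n = s->length ? s->length : 1; /* avoid a zero-size request */
    if (n > SIZE_MAX / sizeof(uint32_t))
        return -1;                       /* size computation would overflow */

    uint32_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;                       /* out of memory */

    for (size_t i = 0; i < s->length; i++)
        buf[i] = s->legacy[i];
    s->data = buf;
    return 0;
}

/* Search for a code point.
 * Returns 1 if found, 0 if not found, and -1 if the string could not be
 * made ready. Callers must check for -1 and propagate it. */
int lazy_str_contains(lazy_str *s, uint32_t ch)
{
    if (lazy_str_ready(s) < 0)
        return -1;

    for (size_t i = 0; i < s->length; i++) {
        if (s->data[i] == ch)
            return 1;
    }
    return 0;
}
```

A caller that used to write "if (contains(s, ch))" would now silently treat
the -1 error result as "found", so every call site has to be revisited,
which is exactly the cost described above.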

Also, even in the case that there is no error, the potential need to build
up the string on request means that the run time and memory requirements of
an algorithm are less predictable now as they depend on the origin of the
input and not just its Python level string content.

I would be happier with an implementation that avoided this by always
instantiating the data buffer right from the start, instead of carrying
only a Py_UNICODE buffer for old-style instances.


Stefan

