"Martin v. Löwis", 26.08.2011 18:56:
I agree with your observation that something should be done about error
handling, and will update the PEP shortly. I propose that
PyUnicode_Ready should be explicitly called on input where raising an
exception is feasible. In contexts where it is not feasible (such
as reading a character, or reading the length or the kind), failing to
ready the string should cause a fatal error.

I consider this an increase in complexity. It will no longer be enough to just access the data. Users will first have to find a suitable place in their code to make sure the data is actually there. They may forget to do so because everything happens to work in all of their test cases, or they may trigger a large amount of overhead that copies and recodes the string data by executing one of the macros that does the conversion automatically.

For the specific case of Cython, I would guess that I could just add another special case that reads the data from the Py_UNICODE buffer and combines surrogates as needed, but that will only work in some cases (specifically not for indexing). And outside of Cython, most normal user code won't do that.
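
As a rough sketch of what "combining surrogates" means here, assuming a narrow build where the legacy buffer holds 16-bit UTF-16 units: the helper names below (read_code_point, count_code_points) are hypothetical and are not part of CPython or Cython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython or Cython API): decode the code point
 * that starts at buf[*pos] in a UTF-16 buffer of `len` units and advance
 * *pos past the one or two units that were consumed. */
static uint32_t read_code_point(const uint16_t *buf, size_t len, size_t *pos)
{
    assert(buf != NULL && pos != NULL && *pos < len);
    uint32_t hi = buf[(*pos)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *pos < len) {
        uint32_t lo = buf[*pos];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*pos)++;  /* consume the low surrogate as well */
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;  /* BMP character, or an unpaired surrogate passed through */
}

/* Count code points by walking the whole buffer sequentially. */
static size_t count_code_points(const uint16_t *buf, size_t len)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        (void)read_code_point(buf, len, &pos);
        n++;
    }
    return n;
}
```

Sequential iteration works this way, but finding the i-th code point requires walking the buffer from the start, which is why the approach does not help with indexing.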

My gut feeling leans towards a KISS approach. If you go the route of requiring an explicit point for triggering PyUnicode_Ready() calls, why not just go all the way and make it completely explicit in *all* cases? I.e. remove all implicit calls from the macros and make it part of the new API semantics that users *must* call PyUnicode_FAST_READY() before doing anything with the new string data layout. That would mean far fewer surprises.

Note that there is currently no official macro for finding out that the flexible string layout has not been initialised yet, i.e. that wstr is set but str is not. If the implicit PyUnicode_Ready() calls get removed, PyUnicode_KIND() could take that place by simply returning WSTR_KIND.
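
To illustrate the fully explicit scheme, here is a toy model in plain C. It is only a sketch and not the PEP 393 implementation: toy_str, toy_kind_of, toy_ready and toy_read are invented names, and surrogate pairs are ignored for brevity. The kind query never converts anything, the ready call is the single place that can fail, and reading requires a prior successful ready call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only; all names here are hypothetical. */
enum toy_kind { TOY_WSTR_KIND = 0, TOY_1BYTE_KIND = 1, TOY_2BYTE_KIND = 2 };

typedef struct {
    const uint16_t *wstr;  /* legacy buffer, always set */
    size_t length;         /* number of units in wstr */
    void *data;            /* compact buffer, NULL until toy_ready() succeeds */
    enum toy_kind kind;    /* only meaningful once data is set */
} toy_str;

/* Reports the current state and never converts implicitly. */
static enum toy_kind toy_kind_of(const toy_str *s)
{
    return s->data != NULL ? s->kind : TOY_WSTR_KIND;
}

/* Explicit conversion: returns 0 on success, -1 if allocation fails. */
static int toy_ready(toy_str *s)
{
    if (s->data != NULL)
        return 0;  /* already converted */
    uint16_t max = 0;
    for (size_t i = 0; i < s->length; i++)
        if (s->wstr[i] > max)
            max = s->wstr[i];
    enum toy_kind kind = max < 256 ? TOY_1BYTE_KIND : TOY_2BYTE_KIND;
    size_t unit = (size_t)kind;  /* 1 or 2 bytes per character */
    void *data = malloc(s->length ? s->length * unit : 1);
    if (data == NULL)
        return -1;  /* the caller can raise an exception here */
    for (size_t i = 0; i < s->length; i++) {
        if (kind == TOY_1BYTE_KIND)
            ((uint8_t *)data)[i] = (uint8_t)s->wstr[i];
        else
            ((uint16_t *)data)[i] = s->wstr[i];
    }
    s->data = data;
    s->kind = kind;
    return 0;
}

/* Reading is only valid after a successful toy_ready() call. */
static uint32_t toy_read(const toy_str *s, size_t i)
{
    assert(s->data != NULL && i < s->length);
    return s->kind == TOY_1BYTE_KIND ? ((const uint8_t *)s->data)[i]
                                     : ((const uint16_t *)s->data)[i];
}
```

A caller would check toy_kind_of() for TOY_WSTR_KIND, call toy_ready() at a point where it can still report an error, and only then use toy_read().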

That being said, the main problem I currently see is that basically all existing code needs to be updated in order to handle these errors. Otherwise, it would be possible to trigger crashes by carefully crafting a string and passing it into an unprepared C library, which would then run into a NULL pointer returned by PyUnicode_AS_UNICODE().

Stefan
