2018-01-26 14:43 GMT+01:00 M.-A. Lemburg <m...@egenix.com>:
> If that's indeed being used as assumption, the docs must be
> fixed and PyUnicode_New() should verify this assumption as
> well - not only in debug builds using C asserts() :-)
Like PyUnicode_FromStringAndSize(NULL, size), PyUnicode_New(size, maxchar) only allocates memory with uninitialized characters. I don't see how PyUnicode_New() could check the string content, since the content is not known yet...

The new public C APIs added by PEP 393 are hard to use correctly, but they are the most efficient. Functions like PyUnicode_FromString() are simple to use and very hard to misuse :-)

PyPy developers asked me to simply drop all these new public C APIs and make them private, or at least deprecate them. But I never looked in depth at the new API. I don't know if Cython uses it, for example.

Some APIs are still private, like _PyUnicodeWriter, which allows creating a string in multiple steps with a smart strategy to reduce or even avoid realloc() calls and conversions between the different storage types (UCS1, UCS2, UCS4). This API is very efficient, but also hard to use.

> C extensions can easily create strings using PyUnicode_New()
> which do not adhere to such a requirement and then write
> arbitrary content using PyUnicode_WRITE(). In some cases,
> this may even be necessary, say in case the extension doesn't
> know what data is being written, reading it from some external
> source.

It would be a bug in the C extension.

> I'm not too familiar with the new Unicode code, but it seems
> that this requirement is not checked everywhere, e.g. the
> resize code doesn't seem to have such checks either (only in
> debug versions).

It must be checked everywhere. If it's not the case, it's an obvious bug in CPython. If you spotted a bug, please report it ;-)

Victor
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
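
For readers unfamiliar with the storage types mentioned above (UCS1, UCS2, UCS4), here is a minimal sketch in pure Python that makes them visible from the outside. It assumes CPython 3.3 or newer, where PEP 393 applies; the helper name bytes_per_char is hypothetical and only relies on sys.getsizeof(). It is an illustration of the storage widths, not of the C API itself.

```python
import sys


def bytes_per_char(ch: str, n: int = 100) -> int:
    """Estimate the bytes CPython stores per character for a string of `ch`.

    Hypothetical helper: comparing two lengths cancels the fixed object
    header, so only the per-character payload remains.
    """
    short = sys.getsizeof(ch * n)
    long_ = sys.getsizeof(ch * (2 * n))
    return (long_ - short) // n


if __name__ == "__main__":
    # One sample character per storage type (the labels are PEP 393 kinds).
    samples = [("UCS1", "a"), ("UCS2", "\u20ac"), ("UCS4", "\U0001F600")]
    for label, ch in samples:
        print(label, bytes_per_char(ch))
```

The storage width is chosen from the largest code point in the string, which is why PyUnicode_New() takes a maxchar argument and why writing a character larger than maxchar afterwards would be a bug in the caller.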