Hi INADA-san,

First of all, thanks for writing down a PEP!
On Thu, 18 Jun 2020 at 11:42, Inada Naoki <songofaca...@gmail.com> wrote:
> To support legacy Unicode object created by
> ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
> ``PyUnicode_READY()`` check.

I don't see PyUnicode_READY() removal in the specification section.
When can we remove these calls and the function itself?

> Support of legacy Unicode object makes Unicode implementation complex.
> Until we drop legacy Unicode object, it is very hard to try other Unicode
> implementation like UTF-8 based implementation in PyPy.

I'm not sure if it should be in the scope of the PEP or not, but there
are also other C API functions which are too close to the PEP 393
concrete implementation. For example, I'm not sure that
PyUnicode_MAX_CHAR_VALUE(str) would be relevant or efficient if Python
str is reimplemented to use UTF-8 internally. Should we deprecate it as
well? Do you think that it should be addressed in a separate PEP?

In fact, a large part of the Unicode C API is based on the current
implementation of the Python str type. For example, I'm not sure that
PyUnicode_New(size, max_char) would still make sense if we change the
code to store strings as UTF-8 internally.

In an ideal world, I would prefer to have a "string builder" API, like
the current _PyUnicodeWriter C API, to create a string, and never allow
modifying a string in-place. CPython's str is only "almost" immutable:
it can be modified in-place "if the reference count is equal to 1",
which has corner cases and can be misused.

But again, I don't think that it should be part of this PEP :-) Sorry
for being off-topic ;-)

> Specification
> =============
>
> Affected APIs
> --------------
>
> From the Unicode implementation, ``wstr`` and ``wstr_length`` members are
> removed.
>
> Macros and functions to be removed:
>
> * PyUnicode_GET_SIZE
> * PyUnicode_GET_DATA_SIZE
> * Py_UNICODE_WSTR_LENGTH
> * PyUnicode_AS_UNICODE
> * PyUnicode_AS_DATA
> * PyUnicode_AsUnicode
> * PyUnicode_AsUnicodeAndSize

Which ones are already deprecated?
> Behaviors to be removed:
>
> * PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
>   ``size > 0`` cause RuntimeError instead of creating legacy Unicode
>   object. While this API is deprecated by PEP 393, this API will be kept
>   when ``wstr`` is removed. This API will be removed later.

I'm not sure that it's worth keeping PyUnicode_FromUnicode(), whereas
PyUnicode_FromWideChar() has a clean API (it uses wchar_t*, not
Py_UNICODE*).

I also suggest disallowing PyUnicode_FromUnicode(NULL, 0).

By the way, when can we finally remove the Py_UNICODE type?

I would prefer to remove Py_UNICODE and PyUnicode_FromUnicode().

> * PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
>   ``PyUnicode_FromStringAndSize(NULL, size)`` cause RuntimeError
>   instead of creating legacy unicode object.

> All APIs to be changed should raise DeprecationWarning for behavior to be
> removed. Note that ``PyUnicode_FromUnicode`` has both of compiler deprecation
> warning and runtime DeprecationWarning. [3]_, [4]_.

Every function scheduled for removal? Even PyUnicode_GET_SIZE()? I'm not
sure that C extensions are prepared for PyUnicode_GET_SIZE() raising an
exception when Python runs with -Werror (which turns the
DeprecationWarning into an exception).

> All deprecations will be implemented in Python 3.10.
> Some deprecations will be backported in Python 3.9.
>
> Actual removal will happen in Python 3.12.

Many functions have already been declared with Py_DEPRECATED() for a
long time. Would it make sense to remove these functions earlier?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/M7JFI5TWLM7KOYVSBFFTPQS5HHO4DF2M/