[Python-Dev] Re: Draft PEP: Remove wstr from Unicode

Victor Stinner Mon, 22 Jun 2020 15:15:24 -0700

Hi INADA-san,

First of all, thanks for writing down a PEP!


On Thu, 18 Jun 2020 at 11:42, Inada Naoki <songofaca...@gmail.com> wrote:
> To support legacy Unicode object created by
> ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
> ``PyUnicode_READY()`` check.

I don't see PyUnicode_READY() removal in the specification section.
When can we remove these calls and the function itself?


> Support of legacy Unicode object makes Unicode implementation complex.
> Until we drop legacy Unicode object, it is very hard to try other Unicode
> implementation like UTF-8 based implementation in PyPy.

I'm not sure whether this is in the scope of the PEP, but there
are also other C API functions which are too close to the PEP 393
concrete implementation. For example, I'm not sure that
PyUnicode_MAX_CHAR_VALUE(str) would be relevant/efficient if Python
str is reimplemented to use UTF-8 internally. Should we deprecate it
as well? Do you think that it should be addressed in a separate PEP?

In fact, a large part of the Unicode C API is based on the current
implementation of the Python str type. For example, I'm not sure that
PyUnicode_New(size, max_char) would still make sense if we change the
code to store strings as UTF-8 internally.
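
To illustrate the efficiency concern, here is a minimal sketch, assuming
a string stored as a plain, valid UTF-8 byte buffer. The helper
utf8_max_char() is hypothetical and not part of the CPython C API. It
shows that, without cached metadata, finding the maximum code point
means decoding the whole buffer:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the CPython C API.
   Returns the largest code point in a buffer that is assumed to
   contain valid UTF-8 (no validation is done here). */
static uint32_t utf8_max_char(const unsigned char *s, size_t len)
{
    uint32_t max_char = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t n;

        /* The lead byte gives the sequence length and the first bits. */
        if (b < 0x80)      { cp = b;        n = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; n = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; n = 3; }
        else               { cp = b & 0x07; n = 4; }

        /* Each continuation byte contributes 6 payload bits. */
        for (size_t k = 1; k < n && i + k < len; k++) {
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp > max_char) {
            max_char = cp;
        }
        i += n;
    }
    return max_char;
}
```

With the PEP 393 layout the same information can be read from the
object header in constant time, which is why an API like
PyUnicode_MAX_CHAR_VALUE() is cheap today. A UTF-8 based implementation
would have to either scan like this or cache the result.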

In an ideal world, I would prefer to have a "string builder" API, like
the current _PyUnicodeWriter C API, to create a string, and never
allow modifying a string in place.
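
As a rough illustration of the builder idea, here is a toy sketch in
plain C. The str_builder type and the sb_* functions are hypothetical
names invented for this example; they are unrelated to the private
_PyUnicodeWriter API and operate on bytes rather than on Python str
objects. The point is only the shape of the API: data is appended
through the builder, and once the string is finished the builder gives
up its buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy builder, for illustration only. */
typedef struct {
    char *buf;   /* growing buffer, owned by the builder */
    size_t len;  /* bytes written so far */
    size_t cap;  /* allocated size of buf */
} str_builder;

static void sb_init(str_builder *sb)
{
    sb->buf = NULL;
    sb->len = 0;
    sb->cap = 0;
}

/* Append n bytes. Returns 0 on success, -1 on allocation failure. */
static int sb_append(str_builder *sb, const char *data, size_t n)
{
    /* Keep one spare byte for the terminating NUL added by sb_finish(). */
    if (sb->len + n + 1 > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap * 2 : 16;
        while (new_cap < sb->len + n + 1) {
            new_cap *= 2;
        }
        char *p = realloc(sb->buf, new_cap);
        if (p == NULL) {
            return -1;
        }
        sb->buf = p;
        sb->cap = new_cap;
    }
    memcpy(sb->buf + sb->len, data, n);
    sb->len += n;
    return 0;
}

/* Finish building: return a NUL-terminated string (or NULL on
   allocation failure) and reset the builder, so the result can no
   longer be changed through it. The caller frees the result. */
static char *sb_finish(str_builder *sb)
{
    char *result;

    if (sb->buf == NULL) {
        result = calloc(1, 1);  /* empty string */
    } else {
        sb->buf[sb->len] = '\0';
        result = sb->buf;
    }
    sb_init(sb);
    return result;
}
```

A caller would use sb_init(), any number of sb_append() calls, then
sb_finish(). After that point there is no API left for writing into the
finished string, which is the property argued for above.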

CPython's "almost" immutable str (it can be modified "if the reference
count is equal to 1") has corner cases and can be misused. But again,
I don't think that it should be part of this PEP :-) Sorry for being
off-topic ;-)

> Specification
> =============
>
> Affected APIs
> --------------
>
> From the Unicode implementation, ``wstr`` and ``wstr_length`` members are
> removed.
>
> Macros and functions to be removed:
>
> * PyUnicode_GET_SIZE
> * PyUnicode_GET_DATA_SIZE
> * Py_UNICODE_WSTR_LENGTH
> * PyUnicode_AS_UNICODE
> * PyUnicode_AS_DATA
> * PyUnicode_AsUnicode
> * PyUnicode_AsUnicodeAndSize

Which ones are already deprecated?

> Behaviors to be removed:
>
> * PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
>   ``size > 0`` cause RuntimeError instead of creating legacy Unicode
>   object. While this API is deprecated by PEP 393, this API will be kept
>   when ``wstr`` is removed. This API will be removed later.

I'm not sure that it's worth keeping PyUnicode_FromUnicode(), given
that PyUnicode_FromWideChar() has a clean API (it uses wchar_t*, not
Py_UNICODE*). I also suggest disallowing PyUnicode_FromUnicode(NULL,
0).

By the way, when can we finally remove the Py_UNICODE type?

I would prefer to remove Py_UNICODE and PyUnicode_FromUnicode().


> * PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
>   ``PyUnicode_FromStringAndSize(NULL, size)`` cause RuntimeError
>   instead of creating legacy unicode object.



> All APIs to be changed should raise DeprecationWarning for behavior to be
> removed. Note that ``PyUnicode_FromUnicode`` has both of compiler deprecation
> warning and runtime DeprecationWarning. [3]_, [4]_.

Every function scheduled for removal? Even PyUnicode_GET_SIZE()? I'm
not sure that C extensions are prepared for PyUnicode_GET_SIZE()
raising an exception when Python is run with -Werror, which turns the
DeprecationWarning into an exception.


> All deprecations will be implemented in Python 3.10.
> Some deprecations will be backported in Python 3.9.
>
> Actual removal will happen in Python 3.12.

Many functions have been declared with Py_DEPRECATED() for a long
time. Would it make sense to remove these functions earlier?


Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/M7JFI5TWLM7KOYVSBFFTPQS5HHO4DF2M/
Code of Conduct: http://python.org/psf/codeofconduct/
