Hi Inada-san,

I am currently too busy with EuroPython to participate in longer discussions. FWIW, I intend to continue after EuroPython.
In any case, thanks for writing up the PEP. Could you please add my points about:

- the fact that the encode APIs encode from a Unicode buffer to a bytes object; this is an important fact, since the removal removes access to this codec functionality for extensions
- PyUnicode_AsEncodedString() is not a proper alternative, since it requires creating a temporary PyUnicode object, which is inefficient and wastes memory
- the maintainability improvement mentioned in the PEP does not really materialize, since the underlying functionality still exists in the codecs; only access to the functionality is removed
- keeping just the generic PyUnicode_Encode() API would be a compromise
- if we remove the codec-specific PyUnicode_Encode*() APIs, why are we still keeping the special PyUnicode_Decode*() APIs?
- the deprecations were only done because the Py_UNICODE data type was replaced by a hybrid type. Using this as an argument for removing functionality is not really good practice when there are ways to continue exposing the functionality using other data types.

I am still strongly -1 on removing all encoding APIs without at least some upgrade path for existing code to use, and without keeping the API symmetric.

Cheers,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> Python Database Interfaces ... http://products.egenix.com/
>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
http://www.malemburg.com/

On 07.07.2020 17:17, Inada Naoki wrote:
> Hi, folks.
>
> Since the previous discussion was suspended without consensus, I wrote
> a new PEP for it. (Thank you Victor for reviewing it!)
>
> This PEP looks very similar to PEP 623 "Remove wstr from Unicode",
> but for encoder APIs, not for Unicode object APIs.
>
> URL (not available yet): https://www.python.org/dev/peps/pep-0624/
>
> ---
>
> PEP: 624
> Title: Remove Py_UNICODE encoder APIs
> Author: Inada Naoki <songofaca...@gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 06-Jul-2020
> Python-Version: 3.11
>
>
> Abstract
> ========
>
> This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in
> Python 3.11:
>
> * ``PyUnicode_Encode()``
> * ``PyUnicode_EncodeASCII()``
> * ``PyUnicode_EncodeLatin1()``
> * ``PyUnicode_EncodeUTF7()``
> * ``PyUnicode_EncodeUTF8()``
> * ``PyUnicode_EncodeUTF16()``
> * ``PyUnicode_EncodeUTF32()``
> * ``PyUnicode_EncodeUnicodeEscape()``
> * ``PyUnicode_EncodeRawUnicodeEscape()``
> * ``PyUnicode_EncodeCharmap()``
> * ``PyUnicode_TranslateCharmap()``
> * ``PyUnicode_EncodeDecimal()``
> * ``PyUnicode_TransformDecimalToASCII()``
>
> .. note::
>
>    `PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
>    Unicode object APIs related to ``Py_UNICODE``. This PEP, on the other
>    hand, is not related to Unicode objects. The two PEPs are split because
>    they have different motivations and need separate discussions.
>
>
> Motivation
> ==========
>
> In general, reducing the number of APIs that have been deprecated for
> a long time and have few users is a good idea, not only because it
> improves the maintainability of CPython, but also because it helps API
> users and other Python implementations.
>
>
> Rationale
> =========
>
> Deprecated since Python 3.3
> ---------------------------
>
> ``Py_UNICODE`` and APIs using it have been deprecated since Python 3.3.
>
>
> Inefficient
> -----------
>
> All of these APIs are implemented using ``PyUnicode_FromWideChar``.
> So these APIs are inefficient when users want to encode a Unicode
> object.
>
>
> Not used widely
> ---------------
>
> When searching the top 4000 PyPI packages [1]_, only pyodbc uses
> these APIs:
>
> * ``PyUnicode_EncodeUTF8()``
> * ``PyUnicode_EncodeUTF16()``
>
> pyodbc uses these APIs to encode a Unicode object into a bytes object,
> so it is easy to fix. [2]_
>
>
> Alternative APIs
> ================
>
> There are alternative APIs that accept ``PyObject *unicode`` instead of
> ``Py_UNICODE *``. Users can migrate to them.
>
>
> =========================================  ==========================================
> Deprecated API                             Alternative APIs
> =========================================  ==========================================
> ``PyUnicode_Encode()``                     ``PyUnicode_AsEncodedString()``
> ``PyUnicode_EncodeASCII()``                ``PyUnicode_AsASCIIString()`` \(1)
> ``PyUnicode_EncodeLatin1()``               ``PyUnicode_AsLatin1String()`` \(1)
> ``PyUnicode_EncodeUTF7()``                 \(2)
> ``PyUnicode_EncodeUTF8()``                 ``PyUnicode_AsUTF8String()`` \(1)
> ``PyUnicode_EncodeUTF16()``                ``PyUnicode_AsUTF16String()`` \(3)
> ``PyUnicode_EncodeUTF32()``                ``PyUnicode_AsUTF32String()`` \(3)
> ``PyUnicode_EncodeUnicodeEscape()``        ``PyUnicode_AsUnicodeEscapeString()``
> ``PyUnicode_EncodeRawUnicodeEscape()``     ``PyUnicode_AsRawUnicodeEscapeString()``
> ``PyUnicode_EncodeCharmap()``              ``PyUnicode_AsCharmapString()`` \(1)
> ``PyUnicode_TranslateCharmap()``           ``PyUnicode_Translate()``
> ``PyUnicode_EncodeDecimal()``              \(4)
> ``PyUnicode_TransformDecimalToASCII()``    \(4)
> =========================================  ==========================================
>
> Notes:
>
> (1)
>    The ``const char *errors`` parameter is missing.
>
> (2)
>    There is no public alternative API, but users can use the generic
>    ``PyUnicode_AsEncodedString()`` instead.
>
> (3)
>    The ``const char *errors, int byteorder`` parameters are missing.
>
> (4)
>    There is no direct replacement, but ``Py_UNICODE_TODECIMAL``
>    can be used instead.
>    CPython uses ``_PyUnicode_TransformDecimalAndSpaceToASCII`` for
>    converting from Unicode to numbers instead.
>
>
> Plan
> ====
>
> Python 3.9
> ----------
>
> Add ``Py_DEPRECATED(3.3)`` to the following APIs. This change has already
> been committed [3]_. All other APIs have already been marked
> ``Py_DEPRECATED(3.3)``.
>
> * ``PyUnicode_EncodeDecimal()``
> * ``PyUnicode_TransformDecimalToASCII()``
>
> Document all APIs as "will be removed in version 3.11".
>
>
> Python 3.11
> -----------
>
> These APIs are removed:
>
> * ``PyUnicode_Encode()``
> * ``PyUnicode_EncodeASCII()``
> * ``PyUnicode_EncodeLatin1()``
> * ``PyUnicode_EncodeUTF7()``
> * ``PyUnicode_EncodeUTF8()``
> * ``PyUnicode_EncodeUTF16()``
> * ``PyUnicode_EncodeUTF32()``
> * ``PyUnicode_EncodeUnicodeEscape()``
> * ``PyUnicode_EncodeRawUnicodeEscape()``
> * ``PyUnicode_EncodeCharmap()``
> * ``PyUnicode_TranslateCharmap()``
> * ``PyUnicode_EncodeDecimal()``
> * ``PyUnicode_TransformDecimalToASCII()``
>
>
> Alternative ideas
> =================
>
> Instead of just removing deprecated APIs, we may be able to reuse their
> names with different signatures.
>
>
> Make some private APIs public
> -----------------------------
>
> ``PyUnicode_EncodeUTF7()`` doesn't have a public alternative API.
>
> Some APIs have alternative public APIs, but they are missing the
> ``const char *errors`` or ``int byteorder`` parameters.
>
> We can rename some private APIs and make them public to cover the
> missing APIs and parameters.
>
> =============================  ================================
> Rename to                      Rename from
> =============================  ================================
> ``PyUnicode_EncodeASCII()``    ``_PyUnicode_AsASCIIString()``
> ``PyUnicode_EncodeLatin1()``   ``_PyUnicode_AsLatin1String()``
> ``PyUnicode_EncodeUTF7()``     ``_PyUnicode_EncodeUTF7()``
> ``PyUnicode_EncodeUTF8()``     ``_PyUnicode_AsUTF8String()``
> ``PyUnicode_EncodeUTF16()``    ``_PyUnicode_EncodeUTF16()``
> ``PyUnicode_EncodeUTF32()``    ``_PyUnicode_EncodeUTF32()``
> =============================  ================================
>
> Pros:
>
> * We have a more consistent API set.
>
> Cons:
>
> * We have more public APIs to maintain.
> * Existing public APIs are enough for most use cases, and
>   ``PyUnicode_AsEncodedString()`` can be used in other cases.
>
>
> Replace ``Py_UNICODE*`` with ``Py_UCS4*``
> -----------------------------------------
>
> We can replace ``Py_UNICODE`` (a typedef of ``wchar_t``) with
> ``Py_UCS4``. Since the built-in codecs support UCS-4, we don't need to
> convert a ``Py_UCS4*`` string to a Unicode object.
>
>
> Pros:
>
> * We have a more consistent API set.
> * Users can encode a UCS-4 string in C without creating a Unicode object.
>
> Cons:
>
> * We have more public APIs to maintain.
> * Applications which use UTF-8 or UTF-16 cannot use these APIs
>   anyway.
> * Other Python implementations may not have built-in codecs for UCS-4.
> * If we change the Unicode internal representation to UTF-8, we need
>   to keep UCS-4 support only for these APIs.
>
>
> Replace ``Py_UNICODE*`` with ``wchar_t*``
> -----------------------------------------
>
> We can replace ``Py_UNICODE`` with ``wchar_t``.
>
> Pros:
>
> * We have a more consistent API set.
> * Backward compatible.
>
> Cons:
>
> * We have more public APIs to maintain.
> * They are inefficient on platforms where ``wchar_t*`` is UTF-16, because
>   the built-in codecs support only UCS-1, UCS-2, and UCS-4 input.
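To illustrate the UCS-4 versus UTF-16 point, here is a small standalone C sketch. It is not CPython's codec code, and the function names ucs4_to_utf8(), utf16_to_utf8(), next_code_point() and put_utf8() are hypothetical. With UCS-4 input every array element is already a code point, so encoding to UTF-8 is a single pass with no intermediate object. With UTF-16 input (such as wchar_t on Windows), a surrogate pair stands for one code point and has to be combined first, so the buffer cannot simply be treated as UCS-2. For simplicity, the sketch passes lone surrogates through as 3-byte sequences (like Python's "surrogatepass" error handler); a strict encoder would report an error instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one code point from UTF-16 input at s[*i],
   combining a valid high/low surrogate pair into a single code point.
   Lone surrogates are returned unchanged. */
static uint32_t next_code_point(const uint16_t *s, size_t n, size_t *i)
{
    uint32_t c = s[*i];
    *i += 1;
    if (c >= 0xD800 && c <= 0xDBFF && *i < n) {
        uint32_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            *i += 1;
            return 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return c;
}

/* Hypothetical helper: write the UTF-8 form of one code point into out
   and return the number of bytes written (1 to 4). */
static size_t put_utf8(uint32_t c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (c >> 18));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}

/* UCS-4 input: each element is a code point, so one pass suffices.
   Returns the number of bytes written, or (size_t)-1 if cap is too small. */
size_t ucs4_to_utf8(const uint32_t *s, size_t n, unsigned char *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char buf[4];
        size_t k = put_utf8(s[i], buf);
        if (len + k > cap)
            return (size_t)-1;
        for (size_t j = 0; j < k; j++)
            out[len++] = buf[j];
    }
    return len;
}

/* UTF-16 input: surrogate pairs must be combined before encoding.
   Returns the number of bytes written, or (size_t)-1 if cap is too small. */
size_t utf16_to_utf8(const uint16_t *s, size_t n, unsigned char *out, size_t cap)
{
    size_t len = 0;
    size_t i = 0;
    while (i < n) {
        unsigned char buf[4];
        size_t k = put_utf8(next_code_point(s, n, &i), buf);
        if (len + k > cap)
            return (size_t)-1;
        for (size_t j = 0; j < k; j++)
            out[len++] = buf[j];
    }
    return len;
}
```

For example, the UTF-16 input {0x0041, 0xD83D, 0xDE00} ("A" followed by U+1F600) is encoded by utf16_to_utf8() into the 5 bytes 41 F0 9F 98 80, the same result as ucs4_to_utf8() on {0x41, 0x1F600}. Naively widening each UTF-16 unit into a UCS-4 array and encoding that instead yields 7 bytes, because the two surrogates are encoded separately.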
>
>
> Rejected ideas
> ==============
>
> Using runtime warning
> ---------------------
>
> These APIs don't release the GIL for now, so existing callers may rely
> on no other Python code running during the call. Emitting a warning from
> such APIs is not safe. See this example.
>
> .. code-block:: c
>
>    PyObject *u = PyList_GET_ITEM(list, i);  // u is a borrowed reference.
>    PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
>                                       PyUnicode_GET_SIZE(u), NULL);
>    // Assumes u is still a living reference.
>    PyObject *t = PyTuple_Pack(2, u, b);
>    Py_DECREF(b);
>    return t;
>
> If we emit a Python warning from ``PyUnicode_EncodeUTF8()``, warning
> filters and other threads may change the ``list``, and ``u`` can be
> a dangling reference after ``PyUnicode_EncodeUTF8()`` returns.
>
> Additionally, since we are not changing behavior but removing C APIs,
> a runtime ``DeprecationWarning`` might not be helpful for Python
> developers. We should warn extension developers instead.
>
>
> Discussions
> ===========
>
> * `Plan to remove Py_UNICODE APis except PEP 623
>   <https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE/#S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE>`_
> * `bpo-41123: Remove Py_UNICODE APIs except PEP 623:
>   <https://bugs.python.org/issue41123>`_
>
>
> References
> ==========
>
> .. [1] Source package list chosen from top 4000 PyPI packages.
>    (https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.txt)
>
> .. [2] pyodbc -- Don't use PyUnicode_Encode API #792
>    (https://github.com/mkleehammer/pyodbc/pull/792)
>
> .. [3] Uncomment Py_DEPRECATED for Py_UNICODE APIs (GH-21318)
>    (https://github.com/python/cpython/commit/9c3840870814493fed62e140cfa43c2883e12181)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QT7QVAKF36Y2GOXNPXZ5AGKWGKZI3XT7/