On Thu, Jul 9, 2020 at 5:46 AM M.-A. Lemburg <m...@egenix.com> wrote:
> - the fact that the encode APIs encode from a Unicode buffer
>   to a bytes object; this is an important fact, since the removal
>   removes access to this codec functionality for extensions
>
> - PyUnicode_AsEncodedString() is not a proper alternative, since
>   it requires to create a temporary PyUnicode object, which is
>   inefficient and wastes memory

I added your points to the "Alternative Idea > Replace Py_UNICODE*
with Py_UCS4*" section. In it, I wrote "User can encode UCS-4 string in C
without creating Unicode object."

https://www.python.org/dev/peps/pep-0624/#replace-py-unicode-with-py-ucs4

Note that the current Py_UNICODE* encoder APIs already create temporary
PyUnicode objects, so they are inefficient and waste memory today.
Py_UNICODE* may be UTF-16 on some platforms (e.g. Windows), and the builtin
codecs don't support UTF-16 input.
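
To illustrate the temporary-object route discussed above, here is a minimal
sketch that drives the C API from Python through ctypes, purely so it can be
run without compiling an extension. PyUnicode_FromKindAndData() and
PyUnicode_AsEncodedString() are existing C API functions; the helper name
encode_ucs4 and the use of ctypes are assumptions made for this illustration
and are not something the PEP proposes. It assumes CPython, where
ctypes.pythonapi exposes these symbols.

```python
import ctypes

api = ctypes.pythonapi  # CPython only

# PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size)
api.PyUnicode_FromKindAndData.argtypes = [
    ctypes.c_int, ctypes.c_void_p, ctypes.c_ssize_t]
api.PyUnicode_FromKindAndData.restype = ctypes.py_object

# PyObject *PyUnicode_AsEncodedString(PyObject *unicode,
#                                     const char *encoding, const char *errors)
api.PyUnicode_AsEncodedString.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_AsEncodedString.restype = ctypes.py_object

PyUnicode_4BYTE_KIND = 4  # value of the C enum constant


def encode_ucs4(code_points, encoding="utf-8", errors="strict"):
    """Hypothetical helper: encode a UCS-4 buffer via a temporary str."""
    # Build a raw UCS-4 buffer, as a C extension might already hold.
    buf = (ctypes.c_uint32 * len(code_points))(*code_points)
    # Step 1: create the temporary Unicode object from the buffer.
    text = api.PyUnicode_FromKindAndData(
        PyUnicode_4BYTE_KIND, buf, len(code_points))
    # Step 2: encode that object with the requested codec.
    return api.PyUnicode_AsEncodedString(
        text, encoding.encode("ascii"), errors.encode("ascii"))
```

In a C extension the same two calls apply; the intermediate object created in
step 1 is the temporary that this thread is about.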


>
> - the maintenance effect mentioned in the PEP does not really
>   materialize, since the underlying functionality still exists
>   in the codecs - only access to the functionality is removed
>

In the same section, I described the maintenance cost as follows:

* Other Python implementations may not have a builtin codec for UCS-4.
* If we change the Unicode internal representation to UTF-8, we would need
  to keep UCS-4 support only for these APIs.

> - keeping just the generic PyUnicode_Encode() API would be a
>   compromise
>
> - if we remove the codec specific PyUnicode_Encode*() APIs, why
>   are we still keeping the special PyUnicode_Decode*() APIs ?
>

OK, I will add a "Discussions" section. (I don't like "FAQ" because some
questions are important even if they are not "frequently" asked.)

The quick answer is:

* The PyUnicode_Decode*() APIs are part of the stable ABI. (Py_UNICODE is
  excluded from the stable ABI.)
* Decoding from char* is a more common and generic use case than encoding from
  Py_UNICODE*.
* Other Python implementations using UTF-8 as their internal representation
  can implement them easily.
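
As a rough illustration of the "decoding from char*" case, the sketch below
calls PyUnicode_DecodeUTF8() through ctypes so that it runs without a compiled
extension. PyUnicode_DecodeUTF8() is an existing C API function; the helper
name decode_utf8 and the ctypes wrapper are assumptions made for this
illustration. It assumes CPython, where ctypes.pythonapi exposes the symbol.

```python
import ctypes

api = ctypes.pythonapi  # CPython only

# PyObject *PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size,
#                                const char *errors)
api.PyUnicode_DecodeUTF8.argtypes = [
    ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p]
api.PyUnicode_DecodeUTF8.restype = ctypes.py_object


def decode_utf8(raw, errors="strict"):
    """Hypothetical helper: decode a char* buffer straight to a str."""
    # The input is plain bytes plus an explicit length, so no Py_UNICODE*
    # buffer is involved at any point.
    return api.PyUnicode_DecodeUTF8(raw, len(raw), errors.encode("ascii"))
```

The input here is a byte buffer and a length, which is why this direction does
not depend on Py_UNICODE at all.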

But I'm not opposed to removing them (especially for a minor codec like UTF-7).
It is just out of scope for this PEP.


> - the deprecations were just done because the Py_UNICODE data
>   type was replaced by a hybrid type. Using this as an argument
>   for removing functionality is not really good practice, when
>   there are ways to continue exposing the functionality using other
>   data types.

I hope the "Replace Py_UNICODE* with Py_UCS4*" section describes this.

Regards,

-- 
Inada Naoki  <songofaca...@gmail.com>