On Tue, Jun 23, 2020 at 6:58 AM Victor Stinner <vstin...@python.org> wrote:
>
> Hi INADA-san,
>
> First of all, thanks for writing down a PEP!
>
> On Thu, Jun 18, 2020 at 11:42, Inada Naoki <songofaca...@gmail.com> wrote:
> > To support legacy Unicode objects created by
> > ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
> > ``PyUnicode_READY()`` check.
>
> I don't see PyUnicode_READY() removal in the specification section.
> When can we remove these calls and the function itself?
>

The legacy Unicode representation uses wstr, so legacy Unicode support
will be removed together with wstr.
PyUnicode_READY() will become a no-op once wstr is removed.  From then
on, we can remove the calls to PyUnicode_READY().

I think we can deprecate PyUnicode_READY() when wstr is removed.

>
> > Support for legacy Unicode objects makes the Unicode implementation complex.
> > Until we drop legacy Unicode objects, it is very hard to try other Unicode
> > implementations, like the UTF-8 based implementation in PyPy.
>
> I'm not sure if it should be in the scope of the PEP or not, but there
> are also other C API functions which are too close to the PEP 393
> concrete implementation. For example, I'm not sure that
> PyUnicode_MAX_CHAR_VALUE(str) would be relevant/efficient if Python
> str is reimplemented to use UTF-8 internally. Should we deprecate it
> as well? Do you think that it should be addressed in a separate PEP?
>

I don't like optimizations that rely heavily on the CPython
implementation, but I think it is too early to deprecate
PyUnicode_MAX_CHAR_VALUE().
For now, we should just recommend a UTF-8 based approach.
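
A rough sketch of what a UTF-8 based approach could look like: instead of
asking for the maximum code point, the caller works on the UTF-8 bytes
returned by PyUnicode_AsUTF8AndSize().  The calls go through
ctypes.pythonapi only so that the sketch runs without a compiler; in an
extension module the same function would be called directly from C.  The
helper names utf8_view() and is_ascii() are hypothetical.

```python
import ctypes

# const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_utf8.restype = ctypes.POINTER(ctypes.c_char)


def utf8_view(s):
    """Hypothetical helper: return the UTF-8 bytes of a str via the C API."""
    size = ctypes.c_ssize_t()
    buf = _as_utf8(s, ctypes.byref(size))
    # The buffer belongs to the str object, so copy the bytes out of it.
    return ctypes.string_at(buf, size.value)


def is_ascii(s):
    """Hypothetical helper: ASCII check that only looks at the UTF-8 bytes."""
    return all(b < 0x80 for b in utf8_view(s))
```

Nothing in this sketch depends on how the str is stored internally, so it
should keep working if the internal representation changes.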


> In fact, a large part of the Unicode C API is based on the current
> implementation of the Python str type. For example, I'm not sure that
> PyUnicode_New(size, max_char) would still make sense if we change the
> code to store strings as UTF-8 internally.
>
> In an ideal world, I would prefer to have a "string builder" API, like
> the current _PyUnicodeWriter C API, to create a string, and never
> allow modifying a string in place.

I completely agree with you.  But the current _PyUnicodeWriter is tightly
coupled with PEP 393 and is not UTF-8 based.  I am not sure that
we should make it public and stable starting with Python 3.10.

I think we should recommend `PyUnicode_FromStringAndSize(utf8, utf8_len)`
for now, to avoid coupling code too tightly to PEP 393.
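
As a minimal illustration of that recommendation, the sketch below builds
a str from a UTF-8 buffer with PyUnicode_FromStringAndSize().  It is
driven through ctypes.pythonapi only so that it can be run as-is; C code
would call the function directly.  The helper name str_from_utf8() is
hypothetical.

```python
import ctypes

# PyObject *PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
_from_utf8 = ctypes.pythonapi.PyUnicode_FromStringAndSize
_from_utf8.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
_from_utf8.restype = ctypes.py_object


def str_from_utf8(data):
    """Hypothetical helper: build a str from a UTF-8 encoded bytes object."""
    # The length is passed explicitly, so embedded NUL bytes are preserved.
    # Invalid UTF-8 makes the C function fail, which surfaces here as
    # UnicodeDecodeError.
    return _from_utf8(data, len(data))
```

The caller only supplies UTF-8 bytes and a length, so no PEP 393 detail
(kind, max_char) leaks into the calling code.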

Regards,

-- 
Inada Naoki  <songofaca...@gmail.com>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3MT3ZHA66PW7K7OLZERTDLFQEDFPYHQI/