On Tue, Jun 23, 2020 at 6:58 AM Victor Stinner <vstin...@python.org> wrote:
>
> Hi INADA-san,
>
> First of all, thanks for writing down a PEP!
>
> On Thu, Jun 18, 2020 at 11:42, Inada Naoki <songofaca...@gmail.com> wrote:
> > To support legacy Unicode object created by
> > ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
> > ``PyUnicode_READY()`` check.
>
> I don't see PyUnicode_READY() removal in the specification section.
> When can we remove these calls and the function itself?
>
The legacy Unicode representation uses wstr, so legacy Unicode support will be removed together with wstr. Once wstr is removed, PyUnicode_READY() becomes a no-op, and from then on we can drop the calls to it. I think we can deprecate PyUnicode_READY() at the same time wstr is removed.

> > Support of legacy Unicode object makes Unicode implementation complex.
> > Until we drop legacy Unicode object, it is very hard to try other Unicode
> > implementation like UTF-8 based implementation in PyPy.
>
> I'm not sure if it should be in the scope of the PEP or not, but there
> are also other C API functions which are too close to the PEP 393
> concrete implementation. For example, I'm not sure that
> PyUnicode_MAX_CHAR_VALUE(str) would be relevant/efficient if Python
> str is reimplemented to use UTF-8 internally. Should we deprecate it
> as well? Do you think that it should be addressed in a separated PEP?
>

I don't like optimizations that rely heavily on CPython implementation details. But I think it is too early to deprecate PyUnicode_MAX_CHAR_VALUE(). For now, we should just recommend a UTF-8 based approach.

> In fact, a large part of the Unicode C API is based on the current
> implementation of the Python str type. For example, I'm not sure that
> PyUnicode_New(size, max_char) would still make sense if we change the
> code to store strings as UTF-8 internally.
>
> In an ideal world, I would prefer to have a "string builder" API, like
> the current _PyUnicodeWriter C API, to create a string, and only never
> allow to modify a string in-place.

I completely agree with you. But the current _PyUnicodeWriter is tightly coupled with PEP 393 and is not UTF-8 based, so I am not sure we should make it public and stable in Python 3.10. For now, I think we should recommend `PyUnicode_FromStringAndSize(utf8, utf8_len)` to avoid coupling extension code too tightly to PEP 393.
Regards,
--
Inada Naoki <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3MT3ZHA66PW7K7OLZERTDLFQEDFPYHQI/
Code of Conduct: http://python.org/psf/codeofconduct/