On 2020-12-27 19:15, Guido van Rossum wrote:
On Sun, Dec 27, 2020 at 3:19 AM Ronald Oussoren <ronaldousso...@mac.com> wrote:
On 26 Dec 2020, at 18:43, Guido van Rossum <gu...@python.org> wrote:
On Sat, Dec 26, 2020 at 3:54 AM Phil Thompson via Python-Dev
<python-dev@python.org> wrote:
It's worth comparing the situation with byte arrays. There is no problem
of translating different representations of an element, but there is
still the issue of who owns the memory. The Python buffer protocol
usually solves this problem, so something similar for unicode "arrays"
might suffice.
Exactly my thought on the matter. I have no doubt that between all
of us we could design a decent protocol.
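
To make the ownership point about the buffer protocol concrete, here is a minimal sketch in Python. It assumes CPython's built-in bytearray and memoryview; the variable names are only illustrative. While a view is exported, the owner refuses to resize (and so possibly move) its memory, and it can do so again once the view is released.

```python
# A bytearray exports its memory through the buffer protocol.
data = bytearray(b"hello")
view = memoryview(data)

# While the view is alive, the exporter may not resize its memory,
# because the consumer could still be holding a pointer into it.
try:
    data.append(0x21)
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Releasing the view ends the borrow; the owner can resize again.
view.release()
data.append(0x21)
```

A protocol for unicode "arrays" would presumably need a comparable acquire/release handshake, though nothing in this thread specifies one.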
The practical problem would be to convince enough people that this
is worth doing to actually get the code changed (str being one of
the most popular data types traveling across C API boundaries), in
the CPython core (which surely has a lot of places to modify) as
well as in the vast collection of affected 3rd party modules. Like
many migrations it's an endless slog for the developers involved,
and in open source it's hard to assign resources for such a project.
That’s a problem indeed. An 80% solution could be reached by
teaching PyArg_Parse* about the new protocol: it already uses the
buffer protocol for bytes-like objects and could be taught about a
variant of the protocol for strings. That would require that the
implementation of that new variant returns a pointer in the Py_buffer
view that can be used after the view is released, but that’s already a
restriction for the use of new-style buffers in the PyArg_Parse* APIs.
That wouldn’t be a solution for code using the PyUnicode_* APIs of
course, nor Python code explicitly checking for the str type.
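
As a rough illustration of the last point, the sketch below uses collections.UserString from the standard library as a stand-in for a string-like object that is not a str. The shout() helper is hypothetical and only exists to show an explicit type check rejecting such an object.

```python
from collections import UserString

def shout(text):
    # Hypothetical helper that insists on the concrete str type.
    if not isinstance(text, str):
        raise TypeError("expected str")
    return text.upper()

wrapped = UserString("hello")  # string-like, but not a str subclass

try:
    shout(wrapped)
    accepted = True
except TypeError:
    accepted = False

# Duck-typed use of the same object still works.
upper_via_method = wrapped.upper()
```

Any code written like shout() would need changing before it could accept an object that merely implements a string protocol.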
In the end a new string “kind” (next to the 1, 2 and 4 byte
variants) where callbacks are used to provide data might be the most
pragmatic. That will still break code peeking directly into the
PyUnicodeObject struct, but anyone doing that should know that that
is not a stable API.
That's an attractive idea. I've personally never had to peek inside the
implementation, and I suspect there's not that much code that does so
(even in the CPython code base itself, outside the PyUnicode
implementation of course).
The re module does it extensively for speed reasons.
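
For readers unfamiliar with the 1, 2 and 4 byte "kinds" mentioned above, the following sketch observes them indirectly from Python. It assumes CPython 3.3 or later (the flexible string representation); bytes_per_char() is an illustrative helper, not an existing API, and it simply measures how much a string object grows when one more character is added.

```python
import sys

def bytes_per_char(ch, n=100):
    # Growth in object size when one more copy of ch is appended.
    # CPython-specific: relies on sys.getsizeof reporting the str payload.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

widths = {
    "ascii": bytes_per_char("a"),            # fits in 1 byte per character
    "bmp": bytes_per_char("\u0394"),         # Greek capital delta, 2 bytes
    "astral": bytes_per_char("\U0001F600"),  # outside the BMP, 4 bytes
}
```

Code such as the re module that reads these buffers directly has to branch on the kind, which is why an additional callback-backed kind would affect it.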
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4LUBFUFQL6TNIX6CGTTF3O5M6IFXOME3/