On 2020-12-27 19:15, Guido van Rossum wrote:
On Sun, Dec 27, 2020 at 3:19 AM Ronald Oussoren <ronaldousso...@mac.com> wrote:



    On 26 Dec 2020, at 18:43, Guido van Rossum <gu...@python.org> wrote:

    On Sat, Dec 26, 2020 at 3:54 AM Phil Thompson via Python-Dev
    <python-dev@python.org> wrote:

        It's worth comparing the situation with byte arrays. There is no
        problem of translating different representations of an element,
        but there is still the issue of who owns the memory. The Python
        buffer protocol usually solves this problem, so something similar
        for unicode "arrays" might suffice.
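
A minimal sketch of the ownership point above, seen from the Python side rather than the C API. It assumes CPython's bytearray and memoryview behaviour: the bytearray owns the memory, the view borrows it without copying, and the owner refuses to resize while the borrow is live. The variable names are illustrative only.

```python
# The bytearray owns the memory; the memoryview borrows it via the buffer protocol.
data = bytearray(b"hello")
view = memoryview(data)      # exports the buffer, no copy is made

view[0] = ord("J")           # writes through the view land in the owner's memory

# While the export is live the owner must not resize, since that could move
# the memory out from under the borrower.
try:
    data.extend(b"!")
    resized = True
except BufferError:
    resized = False

view.release()               # hand the buffer back to its owner
data.extend(b"!")            # resizing is allowed again once nothing borrows it
```

A string variant of the protocol would need some equivalent of this export/release handshake, plus a way to say which element width is being exposed.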


    Exactly my thought on the matter. I have no doubt that between all
    of us we could design a decent protocol.

    The practical problem would be to convince enough people that this
    is worth doing to actually get the code changed (str being one of
    the most popular data types traveling across C API boundaries), in
    the CPython core (which surely has a lot of places to modify) as
    well as in the vast collection of affected 3rd party modules. Like
    many migrations it's an endless slog for the developers involved,
    and in open source it's hard to assign resources for such a project.

    That’s a problem indeed.  An 80% solution could be reached by
    teaching PyArg_Parse* about the new protocol: it already uses the
    buffer protocol for bytes-like objects and could be taught about a
    variant of the protocol for strings.  That would require that the
    implementation of that new variant returns a pointer in the Py_buffer
    that can be used after the view is released, but that’s already a
    restriction for the use of new style buffers in the PyArg_Parse* APIs.

    That wouldn’t be a solution for code using the PyUnicode_* APIs of
    course, nor Python code explicitly checking for the str type.

    In the end a new string “kind” (next to the 1, 2 and 4 byte
    variants) where callbacks are used to provide data might be the most
    pragmatic.  That will still break code peeking directly into the
    PyUnicodeObject struct, but anyone doing that should know that that
    is not a stable API.
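
For readers unfamiliar with the existing 1, 2 and 4 byte variants: a rough sketch of how the width follows from the largest code point in the string, as in CPython's flexible string representation (PEP 393). The function name `storage_kind` is hypothetical and only mirrors the rule; it does not inspect the real object layout, and it says nothing about how a callback-backed kind would be implemented.

```python
def storage_kind(s: str) -> int:
    """Return the bytes per code point a 1/2/4-byte layout would need for s."""
    top = max(map(ord, s), default=0)   # an empty string fits the narrowest kind
    if top < 0x100:        # Latin-1 range
        return 1
    if top < 0x10000:      # Basic Multilingual Plane
        return 2
    return 4               # anything beyond the BMP


kinds = [storage_kind(text) for text in ("abc", "\u20ac", "\U0001F600")]
```

Code that reads the struct directly has to branch on this width today, which is why a fourth, callback-backed kind would break it.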


That's an attractive idea. I've personally never had to peek inside the implementation, and I suspect there's not that much code that does so (even in the CPython code base itself, outside the PyUnicode implementation of course).

The re module does it extensively for speed reasons.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4LUBFUFQL6TNIX6CGTTF3O5M6IFXOME3/
Code of Conduct: http://python.org/psf/codeofconduct/
