On Fri, 10 Apr 2020 19:20:00 +0200
Victor Stinner <vstin...@python.org> wrote:
> 
> Note: Cython and cffi should be preferred to write new C extensions.
> This PEP is about existing C extensions which cannot be rewritten with
> Cython.

Using Cython does not make the C API irrelevant.  In some
applications, the C API has to be low-level enough for performance,
whether the application is written in Cython or not.

> **Status:** not started. The performance overhead must be measured with
> benchmarks and this PEP should be accepted.

Surely you mean "before this PEP should be accepted"?

> Examples of issues to make structures opaque:
> 
> * ``PyGC_Head``: https://bugs.python.org/issue40241
> * ``PyObject``: https://bugs.python.org/issue39573
> * ``PyTypeObject``: https://bugs.python.org/issue40170

How do you keep fast type checking such as PyTuple_Check() if
extension code doesn't have access to e.g. tp_flags?

I notice you did:
"""
Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC()
for object.c and typeobject.c.
"""

So you understand there is a need.
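
For context, today's headers inline the whole check.  Roughly
(simplified from CPython's Include/object.h and tupleobject.h):

    /* One memory load plus a bitwise AND, fully inlined. */
    #define PyType_HasFeature(t, f)  (((t)->tp_flags & (f)) != 0)

    #define PyTuple_Check(op) \
        PyType_HasFeature(Py_TYPE(op), Py_TPFLAGS_TUPLE_SUBCLASS)

If PyTypeObject becomes opaque, extension code has to go through an
out-of-line call such as PyType_GetFlags() instead, paying a function
call on every type check in hot loops.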

> **Backward compatibility:** backward incompatible on purpose. Break the
> limited C API and the stable ABI, with the assumption that `Most C
> extensions don't rely directly on CPython internals`_ and so will remain
> compatible.

The problem here is not only compatibility but potential performance
regressions in C extensions.

> New optimized CPython runtime
> ==============================
> 
> Backward incompatible changes are such a pain for the whole Python
> community. To ease the migration (accelerate adoption of the new C
> API), one option is to provide not only one but two CPython runtimes:
> 
> * Regular CPython: fully backward compatible, support direct access to
>   structures like ``PyObject``, etc.
> * New optimized CPython: incompatible, cannot import C extensions which
>   don't use the limited C API, has new optimizations, limited to the C
>   API.

Well, this sounds like a distribution nightmare.  Some packages will
only be available for one runtime and not the other.  It will confuse
non-expert users.

> O(1) bytearray to bytes conversion
> ..................................
> 
> Convert bytearray to bytes without memory copy.
> 
> Currently, bytearray is used to build a bytes string, but it's usually
> converted into a bytes object to respect an API. This conversion
> requires allocating a new memory block and copying the data (O(n)
> complexity).
> 
> An O(1) conversion is possible if the ownership of the bytearray's
> buffer can be passed to the bytes object.
> 
> That requires modifying the ``PyBytesObject`` structure to support
> multiple storage schemes (i.e. storing the content in a separate
> memory block).

If that's desirable (I'm not sure it is), there is a simpler solution:
instead of allocating a raw memory area, bytearray could allocate... a
private bytes object that you can detach without copying it.

But really, this is why we have BytesIO, which already uses that
exact strategy: allocating a private bytes object.
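
A minimal sketch of that strategy, assuming the writer over-allocates
a private bytes object and shrinks it on detach (the names here are
hypothetical; the real pattern lives in Modules/_io/bytesio.c):

    typedef struct {
        PyObject *buf;    /* private bytes object, no other references */
        Py_ssize_t pos;   /* number of bytes written so far */
    } writer;

    static PyObject *
    writer_detach(writer *w)
    {
        PyObject *result = w->buf;
        w->buf = NULL;
        /* We hold the only reference, so _PyBytes_Resize() can
           shrink in place instead of copying into a new object. */
        if (_PyBytes_Resize(&result, w->pos) < 0)
            return NULL;
        return result;  /* ownership transferred to the caller */
    }

The caller gets a genuine bytes object without the O(n) copy, and
PyBytesObject doesn't need to grow a second storage scheme.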

> Fork and "Copy-on-Read" problem
> ...............................
> 
> Solve the "Copy on read" problem with fork: store reference counter
> outside ``PyObject``.

Nowadays it is strongly recommended to use multiprocessing with the
"forkserver" start method:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

With "forkserver", the forked process is extremely lightweight and
there are little savings to be made in the child.

> `Dismissing Python Garbage Collection at Instagram
> <https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172>`_
> (Jan 2017) by Instagram Engineering.
> 
> Instagram contributed `gc.freeze()
> <https://docs.python.org/dev/library/gc.html#gc.freeze>`_ to Python 3.7
> which works around the issue.
> 
> One solution for that would be to store reference counters outside
> ``PyObject``. For example, in a separate hash table mapping object
> pointers to reference counters. Changing ``PyObject`` structures
> requires that C extensions don't access them directly.

You're planning to introduce a large overhead on each reference
count operation just to satisfy a rather niche use case?  CPython
probably performs millions of reference count changes per second.
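
Compare: today an incref compiles down to one inlined increment
(simplified from Include/object.h):

    #define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

With refcounts in a side table, every such operation turns into
something like this sketch (refcount_table_lookup() is hypothetical,
not a real API):

    /* Hypothetical: find the object's refcount slot in a hash table. */
    Py_ssize_t *refcount_table_lookup(PyObject *op);

    static inline void
    incref_sidetable(PyObject *op)
    {
        Py_ssize_t *refcnt = refcount_table_lookup(op);  /* hash probe */
        (*refcnt)++;
    }

That's a hash probe where there used to be a single add.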

> Debug runtime and remove debug checks in release mode
> .....................................................
> 
> If the C extensions are no longer tied to CPython internals, it becomes
> possible to switch to a Python runtime built in debug mode to enable
> runtime debug checks to ease debugging C extensions.

That's the one convincing feature in this PEP, as far as I'm concerned.
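
To make it concrete: once extensions call functions instead of
touching ob_refcnt themselves, a debug runtime can put checks behind
the same ABI.  A sketch of the kind of check debug builds already do
internally (_Py_NegativeRefcount() exists today; the wrapper name is
mine):

    void
    decref_checked(PyObject *op)
    {
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);
        }
        else if (op->ob_refcnt < 0) {
            /* Report the bug instead of corrupting memory later. */
            _Py_NegativeRefcount(__FILE__, __LINE__, op);
        }
    }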

Regards

Antoine.
