Hi,

Here is a first draft of a PEP which summarizes the research work I have been doing on the CPython C API since 2017, and the changes that others and I have already made since Python 3.7 towards an "opaque" C API. The PEP is also a collaboration with developers of PyPy, HPy, Rust-CPython and many others! Thanks to everyone who helped me write it down!

Maybe this big document should be reorganized into multiple smaller, better defined goals, as multiple PEPs. The PEP is quite long and covers topics which are not directly related. It's a complex topic, and I chose to put everything in a single document to have a good starting point for the discussion. I already proposed some of these ideas in 2017: see the Prior Art section ;-)

The PEP can be read on GitHub where it's better formatted:
https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst

If someone wants to work on the PEP itself, the document on GitHub is the current reference.

Victor

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PEP xxx: Modify the C API to hide implementation details
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Abstract
========

* Hide implementation details from the C API to be able to `optimize CPython`_ and make PyPy more efficient.
* The expectation is that `most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
* Continue to support old unmodified C extensions by continuing to provide the fully compatible "regular" CPython runtime.
* Provide a `new optimized CPython runtime`_ using the same CPython code base: faster, but it can only import C extensions which don't use implementation details. Since both CPython runtimes share the same code base, features implemented in CPython will be available in both runtimes.
* `Stable ABI`_: Only build a C extension once and use it on multiple Python runtimes and different versions of the same runtime.
* Better advertise alternative Python runtimes and better communicate on the differences between the Python language and the Python implementation (especially CPython).

Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.

Rationale
=========

To remain competitive in terms of performance with other programming languages like Go or Rust, Python has to become more efficient.

Make Python (at least) two times faster
---------------------------------------

The C API leaks too many implementation details, which prevents optimizing CPython. See `Optimize CPython`_.

PyPy's support for Python's C API (cpyext) is slow because it has to emulate CPython internals like memory layout and reference counting. The emulation causes memory overhead, memory copies, conversions, etc. See `Inside cpyext: Why emulating CPython C API is so Hard <https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html>`_ (Sept 2018) by Antonio Cuni.

While this PEP may make CPython a little bit slower in the short term, the long-term goal is to make "Python" at least two times faster. This goal is not hypothetical: PyPy is already 4.2x faster than CPython and is fully compatible. C extensions are the bottleneck of PyPy. This PEP proposes a migration plan to move towards an opaque C API which would make PyPy faster.

Separate the Python language and the CPython runtime (promote alternative runtimes)
------------------------------------------------------------------------------------

The Python language should be better separated from its runtime. It's common to say "Python" when referring to "CPython". Even in this PEP :-)

Because the CPython runtime remains the reference implementation, many people believe that the Python language itself has design flaws which prevent it from being efficient. PyPy proved that this is a false assumption: on average, PyPy runs Python code 4.2 times faster than CPython.

One solution for separating the language from the implementation is to promote the usage of alternative runtimes: not only provide the regular CPython, but also PyPy, an optimized CPython which is only compatible with C extensions using the limited C API, a CPython compiled in debug mode to ease debugging issues in C extensions, RustPython, etc.

To make alternative runtimes viable, they should be competitive in terms of features and performance. Currently, C extension modules remain the bottleneck for PyPy.

Most C extensions don't rely directly on CPython internals
----------------------------------------------------------

While the C API is still tightly coupled to CPython internals, in practice, most C extensions don't rely directly on CPython internals.

The expectation is that these C extensions will remain compatible with an "opaque" C API and only a minority of C extensions will have to be modified.

Moreover, more and more C extensions are implemented in Cython or cffi. Updating Cython and cffi to be compatible with the opaque C API will make all these C extensions compatible without having to modify the source code of each extension.

Stable ABI
----------

The idea is to build a C extension only once: the built binary will be usable on multiple Python runtimes and different versions of the same runtime (stable ABI).

The idea is not new but is an extension of the `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`__ implemented in CPython 3.2 with its "limited C API". The limited API is not used by default and is not widely used: PyQt is one of the few known users.

The idea here is that the default C API becomes the limited C API, so that all C extensions benefit from the advantages of a stable ABI.

Flaws of the C API
==================

Borrowed references
-------------------

A borrowed reference is a pointer which doesn't “hold” a reference.
If the object is destroyed, the borrowed reference becomes a dangling pointer, pointing to freed memory which might be reused by a new object. Borrowed references can lead to bugs and crashes when misused. An example of a CPython bug caused by this is `bpo-25750: crash in type_getattro() <https://bugs.python.org/issue25750>`_.

Borrowed references are a problem whenever there is no reference to borrow: they assume that a referenced object already exists (and thus has a positive reference count).

Tagged pointers are an example of this problem: since there is no concrete ``PyObject*`` to represent the integer, it cannot easily be manipulated.

This issue complicates optimizations like PyPy's list strategies: if a list contains only small integers, it is stored as a compact C array of longs. The equivalent of ``PyObject`` is only created when an item is accessed. (Most of the time the object is optimized away by the JIT, but this is another story.) This makes it hard to support the C API function ``PyList_GetItem()``, which should return a reference borrowed from the list, but the list contains no concrete ``PyObject`` that it could lend a reference to! PyPy's current solution is costly: the first time ``PyList_GetItem()`` is called, the whole list is de-optimized (converted to a list of ``PyObject*``). See ``cpyext`` ``get_list_storage()``.

See also the Specialized list use case, which is the same optimization applied to CPython. Like in PyPy, this optimization is incompatible with borrowed references since the runtime cannot guess when the temporary object should be destroyed.

If ``PyList_GetItem()`` returned a strong reference, the ``PyObject*`` could just be allocated on the fly and destroyed when the user decrements its reference count. Basically, by putting borrowed references in the API, we are making it impossible to change the underlying data structure.
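
The situation can be illustrated with a minimal Python sketch. It is not PyPy's or CPython's actual code: ``CompactIntList`` and ``Box`` are hypothetical names, with ``Box`` standing in for a heap-allocated ``PyObject``. Since the items only exist as machine integers, every access has to build a fresh box, so the caller necessarily owns what it gets back.

```python
from array import array


class Box:
    """Hypothetical stand-in for a heap-allocated PyObject wrapping an int."""

    def __init__(self, value):
        self.value = value


class CompactIntList:
    """Hypothetical list strategy: items live in a C array, not as objects."""

    def __init__(self, values):
        # 'q' stores signed 64-bit machine integers, with no per-item object.
        self._storage = array("q", values)

    def get_item(self, index):
        # No Box is stored anywhere: one must be materialized on demand.
        # The list keeps no reference to it, so the caller owns the result
        # (a "strong reference" in C API terms).
        return Box(self._storage[index])


lst = CompactIntList([10, 20, 30])
first = lst.get_item(1)
second = lst.get_item(1)

# Same value, but two distinct temporary objects: the list has nothing
# of its own that it could lend as a borrowed reference.
same_value = first.value == second.value
same_object = first is second
```

In this sketch the container can only hand out objects that the caller owns, which is the strong-reference behavior described above.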

Functions stealing strong references
------------------------------------

There are functions which steal strong references, for example ``PyModule_AddObject()`` and ``PyList_SetItem()``. Stealing references is an issue similar to borrowed references.

PyObject**
----------

Some functions of the C API return a pointer to an array of ``PyObject*``:

* ``PySequence_Fast_ITEMS()``
* ``PyTuple_GET_ITEM()`` is sometimes abused to get an array of all of the tuple's contents: ``PyObject **items = &PyTuple_GET_ITEM(tuple, 0);``

In effect, these functions return an array of borrowed references: like with ``PyList_GetItem()``, all callers of ``PySequence_Fast_ITEMS()`` assume the sequence holds references to its elements.

Leaking structure members
-------------------------

``PyObject``, ``PyTypeObject``, ``PyThreadState``, etc. structures are currently public: C extensions can directly read and modify the structure members.

For example, the ``Py_INCREF()`` macro directly increases ``PyObject.ob_refcnt``, without any abstraction. Fortunately, the ``Py_INCREF()`` implementation can be modified without affecting the API.

Change the C API
================

This PEP doesn't define an exhaustive list of all C API changes, but defines some guidelines on bad patterns which should be avoided in the C API to prevent leaking implementation details.

Separate header files of limited and internal C API
---------------------------------------------------

In Python 3.6, all headers (.h files) were directly in the ``Include/`` directory.

In Python 3.7, work started to move the internal C API into a new subdirectory, ``Include/internal/``. The work continued in Python 3.8 and 3.9. The internal C API is only partially exported: some functions are only declared with ``extern`` and so cannot be used outside CPython (with compilers supporting ``-fvisibility=hidden``, see below), whereas some functions are exported with ``PyAPI_FUNC()`` to make them usable in C extensions.
Debuggers and profilers are typical users of the internal C API: they inspect Python internals without calling functions (to inspect a coredump for example).

Python 3.9 is now built with ``-fvisibility=hidden`` (supported by GCC and clang): symbols which are not declared with ``PyAPI_FUNC()`` or ``PyAPI_DATA()`` are no longer exported by the dynamic library (libpython).

Another change is to separate the limited C API from the "CPython" C API: Python 3.8 has a new ``Include/cpython/`` sub-directory. It should not be used directly, but it is used automatically from the public headers when the ``Py_LIMITED_API`` macro is not defined.

**Backward compatibility:** fully backward compatible.

**Status:** basically completed in Python 3.9.

Changes without API changes and with minor performance overhead
---------------------------------------------------------------

* Replace macros with static inline functions. Work started in 3.8 and made good progress in Python 3.9.
* Modify macros to avoid directly accessing structure fields.

For example, the `Hide implementation detail of trashcan macros <https://github.com/python/cpython/commit/38965ec5411da60d312b59be281f3510d58e0cf1>`_ commit modifies the ``Py_TRASHCAN_BEGIN_CONDITION()`` macro to call a new ``_PyTrash_begin()`` function rather than directly accessing the ``PyThreadState.trash_delete_nesting`` field.

**Backward compatibility:** fully backward compatible.

**Status:** good progress in Python 3.9.

Changes without API changes but with performance overhead
---------------------------------------------------------

Replace macros or inline functions with regular functions. Work started in 3.9 on a limited set of functions.

Converting macros to function calls can have a small performance overhead.

For example, the ``Py_INCREF()`` macro directly modifies ``PyObject.ob_refcnt``: this macro would become an alias to the opaque ``Py_IncRef()`` function.
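
To give a feel for the opaque function-call path, the sketch below calls the exported ``Py_IncRef()`` and ``Py_DecRef()`` functions through ``ctypes`` instead of touching ``ob_refcnt``. It assumes a regular CPython build on which ``sys.getrefcount()`` reflects the reference count; the variable names are illustrative only, and this says nothing about the cost of the call.

```python
import ctypes
import sys

# Py_IncRef() and Py_DecRef() are exported C API functions: unlike the
# Py_INCREF() macro, the caller never sees the PyObject memory layout.
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None

dec_ref = ctypes.pythonapi.Py_DecRef
dec_ref.argtypes = [ctypes.py_object]
dec_ref.restype = None

obj = object()  # a fresh object, only referenced by this variable
before = sys.getrefcount(obj)

inc_ref(obj)  # take one extra strong reference through the function
after_incref = sys.getrefcount(obj)

dec_ref(obj)  # give it back, restoring the original count
after_decref = sys.getrefcount(obj)
```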

It is possible that the regular CPython runtime keeps the ``Py_INCREF()`` macro which directly modifies ``PyObject.ob_refcnt`` to avoid any performance overhead. A tradeoff should be defined to limit differences between the regular and the new optimized CPython runtimes, without hurting the performance of the regular CPython runtime too much.

**Backward compatibility:** fully backward compatible.

**Status:** not started. The performance overhead must be measured with benchmarks, and this PEP must be accepted first.

API and ABI incompatible changes
--------------------------------

* Make structures opaque: move them to the internal C API.
* Remove functions from the public C API which are tied to CPython internals. Maybe begin by marking these functions as private (rename ``PyXXX`` to ``_PyXXX``) or move them to the internal C API.
* Ban statically allocated types (by making ``PyTypeObject`` opaque): enforce usage of ``PyType_FromSpec()``.

Examples of issues to make structures opaque:

* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
* ``PyThreadState``: https://bugs.python.org/issue39947

Another example is the ``Py_REFCNT()`` and ``Py_TYPE()`` macros, which can currently be used as l-values to modify an object's reference count or type. Python 3.9 has new ``Py_SET_REFCNT()`` and ``Py_SET_TYPE()`` macros which should be used instead. The ``Py_REFCNT()`` and ``Py_TYPE()`` macros should be converted to static inline functions to prevent their usage as l-values.

**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.

CPython specific behavior
=========================

Some C functions and some Python functions have a behavior which is closely tied to the current CPython implementation.

is operator
-----------

The "x is y" operator is closely tied to how CPython allocates objects and to ``PyObject*``.

For example, CPython uses singletons for numbers in the [-5; 256] range::

    >>> x=1; (x + 1) is 2
    True
    >>> x=1000; (x + 1) is 1001
    False

The Python 3.8 compiler now emits a ``SyntaxWarning`` when the right operand of the ``is`` and ``is not`` operators is a literal (ex: integer or string), but doesn't warn if it is the ``None``, ``True``, ``False`` or ``Ellipsis`` singleton (`bpo-34850 <https://bugs.python.org/issue34850>`_). Example::

    >>> x=1
    >>> x is 1
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    True

CPython PyObject_RichCompareBool
--------------------------------

CPython considers that two objects are identical if their memory addresses are equal: ``x is y`` in Python (``IS_OP`` opcode) is implemented internally in C as ``left == right`` where ``left`` and ``right`` are ``PyObject *`` pointers.

The main function to implement comparison in CPython is ``PyObject_RichCompareBool()``. This function considers that two objects are equal if the two ``PyObject*`` pointers are equal (if the two objects are "identical"). For example, ``PyObject_RichCompareBool(obj1, obj2, Py_EQ)`` doesn't call ``obj1.__eq__(obj2)`` if ``obj1 == obj2`` where ``obj1`` and ``obj2`` are ``PyObject*`` pointers.

This behavior is an optimization to make Python more efficient.

For example, the ``dict`` lookup avoids ``__eq__()`` if two pointers are equal.
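
The identity shortcut in the ``dict`` lookup can be observed from pure Python. The sketch below uses a hypothetical ``NeverEqual`` class whose ``__eq__()`` always returns ``False`` and counts its calls; the class and variable names are illustrative only.

```python
class NeverEqual:
    """Hypothetical class whose instances claim to be unequal to everything."""

    def __init__(self):
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return False

    def __hash__(self):
        # Constant hash, so that every lookup reaches the comparison step.
        return 42


key = NeverEqual()
table = {key: "value"}

# Identity shortcut: the stored key and the looked-up key are the same
# object, so the lookup succeeds without ever calling __eq__().
found = key in table
calls_after_identical_lookup = key.eq_calls

# A different object with the same hash must be compared with __eq__(),
# which answers False: the lookup fails.
other = NeverEqual()
found_other = other in table
total_eq_calls = key.eq_calls + other.eq_calls
```

The first lookup succeeds even though ``key == key`` is ``False``, because the comparison is short-circuited on identity.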

Another example is Not-a-Number (NaN) floating-point numbers, which are not equal to themselves::

    >>> nan = float("nan")
    >>> nan is nan
    True
    >>> nan == nan
    False

The ``list.__contains__(obj)`` and ``list.index(obj)`` methods are implemented with ``PyObject_RichCompareBool()`` and so rely on object identity::

    >>> lst = [9, 7, nan]
    >>> nan in lst
    True
    >>> lst.index(nan)
    2
    >>> lst[2] == nan
    False

In CPython, ``x == y`` is implemented with ``PyObject_RichCompare()``, which doesn't make the assumption that identical objects are equal. That's why ``nan == nan`` or ``lst[2] == nan`` return ``False``.

Issues for other Python implementations
---------------------------------------

The Python language doesn't require an implementation to use the ``PyObject`` structure and ``PyObject*`` pointers. PyPy uses neither ``PyObject`` nor ``PyObject*``. If CPython is modified to use `Tagged Pointers`_, CPython would have the same issue.

Alternative Python implementations have to mimic CPython to reduce incompatibilities.

For example, PyPy mimics CPython behavior for the ``is`` operator with CPython small integer singletons::

    >>>> x=1; (x + 1) is 2
    True

It also mimics CPython ``PyObject_RichCompareBool()``. Example with the Not-a-Number (NaN) float::

    >>>> nan=float("nan")
    >>>> nan == nan
    False
    >>>> lst = [9, 7, nan]
    >>>> nan in lst
    True
    >>>> lst.index(nan)
    2
    >>>> lst[2] == nan
    False

Better advertise alternative Python runtimes
============================================

Currently, PyPy and other "alternative" Python runtimes are not well advertised on the `Python website <https://www.python.org/>`_. They are only listed as the last choice in the Download menu.

Once enough C extensions are compatible with the limited C API, PyPy and other Python runtimes should be better advertised on the Python website and in the Python documentation, so that they are presented as first-class citizens.

Obviously, CPython is likely to remain the most feature-complete implementation in the mid-term, since new PEPs are first implemented in CPython. Limitations can simply be documented, and users should be free to make their own choice, depending on their use cases.

HPy project
===========

The `HPy project <https://github.com/pyhandle/hpy>`__ is a brand new C API written from scratch. It is designed to ease migration from the current C API and to be efficient on PyPy.

HPy hides all implementation details: it is based on "handles", so objects cannot be inspected with direct memory access: only opaque function calls are allowed. This abstraction has many benefits:

* No more ``PyObject`` emulation needed: smaller memory footprint in PyPy cpyext, no more expensive conversions.
* It is possible to have multiple handles pointing to the same object. It helps to better track the object lifetime and makes the PyPy implementation easier. PyPy doesn't use reference counting but a tracing garbage collector. When the PyPy GC moves objects in memory, handles don't change! HPy uses an array mapping handles to objects: only this array has to be updated. It is way more efficient.
* The Python runtime is free to modify deep internals compared to CPython. Many optimizations become possible: see the `Optimize CPython`_ section.
* It is easy to add a debug wrapper to add checks before and after the function calls. For example, ensure that the GIL is held when calling CPython.

HPy is developed outside CPython, is implemented on top of the existing Python C API, and so can support old Python versions.

By default, binaries compiled in "universal" HPy ABI mode can be used on CPython and PyPy. HPy can also target the CPython ABI, which has the same performance as native C extensions. See the HPy `Target ABIs documentation <https://github.com/pyhandle/hpy/blob/feature/improve-docs/docs/overview.rst#target-abis>`_.

The PEP moves the C API towards the HPy design and API.
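
The handle idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not HPy's implementation: ``HandleTable`` and its methods are hypothetical names, handles are plain indexes into a list, and ``relocate()`` simulates a moving garbage collector by rewriting the table.

```python
class HandleTable:
    """Hypothetical handle table: handles are plain indexes into a list."""

    def __init__(self):
        self._slots = []

    def new_handle(self, obj):
        # Each call returns a distinct handle, even for the same object.
        self._slots.append(obj)
        return len(self._slots) - 1

    def deref(self, handle):
        return self._slots[handle]

    def close(self, handle):
        # Closing a handle only clears its own slot.
        self._slots[handle] = None

    def relocate(self, old, new):
        # Simulates a moving GC: only the table is rewritten, the handles
        # held by extension code keep their values.
        self._slots = [new if slot is old else slot for slot in self._slots]


table = HandleTable()
payload = ["spam"]
h1 = table.new_handle(payload)
h2 = table.new_handle(payload)  # second handle to the same object

moved = list(payload)  # pretend the GC copied the object elsewhere
table.relocate(payload, moved)

both_follow_move = table.deref(h1) is moved and table.deref(h2) is moved

table.close(h1)
h2_still_valid = table.deref(h2) is moved
```

Because extension code only ever holds ``h1`` and ``h2``, the runtime is free to move or replace the underlying object, and closing one handle does not affect the other.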

New optimized CPython runtime
==============================

Backward incompatible changes are a pain for the whole Python community. To ease the migration (accelerate adoption of the new C API), one option is to provide not only one but two CPython runtimes:

* Regular CPython: fully backward compatible, supports direct access to structures like ``PyObject``, etc.
* New optimized CPython: incompatible, cannot import C extensions which don't use the limited C API, has new optimizations, restricted to the limited C API.

Technically, both runtimes would have the same code base, to ease maintenance: CPython. The new optimized CPython would be enabled by a ./configure flag which builds a different Python. On Windows, it would be a different project of the Visual Studio solution reusing the pythoncore project, but defining a macro to enable the optimizations and change the C API.

The new optimized CPython runtime remains compatible with the CPython 3.8 `stable ABI`_.

The CPython code base is 30 years old. Many technical choices made 30 years ago are no longer relevant today. This PEP should ease the development of new Python implementations which would be even more efficient, like PyPy!

Cython and cffi
===============

Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.

Cython may be modified to add a new build mode where only the "limited C API" is used.

Use Cases
=========

Optimize CPython
----------------

The new optimized runtime can implement new optimizations since it only supports C extension modules which don't access Python internals.

Tagged pointers
...............

See `Tagged pointer <https://en.wikipedia.org/wiki/Tagged_pointer>`_.

Avoid ``PyObject`` for small objects (ex: small integers, short Latin-1 strings, None and True/False singletons): store the content directly in the pointer, with a tag for the object type.

Tracing garbage collector
.........................

Experiment with a tracing garbage collector inside CPython. Keep reference counting for the C API.

Rewriting CPython with a tracing garbage collector is a large project which is out of the scope of this PEP. This PEP fixes some blocker issues which prevent starting such a project today.

One of the issues is functions of the C API which return a pointer, like ``PyBytes_AsString()``. Python doesn't know when the caller stops using the pointer, and so cannot move the object in memory (for a moving garbage collector). An API like ``Py_buffer`` is better since it requires the caller to call ``PyBuffer_Release()`` when it is done.

Specialized list
................

Specialize lists of small integers: if a list only contains numbers which fit into a C ``int32_t``, a Python list object could use a more efficient ``int32_t`` array to reduce the memory footprint (avoid ``PyObject`` overhead for these numbers).

Temporary ``PyObject`` objects would be created on demand for backward compatibility.

This optimization is less interesting if tagged pointers are implemented.

PyPy already implements this optimization.

O(1) bytearray to bytes conversion
..................................

Convert bytearray to bytes without memory copy.

Currently, bytearray is used to build a bytes string, but it's usually converted into a bytes object to respect an API. This conversion requires allocating a new memory block and copying the data (O(n) complexity).

An O(1) conversion becomes possible if the ownership of the bytearray content can be passed to the bytes object.

That requires modifying the ``PyBytesObject`` structure to support multiple storages (support storing content in a separate memory block).

Fork and "Copy-on-Read" problem
...............................

Solve the "Copy on read" problem with fork: store the reference counter outside ``PyObject``.

Currently, when a Python object is accessed, its ``ob_refcnt`` member is incremented temporarily to hold a "strong reference" to it (ensure that it cannot be destroyed while we use it). Many operating systems implement fork() using copy-on-write ("CoW"). A memory page (ex: 4 KB) is only copied when a process (parent or child) modifies it. After Python is forked, modifying ``ob_refcnt`` copies the memory page, even if the object is only accessed in "read only mode".

See `Dismissing Python Garbage Collection at Instagram <https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172>`_ (Jan 2017) by Instagram Engineering. Instagram contributed `gc.freeze() <https://docs.python.org/dev/library/gc.html#gc.freeze>`_ to Python 3.7, which works around the issue.

One solution would be to store reference counters outside ``PyObject``, for example in a separate hash table (mapping a pointer to its reference counter). Changing ``PyObject`` structures requires that C extensions don't access them directly.

Debug runtime and remove debug checks in release mode
.....................................................

If C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks and ease debugging C extensions.

If using such a debug runtime becomes easier, it indirectly means that runtime debug checks can be removed from the release build. The CPython code base is still full of runtime checks calling ``PyErr_BadInternalCall()`` on failure. Removing such checks in release mode can make Python more efficient.

PyPy
----

ujson is 3x faster on PyPy when using HPy instead of the Python C API. See `HPy kick-off sprint report <https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html>`_ (December 2019).

This PEP should help to make PyPy cpyext more efficient, or at least ease the migration of C extensions to HPy.

GraalPython
-----------

`GraalPython <https://github.com/graalvm/graalpython>`_ is a Python 3 implementation built on `GraalVM <https://www.graalvm.org/>`_ ("Universal VM for a polyglot world"). It is interested in supporting HPy. See `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_. It would also benefit from this PEP.

RustPython, Rust-CPython and PyO3
---------------------------------

Rust-CPython is interested in supporting HPy. See `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_.

RustPython and PyO3 would also benefit from this PEP.

Links:

* `PyO3 <https://github.com/PyO3/pyo3>`_: Rust bindings for the Python (CPython) interpreter
* `rust-cpython <https://github.com/dgrunwald/rust-cpython>`_: Rust <-> Python (CPython) bindings
* `RustPython <https://github.com/RustPython/RustPython>`_: A Python Interpreter written in Rust

Rejected Ideas
==============

Drop the C API
--------------

One proposed alternative to a new, better C API is to drop the C API altogether. The reasoning is that existing solutions like Cython and cffi are already available, complete and reliable.

What about the long tail of C extensions on PyPI which still use the C API? Would a Python without these C extensions remain relevant?

Lots of projects do not use those solutions, and the C API is part of Python's success. For example, there would be no numpy without the C API. It doesn't sound like a workable solution.

Bet on HPy, leave the C API unchanged
-------------------------------------

The HPy project is developed outside CPython and so doesn't cause any backward incompatibility in CPython. The HPy API was designed with efficiency in mind.

The problem is the long tail of C extensions on PyPI which are written with the C API and will not be converted soon, or will never be converted, to HPy. The transition from Python 2 to Python 3 showed that migrations are very slow and never fully complete.

This PEP also relies on the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible with the new opaque C API.

The concept of HPy is not new: CPython has a limited C API which provides a stable ABI since Python 3.2, see `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`_. Since it is an opt-in option, most users simply use the **default** C API.

Prior Art
=========

* `pythoncapi.readthedocs.io <https://pythoncapi.readthedocs.io/>`_: Research project behind this PEP
* July 2019: Keynote `Python Performance: Past, Present, Future <https://github.com/vstinner/talks/raw/master/2019-EuroPython/python_performance.pdf>`_ (slides) by Victor Stinner at EuroPython 2019
* [python-dev] `Make the stable API-ABI usable <https://mail.python.org/pipermail/python-dev/2017-November/150607.html>`_ (November 2017) by Victor Stinner
* [python-ideas] `PEP: Hide implementation details in the C API <https://mail.python.org/pipermail/python-ideas/2017-July/046399.html>`_ (July 2017) by Victor Stinner. Old PEP draft which proposed to add an option to build C extensions.
* `A New C API for CPython <https://vstinner.github.io/new-python-c-api.html>`_ (Sept 2017) article by Victor Stinner
* `Python Performance <https://github.com/vstinner/conf/raw/master/2017-PyconUS/summit.pdf>`_ (May 2017 at the Language Summit) by Victor Stinner: early discussions on reorganizing header files, promoting PyPy, fixing the C API, etc. Discussion summarized in the `Keeping Python competitive <https://lwn.net/Articles/723949/>`_ article.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/