[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
I've updated PEP 683 for the feedback I've gotten. Thanks again for that!

The updated PEP text is included below. The largest changes involve either the focus of the PEP (internal mechanism to mark objects immortal) or the possible ways that things can break on older 32-bit stable ABI extensions. All other changes are smaller.

Given the last round of discussion, I'm hoping this will be the last round before we go to the steering council.

-eric

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow, Eddie Elizondo
Discussions-To: https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022, 19-Feb-2022, 28-Feb-2022
Resolution:

Abstract
========

Currently the CPython runtime maintains a small amount of mutable state in the allocated memory of each object. Because of this, otherwise immutable objects are actually mutable. This can have a large negative impact on CPU and memory performance, especially for approaches to increasing Python's scalability.

This proposal mandates that, internally, CPython will support marking an object as one for which that runtime state will no longer change. Consequently, such an object's refcount will never reach 0, and so the object will never be cleaned up. We call these objects "immortal". (Normally, only a relatively small number of internal objects will ever be immortal.) The fundamental improvement here is that now an object can be truly immutable.

Scope
-----

Object immortality is meant to be an internal-only feature. So this proposal does not include any changes to public API or behavior (with one exception). As usual, we may still add some private (yet publicly accessible) API to do things like immortalize an object or tell if one is immortal. Any effort to expose this feature to users would need to be proposed separately.

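To make "immortalize an object" and "tell if one is immortal" concrete, here is a toy model in pure Python. It is only a sketch: ``ToyObject``, ``immortalize()``, ``is_immortal()``, ``incref()``, ``decref()`` and the value of ``IMMORTAL_REFCNT`` are all hypothetical names chosen for illustration, not CPython API and not the sentinel value the PEP defines.

```python
# Toy model only: none of these names exist in CPython.
IMMORTAL_REFCNT = 1 << 30  # arbitrary sentinel picked for this sketch


class ToyObject:
    """Stand-in for an object header that carries a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1           # one reference held by the creator
        self.deallocated = False  # set once the count drops to zero


def immortalize(obj):
    """Pin the refcount to the sentinel so it is never written again."""
    obj.refcnt = IMMORTAL_REFCNT


def is_immortal(obj):
    """An object is immortal exactly when its refcount equals the sentinel."""
    return obj.refcnt == IMMORTAL_REFCNT


def incref(obj):
    if is_immortal(obj):
        return  # no write: the object's state stays untouched
    obj.refcnt += 1


def decref(obj):
    if is_immortal(obj):
        return  # no write, so the count can never reach zero
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.deallocated = True


mortal = ToyObject("mortal")
incref(mortal)
decref(mortal)
decref(mortal)  # drops the last reference

immortal = ToyObject("immortal")
immortalize(immortal)
for _ in range(1000):
    decref(immortal)  # ignored every time
```

In this model the mortal object ends up deallocated after its last ``decref()``, while the immortal one keeps the sentinel count no matter how many times it is decref'ed.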
There is one exception to "no change in behavior": refcounting semantics for immortal objects will differ in some cases from user expectations. This exception, and the solution, are discussed below.

Most of this PEP focuses on an internal implementation that satisfies the above mandate. However, those implementation details are not meant to be strictly prescriptive. Instead, at the least they are included to help illustrate the technical considerations required by the mandate. The actual implementation may deviate somewhat as long as it satisfies the constraints outlined below. Furthermore, the acceptability of any specific implementation detail described below does not depend on the status of this PEP, unless explicitly specified.

For example, the particular details of:

* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects is marked as immortal
* which memory-management activities are skipped or modified for immortal objects

are not only CPython-specific but are also private implementation details that are expected to change in subsequent versions.

Implementation Summary
----------------------

Here's a high-level look at the implementation:

If an object's refcount matches a very specific value (defined below) then that object is treated as immortal. The CPython C-API and runtime will not modify the refcount (or other runtime state) of an immortal object.

Aside from the change to refcounting semantics, there is one other possible negative impact to consider. A naive implementation of the approach described below makes CPython roughly 4% slower. However, the implementation is performance-neutral once known mitigations are applied.

Motivation
==========

As noted above, currently all objects are effectively mutable. That includes "immutable" objects like ``str`` instances. This is because every object's refcount is frequently modified as the object is used during execution.
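The refcount writes described above can be observed from Python itself. The following is a minimal sketch, assuming a CPython interpreter (``sys.getrefcount()`` is CPython-specific) and using a freshly created ``object()`` so that the object under test is an ordinary, mortal one. The helper name ``refcount_delta()`` is made up for this example.

```python
import sys


def refcount_delta(n_refs):
    """Report how an object's refcount moves as references come and go.

    Returns a pair: (change after creating n_refs new references,
    change after dropping them again), both relative to the start.
    """
    obj = object()
    before = sys.getrefcount(obj)

    # Merely storing the object in a list writes to the object's header:
    # each slot in the list holds one more reference.
    holders = [obj] * n_refs
    after = sys.getrefcount(obj)

    # Dropping the list writes to the header again, once per reference.
    del holders
    restored = sys.getrefcount(obj)

    return after - before, restored - before


gained, net = refcount_delta(5)
print(f"gained {gained} references, net change after cleanup: {net}")
```

Nothing about the object's value changed here, yet its memory was written to on both the way up and the way down. That is the per-object runtime state the proposal wants to freeze for immortal objects.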
This is especially significant for a number of commonly used global (builtin) objects, e.g. ``None``. Such objects are used a lot, both in Python code and internally. That adds up to a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact on parts of the Python community, e.g. projects that aim for scalability like Instagram or the effort to make the GIL per-interpreter. Below we describe several ways in which refcount modification has a real negative effect on such projects. None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
-------------------------------

Every modification of a refcount causes the corresponding CPU cache line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels and to main memory. This has a small effect on all Python programs. Immortal objects would provide a slight relief in
[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)
Thanks to all those that provided feedback. I've worked to substantially update the PEP in response. The text is included below. Further feedback is appreciated.

-eric

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow, Eddie Elizondo
Discussions-To: https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022
Resolution:

Abstract
========

Currently the CPython runtime maintains a small amount of mutable state in the allocated memory of each object. Because of this, otherwise immutable objects are actually mutable. This can have a large negative impact on CPU and memory performance, especially for approaches to increasing Python's scalability. The solution proposed here provides a way to mark an object as one for which that per-object runtime state should not change.

Specifically, if an object's refcount matches a very specific value (defined below) then that object is treated as "immortal". If an object is immortal then its refcount will never be modified by ``Py_INCREF()``, etc. Consequently, the refcount will never reach 0, so that object will never be cleaned up (unless explicitly done, e.g. during runtime finalization). Additionally, all other per-object runtime state for an immortal object will be considered immutable.

This approach has some possible negative impact, which is explained below, along with mitigations. A critical requirement for this change is that the performance regression be no more than 2-3%. Anything worse than performance-neutral requires that the other benefits be proportionally large. Aside from specific applications, the fundamental improvement here is that now an object can be truly immutable.

(This proposal is meant to be CPython-specific and to affect only internal implementation details.
There are some slight exceptions to that which are explained below. See `Backward Compatibility`_, `Public Refcount Details`_, and `scope`_.)

Motivation
==========

As noted above, currently all objects are effectively mutable. That includes "immutable" objects like ``str`` instances. This is because every object's refcount is frequently modified as the object is used during execution. This is especially significant for a number of commonly used global (builtin) objects, e.g. ``None``. Such objects are used a lot, both in Python code and internally. That adds up to a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact on parts of the Python community, e.g. projects that aim for scalability like Instagram or the effort to make the GIL per-interpreter. Below we describe several ways in which refcount modification has a real negative effect on such projects. None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
-------------------------------

Every modification of a refcount causes the corresponding CPU cache line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels and to main memory. This has a small effect on all Python programs. Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price. If two threads (running simultaneously on distinct cores) are interacting with the same object (e.g. ``None``) then they will end up invalidating each other's caches with each incref and decref. This is true even for otherwise immutable objects like ``True``, ``0``, and ``str`` instances. CPython's GIL helps reduce this effect, since only one thread runs at a time, but it doesn't completely eliminate the penalty.

Avoiding Data Races
-------------------

Speaking of multi-core, we are considering making the GIL a per-interpreter lock, which would enable true multi-core parallelism.
Among other things, the GIL currently protects against races between multiple concurrent threads that may incref or decref the same object. Without a shared GIL, two running interpreters could not safely share any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must have its own copy of *every* object. That includes the singletons and static types. We have a viable strategy for that but it will require a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable. There would be no races because there would be no modification. This is something that the immortality proposed here would enable for otherwise immutable objects. With immortal objects, support for a per-interpreter GIL becomes much simpler.

Avoiding Copy-on-Write
----------------------

For some applications it makes sense to get the application into a
[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount"
Eddie and I would appreciate your feedback on this proposal to support treating some objects as "immortal". The fundamental characteristic of the approach is that we would provide stronger guarantees about immutability for some objects.

A few things to note:

* this is essentially an internal-only change: there are no user-facing changes (aside from affecting any 3rd party code that directly relies on specific refcounts)
* the naive implementation shows a 4% slowdown
* we have a number of strategies that should reduce that penalty
* without immortal objects, the implementation for per-interpreter GIL will require a number of non-trivial workarounds

That last one is particularly meaningful to me since it means we would definitely miss the 3.11 feature freeze. With immortal objects, 3.11 would still be in reach.

-eric

---

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow, Eddie Elizondo
Discussions-To: python-dev@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History:
Resolution:

Abstract
========

Under this proposal, any object may be marked as immortal. "Immortal" means the object will never be cleaned up (at least until runtime finalization). Specifically, the `refcount`_ for an immortal object is set to a sentinel value, and that refcount is never changed by ``Py_INCREF()``, ``Py_DECREF()``, or ``Py_SET_REFCNT()``. For immortal containers, the ``PyGC_Head`` is never changed by the garbage collector.

Avoiding changes to the refcount is an essential part of this proposal. For what we call "immutable" objects, it makes them truly immutable. As described further below, this allows us to avoid performance penalties in scenarios that would otherwise be prohibitive.

This proposal is CPython-specific and, effectively, describes internal implementation details.

.. _refcount: https://docs.python.org/3.11/c-api/intro.html#reference-counts

Motivation
==========

Without immortal objects, all objects are effectively mutable. That includes "immutable" objects like ``None`` and ``str`` instances. This is because every object's refcount is frequently modified as it is used during execution. In addition, for containers the runtime may modify the object's ``PyGC_Head``. This runtime-internal state currently prevents full immutability.

This has a concrete impact on active projects in the Python community. Below we describe several ways in which refcount modification has a real negative effect on those projects. None of that would happen for objects that are truly immutable.

Reducing Cache Invalidation
---------------------------

Every modification of a refcount causes the corresponding cache line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels and to main memory. This has a small effect on all Python programs. Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price. If two threads are interacting with the same object (e.g. ``None``) then they will end up invalidating each other's caches with each incref and decref. This is true even for otherwise immutable objects like ``True``, ``0``, and ``str`` instances. This is also true even with the GIL, though the impact is smaller.

Avoiding Data Races
-------------------

Speaking of multi-core, we are considering making the GIL a per-interpreter lock, which would enable true multi-core parallelism. Among other things, the GIL currently protects against races between multiple threads that concurrently incref or decref. Without a shared GIL, two running interpreters could not safely share any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must have its own copy of *every* object, including the singletons and static types.
We have a viable strategy for that but it will require a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable. There would be no races because there would be no modification. This is something that the immortality proposed here would enable for otherwise immutable objects. With immortal objects, support for a per-interpreter GIL becomes much simpler.

Avoiding Copy-on-Write
----------------------

For some applications it makes sense to get the application into a desired initial state and then fork the process for each worker. This can result in a large performance improvement, especially in memory usage. Several enterprise Python users (e.g. Instagram, YouTube) have taken advantage of this. However, the above refcount semantics drastically reduce the benefits and have led to some sub-optimal workarounds.

Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics.

Rationale
=========

The proposed solution