[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)

2022-02-28 Thread Eric Snow
I've updated PEP 683 for the feedback I've gotten.  Thanks again for that!

The updated PEP text is included below.  The largest changes involve
either the focus of the PEP (internal mechanism to mark objects
immortal) or the possible ways that things can break on older 32-bit
stable ABI extensions.  All other changes are smaller.

Given the last round of discussion, I'm hoping this will be the last
round before we go to the steering council.

-eric



PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow , Eddie Elizondo

Discussions-To:
https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022, 19-Feb-2022, 28-Feb-2022
Resolution:


Abstract


Currently the CPython runtime maintains a
`small amount of mutable state `_ in the
allocated memory of each object.  Because of this, otherwise immutable
objects are actually mutable.  This can have a large negative impact
on CPU and memory performance, especially for approaches to increasing
Python's scalability.

This proposal mandates that, internally, CPython will support marking
an object as one for which that runtime state will no longer change.
Consequently, such an object's refcount will never reach 0, and so the
object will never be cleaned up.  We call these objects "immortal".
(Normally, only a relatively small number of internal objects
will ever be immortal.)  The fundamental improvement here
is that now an object can be truly immutable.

Scope
-

Object immortality is meant to be an internal-only feature.  So this
proposal does not include any changes to public API or behavior
(with one exception).  As usual, we may still add some private
(yet publicly accessible) API to do things like immortalize an object
or tell if one is immortal.  Any effort to expose this feature to users
would need to be proposed separately.

There is one exception to "no change in behavior": refcounting semantics
for immortal objects will differ in some cases from user expectations.
This exception, and the solution, are discussed below.

Most of this PEP focuses on an internal implementation that satisfies
the above mandate.  However, those implementation details are not meant
to be strictly proscriptive.  Instead, at the least they are included
to help illustrate the technical considerations required by the mandate.
The actual implementation may deviate somewhat as long as it satisfies
the constraints outlined below.  Furthermore, the acceptability of any
specific implementation detail described below does not depend on
the status of this PEP, unless explicitly specified.

For example, the particular details of:

* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects are marked as immortal
* which memory-management activities are skipped or modified for
immortal objects

are not only CPython-specific but are also private implementation
details that are expected to change in subsequent versions.

Implementation Summary
--

Here's a high-level look at the implementation:

If an object's refcount matches a very specific value (defined below)
then that object is treated as immortal.  The CPython C-API and runtime
will not modify the refcount (or other runtime state) of an immortal
object.

Aside from the change to refcounting semantics, there is one other
possible negative impact to consider.  A naive implementation of the
approach described below makes CPython roughly 4% slower.  However,
the implementation is performance-neutral once known mitigations
are applied.


Motivation
==

As noted above, currently all objects are effectively mutable.  That
includes "immutable" objects like ``str`` instances.  This is because
every object's refcount is frequently modified as the object is used
during execution.  This is especially significant for a number of
commonly used global (builtin) objects, e.g. ``None``.  Such objects
are used a lot, both in Python code and internally.  That adds up to
a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact
on parts of the Python community, e.g. projects that aim for
scalability like Instragram or the effort to make the GIL
per-interpreter.  Below we describe several ways in which refcount
modification has a real negative effect on such projects.
None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
---

Every modification of a refcount causes the corresponding CPU cache
line to be invalidated.  This has a number of effects.

For one, the write must be propagated to other cache levels
and to main memory.  This has small effect on all Python programs.
Immortal objects would provide a slight relief in 

[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

2022-02-18 Thread Eric Snow
Thanks to all those that provided feedback.  I've worked to
substantially update the PEP in response.  The text is included below.
Further feedback is appreciated.

-eric



PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow , Eddie Elizondo

Discussions-To:
https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022
Resolution:


Abstract


Currently the CPython runtime maintains a
`small amount of mutable state `_ in the
allocated memory of each object.  Because of this, otherwise immutable
objects are actually mutable.  This can have a large negative impact
on CPU and memory performance, especially for approaches to increasing
Python's scalability.  The solution proposed here provides a way
to mark an object as one for which that per-object
runtime state should not change.

Specifically, if an object's refcount matches a very specific value
(defined below) then that object is treated as "immortal".  If an object
is immortal then its refcount will never be modified by ``Py_INCREF()``,
etc.  Consequently, the refcount will never reach 0, so that object will
never be cleaned up (unless explicitly done, e.g. during runtime
finalization).  Additionally, all other per-object runtime state
for an immortal object will be considered immutable.

This approach has some possible negative impact, which is explained
below, along with mitigations.  A critical requirement for this change
is that the performance regression be no more than 2-3%.  Anything worse
the performance-neutral requires that the other benefits are proportionally
large.  Aside from specific applications, the fundamental improvement
here is that now an object can be truly immutable.

(This proposal is meant to be CPython-specific and to affect only
internal implementation details.  There are some slight exceptions
to that which are explained below.  See `Backward Compatibility`_,
`Public Refcount Details`_, and `scope`_.)


Motivation
==

As noted above, currently all objects are effectively mutable.  That
includes "immutable" objects like ``str`` instances.  This is because
every object's refcount is frequently modified as the object is used
during execution.  This is especially significant for a number of
commonly used global (builtin) objects, e.g. ``None``.  Such objects
are used a lot, both in Python code and internally.  That adds up to
a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact
on parts of the Python community, e.g. projects that aim for
scalability like Instragram or the effort to make the GIL
per-interpreter.  Below we describe several ways in which refcount
modification has a real negative effect on such projects.
None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
---

Every modification of a refcount causes the corresponding CPU cache
line to be invalidated.  This has a number of effects.

For one, the write must be propagated to other cache levels
and to main memory.  This has small effect on all Python programs.
Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price.  If two threads
(running simultaneously on distinct cores) are interacting with the
same object (e.g. ``None``)  then they will end up invalidating each
other's caches with each incref and decref.  This is true even for
otherwise immutable objects like ``True``, ``0``, and ``str`` instances.
CPython's GIL helps reduce this effect, since only one thread runs at a
time, but it doesn't completely eliminate the penalty.

Avoiding Data Races
---

Speaking of multi-core, we are considering making the GIL
a per-interpreter lock, which would enable true multi-core parallelism.
Among other things, the GIL currently protects against races between
multiple concurrent threads that may incref or decref the same object.
Without a shared GIL, two running interpreters could not safely share
any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must
have its own copy of *every* object.  That includes the singletons and
static types.  We have a viable strategy for that but it will require
a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable.
There would be no races because there would be no modification.  This
is something that the immortality proposed here would enable for
otherwise immutable objects.  With immortal objects,
support for a per-interpreter GIL
becomes much simpler.

Avoiding Copy-on-Write
--

For some applications it makes sense to get the application into
a 

[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount"

2022-02-15 Thread Eric Snow
Eddie and I would appreciate your feedback on this proposal to support
treating some objects as "immortal".  The fundamental characteristic
of the approach is that we would provide stronger guarantees about
immutability for some objects.

A few things to note:

* this is essentially an internal-only change:  there are no
user-facing changes (aside from affecting any 3rd party code that
directly relies on specific refcounts)
* the naive implementation shows a 4% slowdown
* we have a number of strategies that should reduce that penalty
* without immortal objects, the implementation for per-interpreter GIL
will require a number of non-trivial workarounds

That last one is particularly meaningful to me since it means we would
definitely miss the 3.11 feature freeze.  With immortal objects, 3.11
would still be in reach.

-eric

---

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow , Eddie Elizondo

Discussions-To: python-dev@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History:
Resolution:


Abstract


Under this proposal, any object may be marked as immortal.
"Immortal" means the object will never be cleaned up (at least until
runtime finalization).  Specifically, the `refcount`_ for an immortal
object is set to a sentinel value, and that refcount is never changed
by ``Py_INCREF()``, ``Py_DECREF()``, or ``Py_SET_REFCNT()``.
For immortal containers, the ``PyGC_Head`` is never
changed by the garbage collector.

Avoiding changes to the refcount is an essential part of this
proposal.  For what we call "immutable" objects, it makes them
truly immutable.  As described further below, this allows us
to avoid performance penalties in scenarios that
would otherwise be prohibitive.

This proposal is CPython-specific and, effectively, describes
internal implementation details.

.. _refcount: https://docs.python.org/3.11/c-api/intro.html#reference-counts


Motivation
==

Without immortal objects, all objects are effectively mutable.  That
includes "immutable" objects like ``None`` and ``str`` instances.
This is because every object's refcount is frequently modified
as it is used during execution.  In addition, for containers
the runtime may modify the object's ``PyGC_Head``.  These
runtime-internal state currently prevent
full immutability.

This has a concrete impact on active projects in the Python community.
Below we describe several ways in which refcount modification has
a real negative effect on those projects.  None of that would
happen for objects that are truly immutable.

Reducing Cache Invalidation
---

Every modification of a refcount causes the corresponding cache
line to be invalidated.  This has a number of effects.

For one, the write must be propagated to other cache levels
and to main memory.  This has small effect on all Python programs.
Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price.  If two threads
are interacting with the same object (e.g. ``None``)  then they will
end up invalidating each other's caches with each incref and decref.
This is true even for otherwise immutable objects like ``True``,
``0``, and ``str`` instances.  This is also true even with
the GIL, though the impact is smaller.

Avoiding Data Races
---

Speaking of multi-core, we are considering making the GIL
a per-interpreter lock, which would enable true multi-core parallelism.
Among other things, the GIL currently protects against races between
multiple threads that concurrently incref or decref.  Without a shared
GIL, two running interpreters could not safely share any objects,
even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must
have its own copy of *every* object, including the singletons and
static types.  We have a viable strategy for that but it will
require a meaningful amount of extra effort and extra
complexity.

The alternative is to ensure that all shared objects are truly immutable.
There would be no races because there would be no modification.  This
is something that the immortality proposed here would enable for
otherwise immutable objects.  With immortal objects,
support for a per-interpreter GIL
becomes much simpler.

Avoiding Copy-on-Write
--

For some applications it makes sense to get the application into
a desired initial state and then fork the process for each worker.
This can result in a large performance improvement, especially
memory usage.  Several enterprise Python users (e.g. Instagram,
YouTube) have taken advantage of this.  However, the above
refcount semantics drastically reduce the benefits and
has led to some sub-optimal workarounds.

Also note that "fork" isn't the only operating system mechanism
that uses copy-on-write semantics.


Rationale
=

The proposed solution