https://github.com/python/cpython/commit/2b7c28a4406da1b26dd0ebd38aa7371bed873ce4
commit: 2b7c28a4406da1b26dd0ebd38aa7371bed873ce4
branch: main
author: Peter Bierma <[email protected]>
committer: ZeroIntensity <[email protected]>
date: 2026-05-06T17:39:30-04:00
summary:

gh-149101: Implement PEP 788 (GH-149116)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Sam Gross <[email protected]>

files:
A Misc/NEWS.d/next/C_API/2026-04-28-17-43-12.gh-issue-149101.HTuHTb.rst
M Doc/c-api/interp-lifecycle.rst
M Doc/c-api/threads.rst
M Doc/data/stable_abi.dat
M Doc/howto/free-threading-extensions.rst
M Doc/whatsnew/3.15.rst
M Include/cpython/pystate.h
M Include/internal/pycore_interp_structs.h
M Include/internal/pycore_pystate.h
M Include/pystate.h
M Lib/test/libregrtest/tsan.py
M Lib/test/test_embed.py
M Lib/test/test_stable_abi_ctypes.py
M Misc/stable_abi.toml
M Modules/_testcapimodule.c
M Modules/_testinternalcapi.c
M PC/python3dll.c
M Programs/_testembed.c
M Python/pylifecycle.c
M Python/pystate.c
M Tools/c-analyzer/cpython/ignored.tsv

diff --git a/Doc/c-api/interp-lifecycle.rst b/Doc/c-api/interp-lifecycle.rst
index 186ab4370bcb9c..38dc806c4b81cd 100644
--- a/Doc/c-api/interp-lifecycle.rst
+++ b/Doc/c-api/interp-lifecycle.rst
@@ -578,31 +578,203 @@ Initializing and finalizing the interpreter
 
 .. _cautions-regarding-runtime-finalization:
 
-Cautions regarding runtime finalization
----------------------------------------
+Cautions regarding interpreter finalization
+-------------------------------------------
 
 In the late stage of :term:`interpreter shutdown`, after attempting to wait for
 non-daemon threads to exit (though this can be interrupted by
 :class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
-is marked as *finalizing*: :c:func:`Py_IsFinalizing` and
-:func:`sys.is_finalizing` return true.  At this point, only the *finalization
-thread* that initiated finalization (typically the main thread) is allowed to
-acquire the :term:`GIL`.
-
-If any thread, other than the finalization thread, attempts to attach a :term:`thread state`
-during finalization, either explicitly or
-implicitly, the thread enters **a permanently blocked state**
-where it remains until the program exits.  In most cases this is harmless, but this can result
-in deadlock if a later stage of finalization attempts to acquire a lock owned by the
-blocked thread, or otherwise waits on the blocked thread.
-
-Gross? Yes. This prevents random crashes and/or unexpectedly skipped C++
-finalizations further up the call stack when such threads were forcibly exited
-here in CPython 3.13 and earlier. The CPython runtime :term:`thread state` C APIs
-have never had any error reporting or handling expectations at :term:`thread state`
-attachment time that would've allowed for graceful exit from this situation. Changing that
-would require new stable C APIs and rewriting the majority of C code in the
-CPython ecosystem to use those with error handling.
+is marked as finalizing, meaning that :c:func:`Py_IsFinalizing` and
+:func:`sys.is_finalizing` return true.  At this point, only the finalization
+thread (the thread that initiated finalization; this is typically the main thread)
+is allowed to :term:`attach <attached thread state>` a thread state.
+
+Other threads that attempt to attach during finalization, either explicitly
+(such as via :c:func:`PyThreadState_Ensure` or :c:macro:`Py_END_ALLOW_THREADS`)
+or implicitly (such as in-between bytecode instructions), will enter a
+**permanently blocked state**. Generally, this is harmless, but this can
+result in deadlocks. For example, a thread may be permanently blocked while
+holding a lock, meaning that the finalization thread can never acquire that
+lock.
+
+In CPython 3.13 and earlier, the thread would exit instead of hanging,
+which led to other issues (see the warning note at
+:c:func:`PyThread_exit_thread`).
+
+Gross? Yes. Starting in Python 3.15, there are a number of C APIs that make
+it possible to avoid these issues by temporarily preventing finalization:
+
+.. _interpreter-guards:
+
+.. seealso::
+
+   :pep:`788` explains the design, motivation and rationale
+   for these APIs.
+
+.. c:type:: PyInterpreterGuard
+
+   An opaque interpreter guard structure.
+
+   By holding an interpreter guard, the caller can ensure that the interpreter
+   will not finalize until the guard is closed (through
+   :c:func:`PyInterpreterGuard_Close`).
+
+   When a guard is held, a thread attempting to finalize the interpreter will
+   block until the guard is closed before starting finalization.
+   After finalization has started, threads are forever unable to acquire
+   guards for that interpreter. This means that if you forget to close an
+   interpreter guard, the process will **permanently hang** during
+   finalization!
+
+   Holding a guard for an interpreter is similar to holding a
+   :term:`strong reference` to a Python object, except finalization does not happen
+   automatically after all guards are released: it requires an explicit
+   :c:func:`Py_EndInterpreter` call.
+
+   .. versionadded:: next
+
+
+.. c:function:: PyInterpreterGuard *PyInterpreterGuard_FromCurrent(void)
+
+   Create a finalization guard for the current interpreter. This will prevent
+   finalization until the guard is closed.
+
+   For example:
+
+   .. code-block:: c
+
+      // Temporarily prevent finalization.
+      PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+      if (guard == NULL) {
+         // Finalization has already started or we're out of memory.
+         return NULL;
+      }
+
+      Py_BEGIN_ALLOW_THREADS;
+      // Do some critical processing here. For example, we can safely acquire
+      // locks that might be acquired by the finalization thread.
+      Py_END_ALLOW_THREADS;
+
+      // Now that we're done with our critical processing, the interpreter is
+      // allowed to finalize again.
+      PyInterpreterGuard_Close(guard);
+
+   On success, this function returns a guard for the current interpreter;
+   on failure, it returns ``NULL`` with an exception set.
+
+   This function will fail only if the current interpreter has already started
+   finalizing, or if the process is out of memory.
+
+   The guard pointer returned by this function must be eventually closed
+   with :c:func:`PyInterpreterGuard_Close`; failing to do so will result in
+   the Python process infinitely hanging.
+
+   The caller must hold an :term:`attached thread state`.
+
+   .. versionadded:: next
+
+
+.. c:function:: PyInterpreterGuard *PyInterpreterGuard_FromView(PyInterpreterView *view)
+
+   Create a finalization guard for an interpreter through a view.
+
+   On success, this function returns a guard to the interpreter
+   represented by *view*. The view is still valid after calling this
+   function. The guard must eventually be closed with
+   :c:func:`PyInterpreterGuard_Close`.
+
+   If the interpreter no longer exists, is already finalizing, or the process is out of memory,
+   then this function returns ``NULL`` without setting an exception.
+
+   The caller does not need to hold an :term:`attached thread state`.
+
+   .. versionadded:: next
+
+
+.. c:function:: void PyInterpreterGuard_Close(PyInterpreterGuard *guard)
+
+   Close an interpreter guard, allowing the interpreter to start
+   finalization if no other guards remain. If an interpreter guard
+   is never closed, the interpreter will infinitely wait when trying
+   to enter finalization!
+
+   After an interpreter guard is closed, it may not be used in
+   :c:func:`PyThreadState_Ensure`. Doing so will result in undefined
+   behavior.
+
+   This function cannot fail, and the caller doesn't need to hold an
+   :term:`attached thread state`.
+
+   .. versionadded:: next
+
+
+.. _interpreter-views:
+
+Interpreter views
+-----------------
+
+In some cases, it may be necessary to access an interpreter that may have been
+deleted. This can be done using interpreter views.
+
+.. c:type:: PyInterpreterView
+
+   An opaque view of an interpreter.
+
+   This is a thread-safe way to access an interpreter that may be
+   finalizing or already destroyed.
+
+   .. versionadded:: next
+
+
+.. c:function:: PyInterpreterView *PyInterpreterView_FromCurrent(void)
+
+   Create a view to the current interpreter.
+
+   This function is generally meant to be used alongside
+   :c:func:`PyInterpreterGuard_FromView` or :c:func:`PyThreadState_EnsureFromView`.
+
+   On success, this function returns a view to the current interpreter; on
+   failure, it returns ``NULL`` with an exception set.
+
+   The caller must hold an :term:`attached thread state`.
+
+   .. versionadded:: next
+
+
+.. c:function:: void PyInterpreterView_Close(PyInterpreterView *view)
+
+   Close an interpreter view.
+
+   If an interpreter view is never closed, the view's memory will never be
+   freed, but there are no other consequences. (In contrast, forgetting to
+   close a guard will infinitely hang the main thread during finalization.)
+
+   This function cannot fail, and the caller doesn't need to hold an
+   :term:`attached thread state`.
+
+   .. versionadded:: next
+
+
+.. c:function:: PyInterpreterView *PyInterpreterView_FromMain(void)
+
+   Create a view for the main interpreter (the first and default
+   interpreter in a Python process; see
+   :c:func:`PyInterpreterState_Main`).
+
+   On success, this function returns a view to the main
+   interpreter; on failure, it returns ``NULL`` without an exception set.
+   Failure indicates that the process is out of memory.
+
+   Use this function when an interpreter pointer or view cannot be supplied
+   by the caller, such as when a native threading library does not provide a
+   ``void *arg`` parameter that could carry a :c:type:`PyInterpreterGuard` or
+   :c:type:`PyInterpreterView`. In code that supports subinterpreters, prefer
+   :c:func:`PyInterpreterView_FromCurrent` so the view tracks the calling
+   interpreter rather than the main one.
+
+   The caller does not need to hold an :term:`attached thread state`.
+
+   .. versionadded:: next
 
 
 Process-wide parameters
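
As a sketch that is not part of the patch, the guard rules documented in the hunk above can be modelled with a counter and a flag. This is a toy model under stated assumptions: `toy_interp`, `toy_guard_acquire`, `toy_guard_close`, and `toy_try_finalize` are hypothetical names, not CPython APIs, and where the real finalization thread would block, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not CPython APIs. */
typedef struct {
    int open_guards;   /* number of guards currently held */
    bool finalizing;   /* set once finalization has started */
} toy_interp;

/* Acquiring a guard fails once finalization has started, mirroring the
   documented failure mode of the PyInterpreterGuard_From*() functions. */
bool
toy_guard_acquire(toy_interp *interp)
{
    if (interp->finalizing) {
        return false;
    }
    interp->open_guards++;
    return true;
}

/* Closing a guard lets finalization proceed once no guards remain. */
void
toy_guard_close(toy_interp *interp)
{
    assert(interp->open_guards > 0);
    interp->open_guards--;
}

/* Assumption: the real finalizer waits while guards are open; this model
   reports "would block" by returning false instead of waiting. */
bool
toy_try_finalize(toy_interp *interp)
{
    if (interp->open_guards > 0) {
        return false;
    }
    interp->finalizing = true;
    return true;
}
```

In this model a guard that is never closed keeps `toy_try_finalize` returning false forever, which corresponds to the permanent hang the documentation warns about.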
diff --git a/Doc/c-api/threads.rst b/Doc/c-api/threads.rst
index 3b761d0c657cbd..f16125f383e09c 100644
--- a/Doc/c-api/threads.rst
+++ b/Doc/c-api/threads.rst
@@ -61,9 +61,9 @@ as in a :c:macro:`Py_BEGIN_ALLOW_THREADS` block or in a fresh thread, will the
 thread not have an attached thread state.
 If uncertain, check if :c:func:`PyThreadState_GetUnchecked` returns ``NULL``.
 
-If it turns out that you do need to create a thread state, call :c:func:`PyThreadState_New`
-followed by :c:func:`PyThreadState_Swap`, or use the dangerous
-:c:func:`PyGILState_Ensure` function.
+If it turns out that you do need to create a thread state, it is recommended to
+use :c:func:`PyThreadState_Ensure` or :c:func:`PyThreadState_EnsureFromView`,
+which will manage the thread state for you.
 
 
 .. _detaching-thread-state:
@@ -178,8 +178,12 @@ example usage in the Python source distribution.
    declaration.
 
 
-Non-Python created threads
---------------------------
+.. _non-python-created-threads:
+.. _c-api-foreign-threads:
+
+
+Using the C API from foreign threads
+------------------------------------
 
 When threads are created using the dedicated Python APIs (such as the
 :mod:`threading` module), a thread state is automatically associated with them,
@@ -192,70 +196,275 @@ of a callback API provided by the aforementioned third-party library),
 you must first register these threads with the interpreter by
 creating a new thread state and attaching it.
 
-The most robust way to do this is through :c:func:`PyThreadState_New` followed
-by :c:func:`PyThreadState_Swap`.
+The easiest way to do this is through :c:func:`PyThreadState_Ensure`
+or :c:func:`PyThreadState_EnsureFromView`.
 
 .. note::
-   ``PyThreadState_New`` requires an argument pointing to the desired
+   These functions require an argument pointing to the desired
    interpreter; such a pointer can be acquired via a call to
-   :c:func:`PyInterpreterState_Get` from the code where the thread was
-   created.
+   :c:func:`PyInterpreterGuard_FromCurrent` (for ``PyThreadState_Ensure``) or
+   :c:func:`PyInterpreterView_FromCurrent` (for ``PyThreadState_EnsureFromView``)
+   from the function that creates the thread. If no pointer is available (such
+   as when the given native thread library doesn't provide a data argument),
+   :c:func:`PyInterpreterView_FromMain` can be used to get a view for the main
+   interpreter, but note that this will make the code incompatible with
+   subinterpreters.
 
-For example::
 
-   /* The return value of PyInterpreterState_Get() from the
-      function that created this thread. */
-   PyInterpreterState *interp = thread_data->interp;
+For example::
 
-   /* Create a new thread state for the interpreter. It does not start out
-      attached. */
-   PyThreadState *tstate = PyThreadState_New(interp);
+   // The return value of PyInterpreterGuard_FromCurrent() from the
+   // function that created this thread.
+   PyInterpreterGuard *guard = thread_data->guard;
 
-   /* Attach the thread state, which will acquire the GIL. */
-   PyThreadState_Swap(tstate);
+   // Create a new thread state for the interpreter.
+   PyThreadStateToken *token = PyThreadState_Ensure(guard);
+   if (token == NULL) {
+      PyInterpreterGuard_Close(guard);
+      return;
+   }
 
-   /* Perform Python actions here. */
+   // We have a valid thread state -- perform Python actions here.
    result = CallSomeFunction();
-   /* evaluate result or handle exception */
+   // Evaluate result or handle exceptions.
 
-   /* Destroy the thread state. No Python API allowed beyond this point. */
-   PyThreadState_Clear(tstate);
-   PyThreadState_DeleteCurrent();
+   // Release the thread state. No calls to the C API are allowed beyond this
+   // point.
+   PyThreadState_Release(token);
+   PyInterpreterGuard_Close(guard);
 
-.. warning::
 
-   If the interpreter finalized before ``PyThreadState_Swap`` was called, then
-   ``interp`` will be a dangling pointer!
+Keep in mind that calling ``PyThreadState_Ensure`` might not always create a new
+thread state, and calling ``PyThreadState_Release`` might not always detach it.
+These functions may reuse an existing attached thread state, or may re-attach
+a thread state that was previously attached for the current thread.
+
+.. seealso::
+   :pep:`788`
+
+.. _c-api-attach-detach:
+
+Attaching/detaching thread states
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. c:function:: PyThreadStateToken *PyThreadState_Ensure(PyInterpreterGuard *guard)
+
+   Ensure that the thread has an attached thread state for the
+   interpreter protected by *guard*, and thus can safely invoke that
+   interpreter.
+
+   It is OK to call this function if the thread already has an
+   attached thread state, as long as there is a subsequent call to
+   :c:func:`PyThreadState_Release` that matches this one (meaning that "nested"
+   calls to this function are permitted).
+
+   The function's effect (if any) will be reversed by the matching call to
+   :c:func:`PyThreadState_Release`.
+
+   On error, this function returns ``NULL`` *without* an exception set.
+   Do not call :c:func:`!PyThreadState_Release` in this case.
+
+   On success, this function returns a pointer value that must be passed
+   to the matching call to :c:func:`!PyThreadState_Release`.
+
+   The conditions in which this function creates a new :term:`thread state` are
+   considered unstable and implementation-dependent. If you need to control the
+   exact lifetime of a thread state, consider using :c:func:`PyThreadState_New`.
+   However, do not avoid this function solely on the basis that the lifetime
+   of the thread state may be inconsistent across versions; changes to this
+   function will be done with caution and in a backwards-compatible manner.
+   In particular, the saving of thread-local variables and similar state will
+   be retained across Python versions.
+
+   .. impl-detail::
+
+      The exact behavior of whether this function creates a new thread state is
+      described below, but be aware that this may change in the future.
+
+      First, this function checks if an attached thread state is present.
+      If there is, this function then checks if the interpreter of that
+      thread state matches the interpreter guarded by *guard*. If that is
+      the case, this function simply marks the thread state as being used
+      by a ``PyThreadState_Ensure`` call and returns.
+
+      If there is no attached thread state, then this function checks if any
+      thread state has been used by the current OS thread. (This is
+      returned by :c:func:`PyGILState_GetThisThreadState`.)
+      If there was, then this function checks if that thread state's interpreter
+      matches *guard*. If it does, it is re-attached and marked as used.
+
+      Otherwise, if both of the above cases fail, a new thread state is created
+      for *guard*. It is then attached and marked as owned by ``PyThreadState_Ensure``.
+
+   .. versionadded:: next
+
 
+.. c:function:: PyThreadStateToken *PyThreadState_EnsureFromView(PyInterpreterView *view)
+
+   Get an attached thread state for the interpreter referenced by *view*.
+
+   The behavior and return value are the same as for :c:func:`PyThreadState_Ensure`;
+   additionally, if the function succeeds, the interpreter referenced by *view* will
+   be implicitly guarded. The guard will be released upon the corresponding
+   :c:func:`PyThreadState_Release` call.
+
+   .. versionadded:: next
+
+
+.. c:function:: void PyThreadState_Release(PyThreadStateToken *token)
+
+   Undo a :c:func:`PyThreadState_Ensure` or
+   :c:func:`PyThreadState_EnsureFromView` call.
+
+   This must be called exactly once for each successful *Ensure* call, with
+   *token* set to that call's return value.
+
+   The state that was attached before the corresponding *Ensure* call
+   (if any) will be attached when :c:func:`PyThreadState_Release` returns.
+
+   The exact behavior of whether this function deletes a thread state is
+   considered unstable and implementation-dependent.
+
+   .. impl-detail::
+
+      Currently, this function will decrement an internal counter on the
+      attached thread state. If this counter ever drops below zero, this
+      function emits a fatal error (via :c:func:`Py_FatalError`).
+
+      If the attached thread state is owned by ``PyThreadState_Ensure``, then the
+      attached thread state will be deallocated and deleted upon the internal counter
+      reaching zero. Otherwise, nothing happens when the counter reaches zero.
+
+   .. versionadded:: next
+
+.. c:type:: PyThreadStateToken
+
+   An opaque token retrieved from a :c:func:`PyThreadState_Ensure` call
+   and passed to a corresponding :c:func:`PyThreadState_Release` call.
+
+
+.. _legacy-api:
 .. _gilstate:
 
-Legacy API
-----------
+GIL-state APIs
+--------------
+
+The following APIs are generally not compatible with subinterpreters and
+will hang the process during interpreter finalization (see
+:ref:`cautions-regarding-runtime-finalization`). As such, these APIs were
+:term:`soft deprecated` in Python 3.15 in favor of the :ref:`new APIs
+<c-api-foreign-threads>`.
+
+
+.. c:type:: PyGILState_STATE
 
-Another common pattern to call Python code from a non-Python thread is to use
-:c:func:`PyGILState_Ensure` followed by a call to :c:func:`PyGILState_Release`.
+   The type of the value returned by :c:func:`PyGILState_Ensure` and passed to
+   :c:func:`PyGILState_Release`.
 
-These functions do not work well when multiple interpreters exist in the Python
-process. If no Python interpreter has ever been used in the current thread (which
-is common for threads created outside Python), ``PyGILState_Ensure`` will create
-and attach a thread state for the "main" interpreter (the first interpreter in
-the Python process).
+   .. c:enumerator:: PyGILState_LOCKED
 
-Additionally, these functions have thread-safety issues during interpreter
-finalization. Using ``PyGILState_Ensure`` during finalization will likely
-crash the process.
+      The GIL was already held when :c:func:`PyGILState_Ensure` was called.
 
-Usage of these functions look like such::
+   .. c:enumerator:: PyGILState_UNLOCKED
 
-   PyGILState_STATE gstate;
-   gstate = PyGILState_Ensure();
+      The GIL was not held when :c:func:`PyGILState_Ensure` was called.
 
-   /* Perform Python actions here. */
-   result = CallSomeFunction();
-   /* evaluate result or handle exception */
 
-   /* Release the thread. No Python API allowed beyond this point. */
-   PyGILState_Release(gstate);
+.. c:function:: PyGILState_STATE PyGILState_Ensure()
+
+   Ensure that the current thread is ready to call the Python C API regardless
+   of the current state of Python, or of the :term:`attached thread state`. This may
+   be called as many times as desired by a thread as long as each call is
+   matched with a call to :c:func:`PyGILState_Release`. In general, other
+   thread-related APIs may be used between :c:func:`PyGILState_Ensure` and
+   :c:func:`PyGILState_Release` calls as long as the thread state is restored to
+   its previous state before the Release().  For example, normal usage of the
+   :c:macro:`Py_BEGIN_ALLOW_THREADS` and :c:macro:`Py_END_ALLOW_THREADS` macros is
+   acceptable.
+
+   The return value is an opaque "handle" to the :term:`attached thread state` when
+   :c:func:`PyGILState_Ensure` was called, and must be passed to
+   :c:func:`PyGILState_Release` to ensure Python is left in the same state. Even
+   though recursive calls are allowed, these handles *cannot* be shared - each
+   unique call to :c:func:`PyGILState_Ensure` must save the handle for its call
+   to :c:func:`PyGILState_Release`.
+
+   When the function returns, there will be an :term:`attached thread state`
+   and the thread will be able to call arbitrary Python code.
+
+   This function has no way to return an error. As such, errors are either fatal
+   (that is, they send ``SIGABRT`` and crash the process; see
+   :c:func:`Py_FatalError`), or the thread will be permanently blocked (such as
+   during interpreter finalization).
+
+   .. warning::
+      Calling this function when the interpreter is finalizing will
+      infinitely hang the thread, which may cause deadlocks.
+      Refer to :ref:`cautions-regarding-runtime-finalization` for more details.
+
+      In addition, this function generally does not work with subinterpreters
+      when used from foreign threads, because this function has no way of
+      knowing which interpreter created the thread (and as such, will implicitly
+      pick the main interpreter).
+
+   .. versionchanged:: 3.14
+      Hangs the current thread, rather than terminating it, if called while the
+      interpreter is finalizing.
+
+   .. soft-deprecated:: 3.15
+      Use :c:func:`PyThreadState_Ensure` or
+      :c:func:`PyThreadState_EnsureFromView` instead.
+
+
+.. c:function:: void PyGILState_Release(PyGILState_STATE)
+
+   Release any resources previously acquired.  After this call, Python's state will
+   be the same as it was prior to the corresponding :c:func:`PyGILState_Ensure` call
+   (but generally this state will be unknown to the caller, hence the use of the
+   GIL-state API).
+
+   Every call to :c:func:`PyGILState_Ensure` must be matched by a call to
+   :c:func:`PyGILState_Release` on the same thread.
+
+   .. soft-deprecated:: 3.15
+      Use :c:func:`PyThreadState_Release` instead.
+
+
+.. c:function:: PyThreadState* PyGILState_GetThisThreadState()
+
+   Get the :term:`thread state` that was most recently :term:`attached
+   <attached thread state>` for this thread. (If the most recent thread state
+   has been deleted, this returns ``NULL``.)
+
+   If the caller has an attached thread state, it is returned.
+
+   In other terms, this function returns the thread state that will be used by
+   :c:func:`PyGILState_Ensure`. If this returns ``NULL``, then
+   ``PyGILState_Ensure`` will create a new thread state.
+
+   This function cannot fail.
+
+   .. soft-deprecated:: 3.15
+      Use :c:func:`PyThreadState_Get` or :c:func:`PyThreadState_GetUnchecked`
+      instead.
+
+
+.. c:function:: int PyGILState_Check()
+
+   Return ``1`` if the current thread has an :term:`attached thread state`
+   that matches the thread state returned by
+   :c:func:`PyGILState_GetThisThreadState`. If the caller has no attached thread
+   state or it otherwise doesn't match, then this returns ``0``.
+
+   If the current Python process has ever created a subinterpreter, this
+   function will *always* return ``1``.
+
+   This is mainly a helper/diagnostic function.
+
+   .. versionadded:: 3.4
+
+   .. soft-deprecated:: 3.15
+      Use ``PyThreadState_GetUnchecked() != NULL`` instead.
 
 
 .. _fork-and-threads:
@@ -398,101 +607,6 @@ C extensions.
       thread if the runtime is finalizing.
 
 
-GIL-state APIs
---------------
-
-The following functions use thread-local storage, and are not compatible
-with sub-interpreters:
-
-.. c:type:: PyGILState_STATE
-
-   The type of the value returned by :c:func:`PyGILState_Ensure` and passed to
-   :c:func:`PyGILState_Release`.
-
-   .. c:enumerator:: PyGILState_LOCKED
-
-      The GIL was already held when :c:func:`PyGILState_Ensure` was called.
-
-   .. c:enumerator:: PyGILState_UNLOCKED
-
-      The GIL was not held when :c:func:`PyGILState_Ensure` was called.
-
-.. c:function:: PyGILState_STATE PyGILState_Ensure()
-
-   Ensure that the current thread is ready to call the Python C API regardless
-   of the current state of Python, or of the :term:`attached thread state`. This may
-   be called as many times as desired by a thread as long as each call is
-   matched with a call to :c:func:`PyGILState_Release`. In general, other
-   thread-related APIs may be used between :c:func:`PyGILState_Ensure` and
-   :c:func:`PyGILState_Release` calls as long as the thread state is restored to
-   its previous state before the Release().  For example, normal usage of the
-   :c:macro:`Py_BEGIN_ALLOW_THREADS` and :c:macro:`Py_END_ALLOW_THREADS` macros is
-   acceptable.
-
-   The return value is an opaque "handle" to the :term:`attached thread state` when
-   :c:func:`PyGILState_Ensure` was called, and must be passed to
-   :c:func:`PyGILState_Release` to ensure Python is left in the same state. Even
-   though recursive calls are allowed, these handles *cannot* be shared - each
-   unique call to :c:func:`PyGILState_Ensure` must save the handle for its call
-   to :c:func:`PyGILState_Release`.
-
-   When the function returns, there will be an :term:`attached thread state`
-   and the thread will be able to call arbitrary Python code.  Failure is a fatal error.
-
-   .. warning::
-      Calling this function when the runtime is finalizing is unsafe. Doing
-      so will either hang the thread until the program ends, or fully crash
-      the interpreter in rare cases. Refer to
-      :ref:`cautions-regarding-runtime-finalization` for more details.
-
-   .. versionchanged:: 3.14
-      Hangs the current thread, rather than terminating it, if called while the
-      interpreter is finalizing.
-
-.. c:function:: void PyGILState_Release(PyGILState_STATE)
-
-   Release any resources previously acquired.  After this call, Python's state will
-   be the same as it was prior to the corresponding :c:func:`PyGILState_Ensure` call
-   (but generally this state will be unknown to the caller, hence the use of the
-   GILState API).
-
-   Every call to :c:func:`PyGILState_Ensure` must be matched by a call to
-   :c:func:`PyGILState_Release` on the same thread.
-
-.. c:function:: PyThreadState* PyGILState_GetThisThreadState()
-
-   Get the :term:`attached thread state` for this thread.  May return ``NULL`` if no
-   GILState API has been used on the current thread.  Note that the main thread
-   always has such a thread-state, even if no auto-thread-state call has been
-   made on the main thread.  This is mainly a helper/diagnostic function.
-
-   .. note::
-      This function may return non-``NULL`` even when the :term:`thread state`
-      is detached.
-      Prefer :c:func:`PyThreadState_Get` or :c:func:`PyThreadState_GetUnchecked`
-      for most cases.
-
-   .. seealso:: :c:func:`PyThreadState_Get`
-
-.. c:function:: int PyGILState_Check()
-
-   Return ``1`` if the current thread is holding the :term:`GIL` and ``0`` otherwise.
-   This function can be called from any thread at any time.
-   Only if it has had its :term:`thread state <attached thread state>` initialized
-   via :c:func:`PyGILState_Ensure` will it return ``1``.
-   This is mainly a helper/diagnostic function.  It can be useful
-   for example in callback contexts or memory allocation functions when
-   knowing that the :term:`GIL` is locked can allow the caller to perform sensitive
-   actions or otherwise behave differently.
-
-   .. note::
-      If the current Python process has ever created a subinterpreter, this
-      function will *always* return ``1``. Prefer :c:func:`PyThreadState_GetUnchecked`
-      for most cases.
-
-   .. versionadded:: 3.4
-
-
 Low-level APIs
 --------------
 
@@ -704,7 +818,7 @@ pointer and a void pointer argument.
       possible.  If the main thread is busy executing a system call,
       *func* won't be called before the system call returns.  This
       function is generally **not** suitable for calling Python code from
-      arbitrary C threads.  Instead, use the :ref:`PyGILState API<gilstate>`.
+      arbitrary C threads.  Instead, use :c:func:`PyThreadState_EnsureFromView`.
 
    .. versionadded:: 3.1
 
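
As a sketch that is not part of the patch, the counter behavior described in the implementation-detail notes for `PyThreadState_Ensure` and `PyThreadState_Release` can be modelled as follows. This is a toy model under stated assumptions: `toy_tstate`, `toy_ensure`, and `toy_release` are hypothetical names, the `created_here` flag stands in for the "no reusable thread state was found" case, and an `assert` stands in for the documented fatal error.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not CPython's data structures. */
typedef struct {
    int ensure_depth;      /* outstanding Ensure calls on this thread state */
    bool owned_by_ensure;  /* true if an Ensure call created the state */
    bool deleted;          /* set when the model would delete the state */
} toy_tstate;

/* Model of an Ensure call. created_here says whether this call had to
   create the thread state rather than reuse an existing one. */
void
toy_ensure(toy_tstate *ts, bool created_here)
{
    if (created_here) {
        ts->owned_by_ensure = true;
    }
    ts->ensure_depth++;
}

/* Model of a Release call: decrement the counter, and delete the state
   only if an Ensure call owns it and the counter has reached zero. */
void
toy_release(toy_tstate *ts)
{
    /* Stand-in for the documented fatal error on an unmatched Release. */
    assert(ts->ensure_depth > 0);
    ts->ensure_depth--;
    if (ts->ensure_depth == 0 && ts->owned_by_ensure) {
        ts->deleted = true;
    }
}
```

The model shows why nested calls are safe: only the outermost matching release can delete a state, and a state that existed before the first Ensure call is never deleted by a release.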
diff --git a/Doc/data/stable_abi.dat b/Doc/data/stable_abi.dat
index 804e9c82e7818b..2d4278c9d97c85 100644
--- a/Doc/data/stable_abi.dat
+++ b/Doc/data/stable_abi.dat
@@ -369,6 +369,10 @@ func,PyImport_ImportModuleLevel,3.2,,
 func,PyImport_ImportModuleLevelObject,3.7,,
 func,PyImport_ReloadModule,3.2,,
 func,PyIndex_Check,3.8,,
+type,PyInterpreterGuard,3.15,,opaque
+func,PyInterpreterGuard_Close,3.15,,
+func,PyInterpreterGuard_FromCurrent,3.15,,
+func,PyInterpreterGuard_FromView,3.15,,
 type,PyInterpreterState,3.2,,opaque
 func,PyInterpreterState_Clear,3.2,,
 func,PyInterpreterState_Delete,3.2,,
@@ -376,6 +380,10 @@ func,PyInterpreterState_Get,3.9,,
 func,PyInterpreterState_GetDict,3.8,,
 func,PyInterpreterState_GetID,3.7,,
 func,PyInterpreterState_New,3.2,,
+type,PyInterpreterView,3.15,,opaque
+func,PyInterpreterView_Close,3.15,,
+func,PyInterpreterView_FromCurrent,3.15,,
+func,PyInterpreterView_FromMain,3.15,,
 func,PyIter_Check,3.8,,
 func,PyIter_Next,3.2,,
 func,PyIter_NextItem,3.14,,
@@ -716,14 +724,18 @@ func,PySys_SetObject,3.2,,
 func,PySys_WriteStderr,3.2,,
 func,PySys_WriteStdout,3.2,,
 type,PyThreadState,3.2,,opaque
+type,PyThreadStateToken,3.15,,opaque
 func,PyThreadState_Clear,3.2,,
 func,PyThreadState_Delete,3.2,,
+func,PyThreadState_Ensure,3.15,,
+func,PyThreadState_EnsureFromView,3.15,,
 func,PyThreadState_Get,3.2,,
 func,PyThreadState_GetDict,3.2,,
 func,PyThreadState_GetFrame,3.10,,
 func,PyThreadState_GetID,3.10,,
 func,PyThreadState_GetInterpreter,3.10,,
 func,PyThreadState_New,3.2,,
+func,PyThreadState_Release,3.15,,
 func,PyThreadState_SetAsyncExc,3.2,,
 func,PyThreadState_Swap,3.2,,
 func,PyThread_GetInfo,3.3,,
diff --git a/Doc/howto/free-threading-extensions.rst b/Doc/howto/free-threading-extensions.rst
index b21ed1c8f37be1..ad0578df0a2702 100644
--- a/Doc/howto/free-threading-extensions.rst
+++ b/Doc/howto/free-threading-extensions.rst
@@ -218,13 +218,15 @@ Thread State and GIL APIs
 Python provides a set of functions and macros to manage thread state and the
 GIL, such as:
 
+* :c:func:`PyThreadState_Ensure`, :c:func:`PyThreadState_EnsureFromView`,
+  and :c:func:`PyThreadState_Release`
 * :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`
 * :c:func:`PyEval_SaveThread` and :c:func:`PyEval_RestoreThread`
 * :c:macro:`Py_BEGIN_ALLOW_THREADS` and :c:macro:`Py_END_ALLOW_THREADS`
 
 These functions should still be used in the free-threaded build to manage
 thread state even when the :term:`GIL` is disabled.  For example, if you
-create a thread outside of Python, you must call :c:func:`PyGILState_Ensure`
+create a thread outside of Python, you must call :c:func:`PyThreadState_Ensure`
 before calling into the Python API to ensure that the thread has a valid
 Python thread state.
 
diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst
index 9ac231224b7b1d..50ce22c4e91f19 100644
--- a/Doc/whatsnew/3.15.rst
+++ b/Doc/whatsnew/3.15.rst
@@ -91,6 +91,7 @@ Summary -- Release highlights
 * :pep:`803`, :pep:`820 <820>`, :pep:`793 <793>`:
   :ref:`Stable ABI for free-threaded builds <whatsnew315-abi3t>` and
   related C API
+* :pep:`788`: :ref:`Protection against finalization in the C API <whatsnew315-c-api-interpreter-finalization>`
 * :ref:`The JIT compiler has been significantly upgraded <whatsnew315-jit>`
 * :ref:`The official Windows 64-bit binaries now use the tail-calling interpreter
   <whatsnew315-windows-tail-calling-interpreter>`
@@ -524,6 +525,39 @@ in :ref:`abi3-compiling`.
 .. seealso:: :pep:`803` for further details.
 
 
+.. _whatsnew315-c-api-interpreter-finalization:
+
+:pep:`788`: Protecting the C API from interpreter finalization
+--------------------------------------------------------------
+
+In the C API, :term:`interpreter finalization <interpreter shutdown>` can be
+problematic for many extensions, because :term:`attaching <attached thread
+state>` a thread state will permanently hang the thread, resulting in deadlocks
+and other spurious issues. Additionally, it has historically been impossible
+to safely check whether an interpreter is alive before using it, leading to crashes
+when a thread concurrently deletes an interpreter while another thread is
+trying to attach to it.
+
+There are now several new suites of APIs to circumvent these problems:
+
+* :ref:`Interpreter guards <interpreter-guards>`, which prevent an interpreter
+  from finalizing.
+* :ref:`Interpreter views <interpreter-views>`, which allow thread-safe access
+  to an interpreter that may be concurrently finalizing or deleted.
+* :ref:`New APIs <c-api-attach-detach>` to automatically attach and detach
+  thread states that come with built-in protection against finalization.
+
+In addition, APIs in the ``PyGILState`` family (most notably
+:c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`) have been
+:term:`soft deprecated`. There is **no** plan to remove them, and existing
+code will continue to work, but there will be no new ``PyGILState`` APIs
+in future versions of Python.
+
+.. seealso:: :pep:`788` for further details.
+
+(Contributed by Peter Bierma in :gh:`149101`.)
+
+
 .. _whatsnew315-improved-error-messages:
 
 Improved error messages
diff --git a/Include/cpython/pystate.h b/Include/cpython/pystate.h
index 0cb57679df331d..a9d97e47e005df 100644
--- a/Include/cpython/pystate.h
+++ b/Include/cpython/pystate.h
@@ -105,7 +105,7 @@ struct _ts {
 #  define _PyThreadState_WHENCE_INIT 1
 #  define _PyThreadState_WHENCE_FINI 2
 #  define _PyThreadState_WHENCE_THREADING 3
-#  define _PyThreadState_WHENCE_GILSTATE 4
+#  define _PyThreadState_WHENCE_C_API 4
 #  define _PyThreadState_WHENCE_EXEC 5
 #  define _PyThreadState_WHENCE_THREADING_DAEMON 6
 #endif
@@ -239,6 +239,20 @@ struct _ts {
     // structure and all share the same per-interpreter structure).
     PyStats *pystats;
 #endif
+
+    struct {
+        /* Number of nested PyThreadState_Ensure() calls on this thread state */
+        Py_ssize_t counter;
+
+        /* Should this thread state be deleted upon calling
+           PyThreadState_Release() (with the counter at 1)?
+
+           This is only true for thread states created by PyThreadState_Ensure() */
+        int delete_on_release;
+
+        /* The interpreter guard owned by PyThreadState_EnsureFromView(), if any. */
+        PyInterpreterGuard *owned_guard;
+    } ensure;
 };
 
 /* other API */
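
The `ensure` bookkeeping added to `struct _ts` above can be pictured with a small standalone model. The sketch below is only an illustration under stated assumptions: `ToyThreadState`, `toy_ensure` and `toy_release` are hypothetical names that do not exist in CPython, the `deleted` flag stands in for real thread state deletion, and the guard pointer is left out entirely. It shows the rule described in the comments: only the outermost release (counter at 1) of a state marked `delete_on_release` deletes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the `ensure` sub-struct; not CPython code. */
typedef struct {
    ptrdiff_t counter;        /* nested toy_ensure() calls still open */
    int delete_on_release;    /* set only when the state was created by an ensure call */
    int deleted;              /* toy flag: 1 once the state would have been deleted */
} ToyThreadState;

/* Open one more nesting level on an existing state. */
static void
toy_ensure(ToyThreadState *ts)
{
    assert(!ts->deleted);
    ts->counter++;
}

/* Close one nesting level.  Returns 1 if this was the outermost release
   of a state that is marked for deletion, 0 otherwise. */
static int
toy_release(ToyThreadState *ts)
{
    assert(ts->counter > 0);
    if (ts->counter == 1 && ts->delete_on_release) {
        ts->deleted = 1;
    }
    ts->counter--;
    return ts->deleted;
}
```

In this model a pre-existing thread state (`delete_on_release == 0`) survives any number of balanced ensure/release pairs, which matches what the nested tests later in this diff assert about `PyThreadState_Get()` staying unchanged.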
diff --git a/Include/internal/pycore_interp_structs.h b/Include/internal/pycore_interp_structs.h
index 2d04c173e85abe..02a10e87b7e15c 100644
--- a/Include/internal/pycore_interp_structs.h
+++ b/Include/internal/pycore_interp_structs.h
@@ -834,6 +834,8 @@ struct _Py_unique_id_pool {
 
 typedef _Py_CODEUNIT *(*_PyJitEntryFuncPtr)(struct _PyExecutorObject *exec, _PyInterpreterFrame *frame, _PyStackRef *stack_pointer, PyThreadState *tstate);
 
+#define _PyInterpreterGuard_GUARDS_NOT_ALLOWED UINTPTR_MAX
+
 /* PyInterpreterState holds the global state for one of the runtime's
    interpreters.  Typically the initial (main) interpreter is the only one.
 
@@ -1060,6 +1062,11 @@ struct _is {
 #endif
 #endif
 
+    // The number of remaining finalization guards.
+    // If this is _PyInterpreterGuard_GUARDS_NOT_ALLOWED, then finalization
+    // guards can no longer be created.
+    uintptr_t finalization_guards;
+
     /* the initial PyInterpreterState.threads.head */
     _PyThreadStateImpl _initial_thread;
     // _initial_thread should be the last field of PyInterpreterState.
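
The `finalization_guards` field and its `UINTPTR_MAX` sentinel can be read as a counter that finalization "closes" once it reaches zero. The following single-threaded sketch models that reading; it is an assumption-laden illustration, not the CPython implementation. `ToyInterp`, `TOY_GUARDS_NOT_ALLOWED` and the `toy_*` functions are hypothetical, and the real code would additionally need atomic operations or a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the sentinel value used in the diff. */
#define TOY_GUARDS_NOT_ALLOWED UINTPTR_MAX

typedef struct {
    uintptr_t finalization_guards;   /* number of outstanding guards */
} ToyInterp;

/* Take a guard.  Fails once finalization has closed the counter. */
static bool
toy_guard_acquire(ToyInterp *interp)
{
    if (interp->finalization_guards == TOY_GUARDS_NOT_ALLOWED) {
        return false;
    }
    interp->finalization_guards++;
    return true;
}

/* Give a guard back.  The caller must currently hold one. */
static void
toy_guard_release(ToyInterp *interp)
{
    assert(interp->finalization_guards != 0);
    assert(interp->finalization_guards != TOY_GUARDS_NOT_ALLOWED);
    interp->finalization_guards--;
}

/* Finalization may start only when no guards remain; from then on,
   no new guards can be created. */
static bool
toy_try_begin_finalization(ToyInterp *interp)
{
    if (interp->finalization_guards != 0) {
        return false;
    }
    interp->finalization_guards = TOY_GUARDS_NOT_ALLOWED;
    return true;
}
```

One consequence of using the maximum unsigned value as the sentinel is that an unmatched decrement from 0 wraps around to exactly that sentinel, which is the failure mode described in the comment of `test_thread_state_ensure_from_view_interp_switch` further down in this diff.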
diff --git a/Include/internal/pycore_pystate.h b/Include/internal/pycore_pystate.h
index 189a8dde9f09ed..c9e918bceda9fc 100644
--- a/Include/internal/pycore_pystate.h
+++ b/Include/internal/pycore_pystate.h
@@ -266,6 +266,8 @@ extern int _PyOS_InterruptOccurred(PyThreadState *tstate);
     PyMutex_LockFlags(&(runtime)->interpreters.mutex, _Py_LOCK_DONT_DETACH)
 #define HEAD_UNLOCK(runtime) \
     PyMutex_Unlock(&(runtime)->interpreters.mutex)
+#define ASSERT_HEAD_IS_LOCKED(runtime) \
+    assert(PyMutex_IsLocked(&(runtime)->interpreters.mutex))
 
 #define _Py_FOR_EACH_TSTATE_UNLOCKED(interp, t) \
     for (PyThreadState *t = interp->threads.head; t; t = t->next)
@@ -338,6 +340,20 @@ _Py_RecursionLimit_GetMargin(PyThreadState *tstate)
 #endif
 }
 
+/* PEP 788 structures. */
+
+struct PyInterpreterGuard {
+    PyInterpreterState *interp;
+};
+
+struct PyInterpreterView {
+    int64_t id;
+};
+
+// Exports for '_testinternalcapi' shared extension
+PyAPI_FUNC(Py_ssize_t) _PyInterpreterState_GuardCountdown(PyInterpreterState *interp);
+PyAPI_FUNC(PyInterpreterState *) _PyInterpreterGuard_GetInterpreter(PyInterpreterGuard *guard);
+
 #ifdef __cplusplus
 }
 #endif
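
Note the asymmetry between the two structures above: a guard holds an interpreter pointer, while a view holds only an interpreter ID. A plausible reading (an assumption, not something the diff states) is that an ID can be looked up safely after the interpreter is gone, whereas a pointer would dangle. The toy model below sketches that idea; `toy_registry`, `ToyView`, `ToyGuard` and the `toy_*` functions are all hypothetical and ignore locking.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_INTERPS 4

typedef struct {
    int64_t id;
    int alive;                     /* 0 once the interpreter has been destroyed */
} ToyInterp;

typedef struct { int64_t id; } ToyView;          /* weak: ID only */
typedef struct { ToyInterp *interp; } ToyGuard;  /* strong: real pointer */

/* Hypothetical table of interpreters, zero-initialized (all slots dead). */
static ToyInterp toy_registry[TOY_MAX_INTERPS];

/* Find a live interpreter by ID, or return NULL if there is none. */
static ToyInterp *
toy_lookup(int64_t id)
{
    for (size_t i = 0; i < TOY_MAX_INTERPS; i++) {
        if (toy_registry[i].alive && toy_registry[i].id == id) {
            return &toy_registry[i];
        }
    }
    return NULL;
}

/* Promote a view to a guard.  Returns 0 and fills *guard on success,
   or -1 if the interpreter no longer exists. */
static int
toy_guard_from_view(const ToyView *view, ToyGuard *guard)
{
    ToyInterp *interp = toy_lookup(view->id);
    if (interp == NULL) {
        return -1;
    }
    guard->interp = interp;
    return 0;
}
```

This mirrors the behavior checked by `test_interp_view_after_shutdown` in this commit, where `PyInterpreterGuard_FromView()` succeeds while the interpreter is alive and returns `NULL` after `Py_EndInterpreter()`.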
diff --git a/Include/pystate.h b/Include/pystate.h
index 727b8fbfffe0e6..8dad748238f4f3 100644
--- a/Include/pystate.h
+++ b/Include/pystate.h
@@ -120,6 +120,29 @@ PyAPI_FUNC(void) PyGILState_Release(PyGILState_STATE);
 PyAPI_FUNC(PyThreadState *) PyGILState_GetThisThreadState(void);
 
 
+/* PEP 788 -- Protection against interpreter finalization */
+
+#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= _Py_PACK_VERSION(3, 15)
+
+typedef struct PyInterpreterGuard PyInterpreterGuard;
+typedef struct PyInterpreterView PyInterpreterView;
+
+typedef void PyThreadStateToken;
+
+PyAPI_FUNC(PyInterpreterGuard *) PyInterpreterGuard_FromCurrent(void);
+PyAPI_FUNC(void) PyInterpreterGuard_Close(PyInterpreterGuard *guard);
+PyAPI_FUNC(PyInterpreterGuard *) PyInterpreterGuard_FromView(PyInterpreterView *view);
+
+PyAPI_FUNC(PyInterpreterView *) PyInterpreterView_FromCurrent(void);
+PyAPI_FUNC(void) PyInterpreterView_Close(PyInterpreterView *view);
+PyAPI_FUNC(PyInterpreterView *) PyInterpreterView_FromMain(void);
+
+PyAPI_FUNC(PyThreadStateToken *) PyThreadState_Ensure(PyInterpreterGuard *guard);
+PyAPI_FUNC(PyThreadStateToken *) PyThreadState_EnsureFromView(PyInterpreterView *view);
+PyAPI_FUNC(void) PyThreadState_Release(PyThreadStateToken *tstate);
+
+#endif
+
 #ifndef Py_LIMITED_API
 #  define Py_CPYTHON_PYSTATE_H
 #  include "cpython/pystate.h"
diff --git a/Lib/test/libregrtest/tsan.py b/Lib/test/libregrtest/tsan.py
index f1f8c8bde920ae..bacfe5e21ba0b7 100644
--- a/Lib/test/libregrtest/tsan.py
+++ b/Lib/test/libregrtest/tsan.py
@@ -29,6 +29,7 @@
     'test_threadsignals',
     'test_weakref',
     'test_free_threading',
+    'test_embed',
 ]
 
 # Tests that should be run with `--parallel-threads=N` under TSAN. These tests
diff --git a/Lib/test/test_embed.py b/Lib/test/test_embed.py
index 1087cbd0836fd8..c5ced3cc6134b9 100644
--- a/Lib/test/test_embed.py
+++ b/Lib/test/test_embed.py
@@ -1993,10 +1993,21 @@ def test_audit_run_stdin(self):
     def test_get_incomplete_frame(self):
         self.run_embedded_interpreter("test_get_incomplete_frame")
 
-
     def test_gilstate_after_finalization(self):
         self.run_embedded_interpreter("test_gilstate_after_finalization")
 
+    def test_thread_state_ensure(self):
+        self.run_embedded_interpreter("test_thread_state_ensure")
+
+    def test_main_interpreter_view(self):
+        self.run_embedded_interpreter("test_main_interpreter_view")
+
+    def test_thread_state_ensure_from_view(self):
+        self.run_embedded_interpreter("test_thread_state_ensure_from_view")
+
+    def test_concurrent_finalization_stress(self):
+        self.run_embedded_interpreter("test_concurrent_finalization_stress")
+
 
 class MiscTests(EmbeddingTestsMixin, unittest.TestCase):
     def test_unicode_id_init(self):
diff --git a/Lib/test/test_stable_abi_ctypes.py b/Lib/test/test_stable_abi_ctypes.py
index c20468c12b670d..ac5c4296c663d0 100644
--- a/Lib/test/test_stable_abi_ctypes.py
+++ b/Lib/test/test_stable_abi_ctypes.py
@@ -369,12 +369,18 @@ def test_windows_feature_macros(self):
     "PyImport_ImportModuleNoBlock",
     "PyImport_ReloadModule",
     "PyIndex_Check",
+    "PyInterpreterGuard_Close",
+    "PyInterpreterGuard_FromCurrent",
+    "PyInterpreterGuard_FromView",
     "PyInterpreterState_Clear",
     "PyInterpreterState_Delete",
     "PyInterpreterState_Get",
     "PyInterpreterState_GetDict",
     "PyInterpreterState_GetID",
     "PyInterpreterState_New",
+    "PyInterpreterView_Close",
+    "PyInterpreterView_FromCurrent",
+    "PyInterpreterView_FromMain",
     "PyIter_Check",
     "PyIter_Next",
     "PyIter_NextItem",
@@ -695,12 +701,15 @@ def test_windows_feature_macros(self):
     "PyThreadState_Clear",
     "PyThreadState_Delete",
     "PyThreadState_DeleteCurrent",
+    "PyThreadState_Ensure",
+    "PyThreadState_EnsureFromView",
     "PyThreadState_Get",
     "PyThreadState_GetDict",
     "PyThreadState_GetFrame",
     "PyThreadState_GetID",
     "PyThreadState_GetInterpreter",
     "PyThreadState_New",
+    "PyThreadState_Release",
     "PyThreadState_SetAsyncExc",
     "PyThreadState_Swap",
     "PyThread_GetInfo",
diff --git a/Misc/NEWS.d/next/C_API/2026-04-28-17-43-12.gh-issue-149101.HTuHTb.rst b/Misc/NEWS.d/next/C_API/2026-04-28-17-43-12.gh-issue-149101.HTuHTb.rst
new file mode 100644
index 00000000000000..9bcb835c19f09c
--- /dev/null
+++ b/Misc/NEWS.d/next/C_API/2026-04-28-17-43-12.gh-issue-149101.HTuHTb.rst
@@ -0,0 +1 @@
+Implement :pep:`788`.
diff --git a/Misc/stable_abi.toml b/Misc/stable_abi.toml
index 6d63a6796b1739..8fd7aba09241e6 100644
--- a/Misc/stable_abi.toml
+++ b/Misc/stable_abi.toml
@@ -2799,5 +2799,35 @@
     # (The definition of 'full-abi' was clarified when this entry was added.)
     struct_abi_kind = 'full-abi'
 
+# PEP 788 finalization protection
+
+[struct.PyInterpreterGuard]
+    added = '3.15'
+    struct_abi_kind = 'opaque'
+[function.PyInterpreterGuard_FromCurrent]
+    added = '3.15'
+[function.PyInterpreterGuard_FromView]
+    added = '3.15'
+[function.PyInterpreterGuard_Close]
+    added = '3.15'
+[struct.PyInterpreterView]
+    added = '3.15'
+    struct_abi_kind = 'opaque'
+[function.PyInterpreterView_FromCurrent]
+    added = '3.15'
+[function.PyInterpreterView_FromMain]
+    added = '3.15'
+[function.PyInterpreterView_Close]
+    added = '3.15'
+[function.PyThreadState_Ensure]
+    added = '3.15'
+[function.PyThreadState_EnsureFromView]
+    added = '3.15'
+[function.PyThreadState_Release]
+    added = '3.15'
+[struct.PyThreadStateToken]
+    added = '3.15'
+    struct_abi_kind = 'opaque'
+
 [function.PyObject_CallFinalizerFromDealloc]
     added = '3.15'
diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 3ebe4ceea6a72e..be5ad3e9efa104 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2606,6 +2606,334 @@ create_managed_weakref_nogc_type(PyObject *self, PyObject *Py_UNUSED(args))
     return PyType_FromSpec(&ManagedWeakrefNoGC_spec);
 }
 
+static void
+test_interp_guards_common(void)
+{
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    assert(guard != NULL);
+
+    PyInterpreterGuard *guard_2 = PyInterpreterGuard_FromCurrent();
+    assert(guard_2 != NULL);
+
+    // We can close the guards in any order
+    PyInterpreterGuard_Close(guard_2);
+    PyInterpreterGuard_Close(guard);
+}
+
+static PyObject *
+test_interpreter_guards(PyObject *self, PyObject *unused)
+{
+    // Test the main interpreter
+    test_interp_guards_common();
+
+    // Test a (legacy) subinterpreter
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+    PyThreadState *interp_tstate = Py_NewInterpreter();
+    // Note: For these tests, we don't bother adding error paths, because
+    // there's no realistic case where interpreter creation would fail here.
+    assert(interp_tstate != NULL);
+    test_interp_guards_common();
+    Py_EndInterpreter(interp_tstate);
+
+    // Test an isolated subinterpreter
+    PyInterpreterConfig config = {
+        .gil = PyInterpreterConfig_OWN_GIL,
+        .check_multi_interp_extensions = 1
+    };
+
+    PyThreadState *isolated_interp_tstate;
+    PyStatus status = Py_NewInterpreterFromConfig(&isolated_interp_tstate, &config);
+    assert(!PyStatus_Exception(status));
+
+    test_interp_guards_common();
+    Py_EndInterpreter(isolated_interp_tstate);
+    PyThreadState_Swap(save_tstate);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_thread_state_ensure_nested(PyObject *self, PyObject *unused)
+{
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    assert(guard != NULL);
+
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+    assert(PyGILState_GetThisThreadState() == save_tstate);
+    PyThreadStateToken *tokens[10];
+
+    for (int i = 0; i < 10; ++i) {
+        // Test reactivation of the detached tstate.
+        tokens[i] = PyThreadState_Ensure(guard);
+        assert(tokens[i] != NULL);
+
+        // No new thread state should've been created.
+        assert(PyThreadState_Get() == save_tstate);
+        PyThreadState_Release(tokens[i]);
+    }
+
+    assert(PyThreadState_GetUnchecked() == NULL);
+
+    // Similarly, test ensuring with deep nesting and *then* releasing.
+    // If the (detached) gilstate matches the interpreter, then it shouldn't
+    // create a new thread state.
+    for (int i = 0; i < 10; ++i) {
+        tokens[i] = PyThreadState_Ensure(guard);
+        assert(tokens[i] != NULL);
+        assert(PyThreadState_Get() == save_tstate);
+    }
+
+    for (int i = 9; i >= 0; --i) {
+        assert(PyThreadState_Get() == save_tstate);
+        PyThreadState_Release(tokens[i]);
+    }
+
+    assert(PyThreadState_GetUnchecked() == NULL);
+    PyInterpreterGuard_Close(guard);
+    PyThreadState_Swap(save_tstate);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_thread_state_ensure_crossinterp(PyObject *self, PyObject *unused)
+{
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+    PyThreadState *interp_tstate = Py_NewInterpreter();
+    assert(interp_tstate != NULL);
+
+    /* This should create a new thread state for the calling interpreter, *not*
+       reactivate the old one. In a real-world scenario, this would arise in
+       something like this:
+
+       def some_func():
+           import something
+           # This re-enters the main interpreter, but we
+           # shouldn't have access to prior thread-locals.
+           something.call_something()
+
+       interp = interpreters.create()
+       interp.exec(some_func)
+       */
+    PyThreadStateToken *token = PyThreadState_Ensure(guard);
+    assert(token != NULL);
+
+    PyThreadState *ensured_tstate = PyThreadState_Get();
+    assert(ensured_tstate != save_tstate);
+    assert(PyGILState_GetThisThreadState() == ensured_tstate);
+
+    // Now though, we should reactivate the thread state
+    PyThreadStateToken *other_token = PyThreadState_Ensure(guard);
+    assert(other_token != NULL);
+    assert(PyThreadState_Get() == ensured_tstate);
+
+    PyThreadState_Release(other_token);
+
+    // Ensure that we're restoring the prior thread state
+    PyThreadState_Release(token);
+    assert(PyThreadState_Get() == interp_tstate);
+    assert(PyGILState_GetThisThreadState() == interp_tstate);
+
+    PyThreadState_Swap(interp_tstate);
+    Py_EndInterpreter(interp_tstate);
+
+    PyInterpreterGuard_Close(guard);
+    PyThreadState_Swap(save_tstate);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_interp_view_after_shutdown(PyObject *self, PyObject *unused)
+{
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+    PyThreadState *interp_tstate = Py_NewInterpreter();
+    if (interp_tstate == NULL) {
+        PyThreadState_Swap(save_tstate);
+        return PyErr_NoMemory();
+    }
+
+    PyInterpreterView *view = PyInterpreterView_FromCurrent();
+    if (view == NULL) {
+        Py_EndInterpreter(interp_tstate);
+        PyThreadState_Swap(save_tstate);
+        return PyErr_NoMemory();
+    }
+
+    // As a sanity check, ensure that the view actually works
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromView(view);
+    PyInterpreterGuard_Close(guard);
+
+    // Now, destroy the interpreter and try to acquire a guard from a view.
+    // It should fail.
+    Py_EndInterpreter(interp_tstate);
+    guard = PyInterpreterGuard_FromView(view);
+    assert(guard == NULL);
+
+    PyThreadState_Swap(save_tstate);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_thread_state_ensure_view(PyObject *self, PyObject *unused)
+{
+    // For simplicity's sake, we assume that functions won't fail due to being
+    // out of memory.
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+    PyThreadState *interp_tstate = Py_NewInterpreter();
+    assert(interp_tstate != NULL);
+    assert(PyInterpreterState_Get() == PyThreadState_GetInterpreter(interp_tstate));
+
+    PyInterpreterView *main_view = PyInterpreterView_FromMain();
+    assert(main_view != NULL);
+
+    PyInterpreterView *view = PyInterpreterView_FromCurrent();
+    assert(view != NULL);
+
+    Py_BEGIN_ALLOW_THREADS;
+    PyThreadStateToken *token = PyThreadState_EnsureFromView(view);
+    assert(token != NULL);
+    assert(PyThreadState_Get() == interp_tstate);
+
+    // Test a nested call
+    PyThreadStateToken *token2 = PyThreadState_EnsureFromView(view);
+    assert(PyThreadState_Get() == interp_tstate);
+
+    // We're in a new interpreter now. PyThreadState_EnsureFromView() should
+    // now create a new thread state.
+    PyThreadStateToken *main_token = PyThreadState_EnsureFromView(main_view);
+    assert(main_token == (PyThreadStateToken*)interp_tstate); // The old thread state
+    assert(PyInterpreterState_Get() == PyInterpreterState_Main());
+
+    // Going back to the old interpreter should create a new thread state again.
+    PyThreadStateToken *token3 = PyThreadState_EnsureFromView(view);
+    assert(PyInterpreterState_Get() == PyThreadState_GetInterpreter(interp_tstate));
+    assert(PyThreadState_Get() != interp_tstate);
+    PyThreadState_Release(token3);
+    PyThreadState_Release(main_token);
+
+    // We're back in the original interpreter. PyThreadState_EnsureFromView() should
+    // no longer create a new thread state.
+    assert(PyThreadState_Get() == interp_tstate);
+    PyThreadStateToken *token4 = PyThreadState_EnsureFromView(view);
+    assert(PyThreadState_Get() == interp_tstate);
+    PyThreadState_Release(token4);
+    PyThreadState_Release(token2);
+    PyThreadState_Release(token);
+    assert(PyThreadState_GetUnchecked() == NULL);
+    Py_END_ALLOW_THREADS;
+
+    assert(PyThreadState_Get() == interp_tstate);
+    PyInterpreterView_Close(view);
+    PyInterpreterView_Close(main_view);
+    Py_EndInterpreter(interp_tstate);
+    PyThreadState_Swap(save_tstate);
+
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_thread_state_ensure_detachment(PyObject *self, PyObject *unused)
+{
+    PyThreadState *before = PyThreadState_Get();
+    assert(before != NULL);
+
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    assert(guard != NULL);
+
+    PyThreadStateToken *token = PyThreadState_Ensure(guard);
+    assert(token != NULL);
+    /* Ensure took the fast path; tstate is unchanged. */
+    assert(PyThreadState_Get() == before);
+
+    PyThreadState_Release(token);
+
+    PyThreadState *after = PyThreadState_GetUnchecked();
+    assert(after != NULL);
+
+    PyInterpreterGuard_Close(guard);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_thread_state_ensure_detached_gilstate(PyObject *self, PyObject *unused)
+{
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    PyThreadState *gilstate = PyGILState_GetThisThreadState();
+
+    PyThreadStateToken *token1 = PyThreadState_Ensure(guard);
+    assert(PyThreadState_Get() == gilstate);
+
+    Py_BEGIN_ALLOW_THREADS
+    assert(PyThreadState_GetUnchecked() == NULL);
+    PyThreadStateToken *token2 = PyThreadState_Ensure(guard);
+    assert(PyThreadState_Get() == gilstate);
+    PyThreadState_Release(token2);
+    assert(PyThreadState_GetUnchecked() == NULL);
+    Py_END_ALLOW_THREADS
+    assert(PyThreadState_Get() == gilstate);
+
+    PyThreadState_Release(token1);
+    assert(PyThreadState_Get() == gilstate);
+
+    PyInterpreterGuard_Close(guard);
+
+    Py_RETURN_NONE;
+}
+
+/* A capsule destructor that calls Ensure/Release while the tstate is being
+ * cleared by PyThreadState_Release. */
+static void
+tstate_ensure_capsule_destructor(PyObject *capsule)
+{
+    assert(capsule != NULL);
+    PyInterpreterGuard *guard = PyCapsule_GetPointer(capsule, "x");
+    PyThreadStateToken *token = PyThreadState_Ensure(guard);
+    assert(token != NULL);
+    PyThreadState_Release(token);
+}
+
+static PyObject *
+test_thread_state_release_with_destructor(PyObject *self, PyObject *unused)
+{
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    assert(guard != NULL);
+
+    // We need to use a fresh thread state in order to control the lifetime of
+    // it. If we used the current thread state, it wouldn't be cleared until
+    // the end of the program, which is after the guard has been closed.
+    PyThreadState *fresh_tstate = PyThreadState_New(PyInterpreterState_Get());
+    assert(fresh_tstate != NULL);
+
+    PyThreadState *save_tstate = PyThreadState_Swap(fresh_tstate);
+    assert(save_tstate != NULL);
+
+    /* Triggers fresh tstate path */
+    PyThreadStateToken *token = PyThreadState_Ensure(guard);
+    assert(token != NULL);
+
+    /* Stash a capsule whose destructor will run during PyThreadState_Clear. */
+    PyObject *capsule = PyCapsule_New(guard, "x", tstate_ensure_capsule_destructor);
+    assert(capsule != NULL);
+
+    /* We need to put it somewhere it gets cleaned up at PyThreadState_Clear.
+     * tstate->dict is cleared during PyThreadState_Clear. */
+    PyObject *dict = PyThreadState_GetDict();
+    assert(dict != NULL);
+    int res = PyDict_SetItemString(dict, "key", capsule);
+    assert(res == 0);
+    Py_DECREF(capsule);
+
+    PyThreadState_Release(token);
+
+    // This will trigger the destructor
+    PyThreadState_Clear(fresh_tstate);
+    PyThreadState_DeleteCurrent();
+
+    PyInterpreterGuard_Close(guard);
+    PyThreadState_Swap(save_tstate);
+
+    Py_RETURN_NONE;
+}
+
 
 static PyObject*
 test_soft_deprecated_macros(PyObject *Py_UNUSED(self), PyObject *Py_UNUSED(args))
@@ -2740,6 +3068,14 @@ static PyMethodDef TestMethods[] = {
     {"create_managed_weakref_nogc_type",
         create_managed_weakref_nogc_type, METH_NOARGS},
     {"test_soft_deprecated_macros", test_soft_deprecated_macros, METH_NOARGS},
+    {"test_interpreter_guards", test_interpreter_guards, METH_NOARGS},
+    {"test_thread_state_ensure_nested", test_thread_state_ensure_nested, METH_NOARGS},
+    {"test_thread_state_ensure_crossinterp", test_thread_state_ensure_crossinterp, METH_NOARGS},
+    {"test_interp_view_after_shutdown", test_interp_view_after_shutdown, METH_NOARGS},
+    {"test_thread_state_ensure_view", test_thread_state_ensure_view, METH_NOARGS},
+    {"test_thread_state_ensure_detachment", test_thread_state_ensure_detachment, METH_NOARGS},
+    {"test_thread_state_ensure_detached_gilstate", test_thread_state_ensure_detached_gilstate, METH_NOARGS},
+    {"test_thread_state_release_with_destructor", test_thread_state_release_with_destructor, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };
 
diff --git a/Modules/_testinternalcapi.c b/Modules/_testinternalcapi.c
index 73451b5117fa8c..c0a7680388e4a7 100644
--- a/Modules/_testinternalcapi.c
+++ b/Modules/_testinternalcapi.c
@@ -3059,6 +3059,69 @@ test_threadstate_set_stack_protection(PyObject *self, PyObject *Py_UNUSED(args))
     Py_RETURN_NONE;
 }
 
+#define NUM_GUARDS 100
+
+static PyObject *
+test_interp_guard_countdown(PyObject *self, PyObject *unused)
+{
+    PyThreadState *save_tstate = PyThreadState_Swap(NULL);
+
+    // This test assumes that the interpreter has no guards active.
+    // While this is currently true for the main interpreter as of writing,
+    // this won't necessarily be true in the future. For the sake of
+    // maintenance, we create a new interpreter to be sure that there aren't
+    // any other guards.
+    PyThreadState *interp_tstate = Py_NewInterpreter();
+    assert(interp_tstate != NULL);
+    PyInterpreterState *interp = PyInterpreterState_Get();
+    assert(_PyInterpreterState_GuardCountdown(interp) == 0);
+
+    PyInterpreterGuard *guards[NUM_GUARDS];
+    for (int i = 0; i < NUM_GUARDS; ++i) {
+        guards[i] = PyInterpreterGuard_FromCurrent();
+        assert(guards[i] != 0);
+        assert(_PyInterpreterState_GuardCountdown(interp) == i + 1);
+    }
+
+    for (int i = 0; i < NUM_GUARDS; ++i) {
+        PyInterpreterGuard_Close(guards[i]);
+        assert(_PyInterpreterState_GuardCountdown(interp) == (NUM_GUARDS - i - 1));
+    }
+
+    Py_EndInterpreter(interp_tstate);
+    PyThreadState_Swap(save_tstate);
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+test_interp_view_countdown(PyObject *self, PyObject *unused)
+{
+    PyInterpreterState *interp = PyInterpreterState_Get();
+    PyInterpreterView *view = PyInterpreterView_FromCurrent();
+    if (view == NULL) {
+        return NULL;
+    }
+    assert(_PyInterpreterState_GuardCountdown(interp) == 0);
+
+    PyInterpreterGuard *guards[NUM_GUARDS];
+
+    for (int i = 0; i < NUM_GUARDS; ++i) {
+        guards[i] = PyInterpreterGuard_FromView(view);
+        assert(guards[i] != 0);
+        assert(_PyInterpreterGuard_GetInterpreter(guards[i]) == interp);
+        assert(_PyInterpreterState_GuardCountdown(interp) == i + 1);
+    }
+
+    for (int i = 0; i < NUM_GUARDS; ++i) {
+        PyInterpreterGuard_Close(guards[i]);
+        assert(_PyInterpreterState_GuardCountdown(interp) == (NUM_GUARDS - i - 1));
+    }
+
+    PyInterpreterView_Close(view);
+    Py_RETURN_NONE;
+}
+
+#undef NUM_GUARDS
 
 static PyObject *
 _pyerr_setkeyerror(PyObject *self, PyObject *arg)
@@ -3073,6 +3136,52 @@ _pyerr_setkeyerror(PyObject *self, PyObject *arg)
     return NULL;
 }
 
+static PyObject *
+test_thread_state_ensure_from_view_interp_switch(PyObject *self, PyObject *unused)
+{
+    /* The main tstate is already attached and was NOT created by
+     * PyThreadState_Ensure, so delete_on_release == 0. */
+    PyInterpreterState *interp = _PyInterpreterState_GET();
+    assert(interp != NULL);
+    PyInterpreterView *view = PyInterpreterView_FromCurrent();
+    assert(view != NULL);
+
+    /* First Ensure/Release pair on this pre-existing tstate. */
+    assert(_PyThreadState_GET() != NULL);
+    PyThreadStateToken *t1 = PyThreadState_EnsureFromView(view);
+    assert(t1 != NULL);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 1);
+    PyThreadState_Release(t1);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 0);
+    assert(_PyThreadState_GET() != NULL);
+
+    /* tstate->ensure.owned_guard now points at the freed guard. */
+
+    /* Re-attach: Bug B detaches us as a side effect (separate repro). */
+    PyThreadState *save = PyThreadState_Swap(NULL);
+
+    PyThreadStateToken *t2 = PyThreadState_EnsureFromView(view);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 1);
+    assert(t2 != NULL);
+    PyThreadState_Release(t2);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 0);
+    assert(_PyThreadState_GET() == NULL);
+
+    PyThreadState_Swap(save);
+
+    /* In a release build (no assertion) the second Ensure silently
+     * skipped storing its guard and Release decremented the global
+     * counter from 0, wrapping it to GUARDS_NOT_ALLOWED.  All future
+     * guard acquisitions then fail: */
+    PyInterpreterGuard *g = PyInterpreterGuard_FromCurrent();
+    assert(g != NULL);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 1);
+    PyInterpreterGuard_Close(g);
+    assert(_PyInterpreterState_GuardCountdown(interp) == 0);
+
+    PyInterpreterView_Close(view);
+    Py_RETURN_NONE;
+}
 
 static PyMethodDef module_functions[] = {
     {"get_configs", get_configs, METH_NOARGS},
@@ -3198,6 +3307,9 @@ static PyMethodDef module_functions[] = {
     {"test_threadstate_set_stack_protection",
      test_threadstate_set_stack_protection, METH_NOARGS},
     {"_pyerr_setkeyerror", _pyerr_setkeyerror, METH_O},
+    {"test_interp_guard_countdown", test_interp_guard_countdown, METH_NOARGS},
+    {"test_interp_view_countdown", test_interp_view_countdown, METH_NOARGS},
+    {"test_thread_state_ensure_from_view_interp_switch", test_thread_state_ensure_from_view_interp_switch, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };
 
diff --git a/PC/python3dll.c b/PC/python3dll.c
index 3f29382f9b0b34..e0be9d65a93cda 100755
--- a/PC/python3dll.c
+++ b/PC/python3dll.c
@@ -330,12 +330,18 @@ EXPORT_FUNC(PyImport_ImportModuleLevelObject)
 EXPORT_FUNC(PyImport_ImportModuleNoBlock)
 EXPORT_FUNC(PyImport_ReloadModule)
 EXPORT_FUNC(PyIndex_Check)
+EXPORT_FUNC(PyInterpreterGuard_Close)
+EXPORT_FUNC(PyInterpreterGuard_FromCurrent)
+EXPORT_FUNC(PyInterpreterGuard_FromView)
 EXPORT_FUNC(PyInterpreterState_Clear)
 EXPORT_FUNC(PyInterpreterState_Delete)
 EXPORT_FUNC(PyInterpreterState_Get)
 EXPORT_FUNC(PyInterpreterState_GetDict)
 EXPORT_FUNC(PyInterpreterState_GetID)
 EXPORT_FUNC(PyInterpreterState_New)
+EXPORT_FUNC(PyInterpreterView_Close)
+EXPORT_FUNC(PyInterpreterView_FromCurrent)
+EXPORT_FUNC(PyInterpreterView_FromMain)
 EXPORT_FUNC(PyIter_Check)
 EXPORT_FUNC(PyIter_Next)
 EXPORT_FUNC(PyIter_NextItem)
@@ -661,12 +667,15 @@ EXPORT_FUNC(PyThread_tss_set)
 EXPORT_FUNC(PyThreadState_Clear)
 EXPORT_FUNC(PyThreadState_Delete)
 EXPORT_FUNC(PyThreadState_DeleteCurrent)
+EXPORT_FUNC(PyThreadState_Ensure)
+EXPORT_FUNC(PyThreadState_EnsureFromView)
 EXPORT_FUNC(PyThreadState_Get)
 EXPORT_FUNC(PyThreadState_GetDict)
 EXPORT_FUNC(PyThreadState_GetFrame)
 EXPORT_FUNC(PyThreadState_GetID)
 EXPORT_FUNC(PyThreadState_GetInterpreter)
 EXPORT_FUNC(PyThreadState_New)
+EXPORT_FUNC(PyThreadState_Release)
 EXPORT_FUNC(PyThreadState_SetAsyncExc)
 EXPORT_FUNC(PyThreadState_Swap)
 EXPORT_FUNC(PyTraceBack_Here)
diff --git a/Programs/_testembed.c b/Programs/_testembed.c
index 285f4f091b2f7a..278984ddb17c1a 100644
--- a/Programs/_testembed.c
+++ b/Programs/_testembed.c
@@ -10,6 +10,7 @@
 #include "pycore_runtime.h"       // _PyRuntime
 #include "pycore_lock.h"          // PyEvent
 #include "pycore_pythread.h"      // PyThread_start_joinable_thread()
+#include "pycore_pystate.h"       // _PyInterpreterState_GuardCountdown
 #include "pycore_import.h"        // _PyImport_FrozenBootstrap
 #include <inttypes.h>
 #include <stdio.h>
@@ -2670,6 +2671,214 @@ test_gilstate_after_finalization(void)
     return PyThread_detach_thread(handle);
 }
 
+
+const char *THREAD_CODE = \
+    "import time\n"
+    "time.sleep(0.2)\n"
+    "def fib(n):\n"
+    "  if n <= 1:\n"
+    "    return n\n"
+    "  else:\n"
+    "    return fib(n - 1) + fib(n - 2)\n"
+    "fib(10)";
+
+typedef struct {
+    void *argument;
+    int done;
+    PyEvent event;
+} ThreadData;
+
+static void
+do_tstate_ensure(void *arg)
+{
+    ThreadData *data = (ThreadData *)arg;
+    PyThreadStateToken *tokens[4];
+    PyInterpreterGuard *guard = data->argument;
+    tokens[0] = PyThreadState_Ensure(guard);
+    tokens[1] = PyThreadState_Ensure(guard);
+    tokens[2] = PyThreadState_Ensure(guard);
+    PyGILState_STATE gstate = PyGILState_Ensure();
+    tokens[3] = PyThreadState_Ensure(guard);
+    assert(tokens[0] != NULL);
+    assert(tokens[1] != NULL);
+    assert(tokens[2] != NULL);
+    assert(tokens[3] != NULL);
+    int res = PyRun_SimpleString(THREAD_CODE);
+    assert(res == 0);
+    PyThreadState_Release(tokens[3]);
+    PyGILState_Release(gstate);
+    PyThreadState_Release(tokens[2]);
+    PyThreadState_Release(tokens[1]);
+    PyThreadState_Release(tokens[0]);
+    PyInterpreterGuard_Close(guard);
+    _Py_atomic_store_int(&data->done, 1);
+}
+
+static int
+test_thread_state_ensure(void)
+{
+    _testembed_initialize();
+    assert(_PyInterpreterState_GuardCountdown(_PyInterpreterState_GET()) == 0);
+    PyThread_handle_t handle;
+    PyThread_ident_t ident;
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromCurrent();
+    assert(guard != NULL);
+    ThreadData data = { guard };
+    if (PyThread_start_joinable_thread(do_tstate_ensure, &data,
+                                       &ident, &handle) < 0) {
+        PyInterpreterGuard_Close(guard);
+        return -1;
+    }
+    // We hold an interpreter guard, so we don't
+    // have to worry about the interpreter shutting down before
+    // we finalize.
+    Py_Finalize();
+    assert(_Py_atomic_load_int(&data.done) == 1);
+    PyThread_join_thread(handle);
+    return 0;
+}
+
+static int
+test_main_interpreter_view(void)
+{
+    PyInterpreterView *view = PyInterpreterView_FromMain();
+    assert(view != NULL);
+    // These should fail -- the main interpreter is not available yet.
+    assert(PyInterpreterGuard_FromView(view) == NULL);
+    assert(PyThreadState_EnsureFromView(view) == NULL);
+
+    _testembed_initialize();
+    assert(_PyInterpreterState_GuardCountdown(_PyInterpreterState_GET()) == 0);
+    // Main interpreter is initialized and ready at this point.
+
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromView(view);
+    assert(guard != NULL);
+    PyInterpreterGuard_Close(guard);
+
+    Py_Finalize();
+
+    // We shouldn't be able to get guards for the interpreter now
+    guard = PyInterpreterGuard_FromView(view);
+    assert(guard == NULL);
+
+    PyInterpreterView_Close(view);
+
+    return 0;
+}
+
+static void
+do_tstate_ensure_from_view(void *arg)
+{
+    ThreadData *data = (ThreadData *)arg;
+    PyInterpreterView *view = data->argument;
+    assert(view != NULL);
+    PyThreadStateToken *token = PyThreadState_EnsureFromView(view);
+    assert(token != NULL);
+    _PyEvent_Notify(&data->event);
+    int res = PyRun_SimpleString(THREAD_CODE);
+    assert(res == 0);
+    _Py_atomic_store_int(&data->done, 1);
+    PyThreadState_Release(token);
+}
+
+static int
+test_thread_state_ensure_from_view(void)
+{
+    _testembed_initialize();
+    assert(_PyInterpreterState_GuardCountdown(_PyInterpreterState_GET()) == 0);
+    PyThread_handle_t handle;
+    PyThread_ident_t ident;
+    PyInterpreterView *view = PyInterpreterView_FromCurrent();
+    assert(view != NULL);
+
+    ThreadData data = { view };
+    if (PyThread_start_joinable_thread(do_tstate_ensure_from_view, &data,
+                                       &ident, &handle) < 0) {
+        PyInterpreterView_Close(view);
+        return -1;
+    }
+
+    PyEvent_Wait(&data.event);
+    Py_Finalize();
+    assert(_Py_atomic_load_int(&data.done) == 1);
+    PyThread_join_thread(handle);
+    return 0;
+}
+
+#define NUM_THREADS 4
+
+static void
+stress_func(void *arg)
+{
+    PyInterpreterGuard *guard = (PyInterpreterGuard *)arg;
+
+    for (int i = 0; i < 1000; ++i) {
+        assert(guard != NULL);
+        PyThreadStateToken *token = PyThreadState_Ensure(guard);
+        assert(token != NULL);
+
+        PyGILState_STATE gstate = PyGILState_Ensure();
+
+        PyInterpreterView *view = PyInterpreterView_FromCurrent();
+        assert(view != NULL);
+
+        PyThreadStateToken *token2 = PyThreadState_EnsureFromView(view);
+        assert(token2 != NULL);
+        PyThreadState_Release(token2);
+
+        PyGILState_Release(gstate);
+
+        PyThreadState_Release(token);
+
+        PyInterpreterGuard_Close(guard);
+
+        guard = PyInterpreterGuard_FromView(view);
+        PyInterpreterView_Close(view);
+
+        if (guard == NULL) {
+            // The interpreter is shutting down. Bail out now.
+            return;
+        }
+    }
+
+    PyInterpreterGuard_Close(guard);
+}
+
+static int
+test_concurrent_finalization_stress(void)
+{
+    for (int j = 0; j < 50; ++j) {
+        _testembed_initialize();
+        assert(_PyInterpreterState_GuardCountdown(_PyInterpreterState_GET()) == 0);
+        PyThread_handle_t handles[NUM_THREADS];
+        PyThread_ident_t idents[NUM_THREADS];
+        PyInterpreterGuard *guards[NUM_THREADS];
+
+        for (int i = 0; i < NUM_THREADS; ++i) {
+            guards[i] = PyInterpreterGuard_FromCurrent();
+            assert(guards[i] != NULL);
+            if (PyThread_start_joinable_thread(stress_func, guards[i], &idents[i], &handles[i]) < 0) {
+                for (int x = 0; x < i; ++x) {
+                    PyInterpreterGuard_Close(guards[x]);
+                    PyThread_detach_thread(handles[x]);
+                }
+                return -1;
+            }
+        }
+
+        Py_Finalize();
+
+        for (int i = 0; i < NUM_THREADS; ++i) {
+            PyThread_join_thread(handles[i]);
+        }
+    }
+
+    return 0;
+}
+
+#undef NUM_THREADS
+
+
 /* *********************************************************
  * List of test cases and the function that implements it.
  *
@@ -2764,6 +2973,10 @@ static struct TestCase TestCases[] = {
     {"test_create_module_from_initfunc", test_create_module_from_initfunc},
     {"test_inittab_submodule_multiphase", test_inittab_submodule_multiphase},
     {"test_inittab_submodule_singlephase", test_inittab_submodule_singlephase},
+    {"test_thread_state_ensure", test_thread_state_ensure},
+    {"test_main_interpreter_view", test_main_interpreter_view},
+    {"test_thread_state_ensure_from_view", test_thread_state_ensure_from_view},
+    {"test_concurrent_finalization_stress", test_concurrent_finalization_stress},
     {NULL, NULL}
 };
 
diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
index 8f31756f3df840..46579a45f4cc39 100644
--- a/Python/pylifecycle.c
+++ b/Python/pylifecycle.c
@@ -19,6 +19,7 @@
 #include "pycore_object.h"        // _PyDebug_PrintTotalRefs()
 #include "pycore_obmalloc.h"      // _PyMem_init_obmalloc()
 #include "pycore_optimizer.h"     // _Py_Executors_InvalidateAll
+#include "pycore_parking_lot.h"   // _PyParkingLot
 #include "pycore_pathconfig.h"    // _PyPathConfig_UpdateGlobal()
 #include "pycore_pyerrors.h"      // _PyErr_Occurred()
 #include "pycore_pylifecycle.h"   // _PyErr_Print()
@@ -2229,15 +2230,13 @@ interp_has_threads(PyInterpreterState *interp)
     /* This needs to check for non-daemon threads only, otherwise we get stuck
      * in an infinite loop. */
     assert(interp != NULL);
-    ASSERT_WORLD_STOPPED(interp);
+    ASSERT_HEAD_IS_LOCKED(interp->runtime);
     assert(interp->threads.head != NULL);
     if (interp->threads.head->next == NULL) {
         // No other threads active, easy way out.
         return 0;
     }
 
-    // We don't have to worry about locking this because the
-    // world is stopped.
     _Py_FOR_EACH_TSTATE_UNLOCKED(interp, tstate) {
         if (tstate->_whence == _PyThreadState_WHENCE_THREADING) {
             return 1;
@@ -2269,9 +2268,7 @@ static int
 runtime_has_subinterpreters(_PyRuntimeState *runtime)
 {
     assert(runtime != NULL);
-    HEAD_LOCK(runtime);
     PyInterpreterState *interp = runtime->interpreters.head;
-    HEAD_UNLOCK(runtime);
     return interp->next != NULL;
 }
 
@@ -2280,6 +2277,7 @@ make_pre_finalization_calls(PyThreadState *tstate, int subinterpreters)
 {
     assert(tstate != NULL);
     PyInterpreterState *interp = tstate->interp;
+    assert(_Py_atomic_load_uintptr(&interp->finalization_guards) != _PyInterpreterGuard_GUARDS_NOT_ALLOWED);
     /* Each of these functions can start one another, e.g. a pending call
      * could start a thread or vice versa. To ensure that we properly clean
      * call everything, we run these in a loop until none of them run anything. */
@@ -2306,41 +2304,78 @@ make_pre_finalization_calls(PyThreadState *tstate, int subinterpreters)
 
         if (subinterpreters) {
             /* Clean up any lingering subinterpreters.
-
-            Two preconditions need to be met here:
-
-                - This has to happen before _PyRuntimeState_SetFinalizing is
-                called, or else threads might get prematurely blocked.
-                - The world must not be stopped, as finalizers can run.
-            */
+             * Two preconditions need to be met here:
+             * 1. This has to happen before _PyRuntimeState_SetFinalizing is
+             *    called, or else threads might get prematurely blocked.
+             * 2. The world must not be stopped, as finalizers can run.
+             */
             finalize_subinterpreters();
         }
 
+        // This is used as a throttle to prevent constant spinning while
+        // on finalization guards.
+        for (;;) {
+            uintptr_t num_guards = _Py_atomic_load_uintptr(&interp->finalization_guards);
+            if (num_guards == 0) {
+                break;
+            }
+
+            int ret = _PyParkingLot_Park(&interp->finalization_guards,
+                                         &num_guards, sizeof(num_guards), -1,
+                                         NULL, /*detach=*/1);
+            if (ret == Py_PARK_OK) {
+                break;
+            }
+            else if (ret == Py_PARK_INTR) {
+                if (PyErr_CheckSignals() < 0) {
+                    int fatal = PyErr_ExceptionMatches(PyExc_KeyboardInterrupt);
+                    PyErr_FormatUnraisable("Exception ignored while waiting on finalization guards");
+                    if (fatal) {
+                        fputs("Interrupted while waiting on finalization guards\n", stderr);
+                        exit(1);
+                    }
+                }
+                assert(!PyErr_Occurred());
+            }
+            else {
+                assert(ret == Py_PARK_AGAIN);
+            }
+        }
 
         /* Stop the world to prevent other threads from creating threads or
          * atexit callbacks. On the default build, this is simply locked by
          * the GIL. For pending calls, we acquire the dedicated mutex, because
          * Py_AddPendingCall() can be called without an attached thread state.
          */
-
         PyMutex_Lock(&interp->ceval.pending.mutex);
-        // XXX Why does _PyThreadState_DeleteList() rely on all interpreters
-        // being stopped?
         _PyEval_StopTheWorldAll(interp->runtime);
+
+        HEAD_LOCK(interp->runtime);
         int has_subinterpreters = subinterpreters
                                     ? runtime_has_subinterpreters(interp->runtime)
                                     : 0;
+        uintptr_t guards_expected = 0;
         int should_continue = (interp_has_threads(interp)
                               || interp_has_atexit_callbacks(interp)
                               || interp_has_pending_calls(interp)
                               || has_subinterpreters);
+
         if (!should_continue) {
-            break;
+            // We only want to prevent new guards once we're sure that we
+            // won't be running another pre-finalization cycle.
+            if (_Py_atomic_compare_exchange_uintptr(&interp->finalization_guards,
+                                                    &guards_expected,
+                                                    _PyInterpreterGuard_GUARDS_NOT_ALLOWED) == 1) {
+                HEAD_UNLOCK(interp->runtime);
+                break;
+            }
         }
+        HEAD_UNLOCK(interp->runtime);
         _PyEval_StartTheWorldAll(interp->runtime);
         PyMutex_Unlock(&interp->ceval.pending.mutex);
     }
     assert(PyMutex_IsLocked(&interp->ceval.pending.mutex));
+    assert(_Py_atomic_load_uintptr(&interp->finalization_guards) == _PyInterpreterGuard_GUARDS_NOT_ALLOWED);
     ASSERT_WORLD_STOPPED(interp);
 }
 
diff --git a/Python/pystate.c b/Python/pystate.c
index 2df24597e65785..bf2616a49148a7 100644
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -2889,34 +2889,40 @@ PyGILState_Check(void)
     return (tstate == tcur);
 }
 
+static PyInterpreterGuard *
+get_main_interp_guard(void)
+{
+    PyInterpreterView *view = PyInterpreterView_FromMain();
+    if (view == NULL) {
+        return NULL;
+    }
+
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromView(view);
+    PyInterpreterView_Close(view);
+    return guard;
+}
+
 PyGILState_STATE
 PyGILState_Ensure(void)
 {
-    _PyRuntimeState *runtime = &_PyRuntime;
-
     /* Note that we do not auto-init Python here - apart from
        potential races with 2 threads auto-initializing, pep-311
        spells out other issues.  Embedders are expected to have
        called Py_Initialize(). */
 
-    /* Ensure that _PyEval_InitThreads() and _PyGILState_Init() have been
-       called by Py_Initialize()
-
-       TODO: This isn't thread-safe. There's no protection here against
-       concurrent finalization of the interpreter; it's simply a guard
-       for *after* the interpreter has finalized.
-     */
-    if (!_PyEval_ThreadsInitialized() || runtime->gilstate.autoInterpreterState == NULL) {
-        PyThread_hang_thread();
-    }
-
     PyThreadState *tcur = gilstate_get();
     int has_gil;
     if (tcur == NULL) {
         /* Create a new Python thread state for this thread */
-        // XXX Use PyInterpreterState_EnsureThreadState()?
-        tcur = new_threadstate(runtime->gilstate.autoInterpreterState,
-                               _PyThreadState_WHENCE_GILSTATE);
+        PyInterpreterGuard *guard = get_main_interp_guard();
+        if (guard == NULL) {
+            // The main interpreter has finished, so we don't have
+            // any interpreter to make a thread state for. Hang the
+            // thread to act as failure.
+            PyThread_hang_thread();
+        }
+        tcur = new_threadstate(guard->interp,
+                               _PyThreadState_WHENCE_C_API);
         if (tcur == NULL) {
             Py_FatalError("Couldn't create thread-state for new thread");
         }
@@ -2928,6 +2934,7 @@ PyGILState_Ensure(void)
         assert(tcur->gilstate_counter == 1);
         tcur->gilstate_counter = 0;
         has_gil = 0; /* new thread state is never current */
+        PyInterpreterGuard_Close(guard);
     }
     else {
         has_gil = holds_gil(tcur);
@@ -3309,3 +3316,277 @@ _Py_GetMainConfig(void)
     }
     return _PyInterpreterState_GetConfig(interp);
 }
+
+Py_ssize_t
+_PyInterpreterState_GuardCountdown(PyInterpreterState *interp)
+{
+    assert(interp != NULL);
+    Py_ssize_t count = _Py_atomic_load_uintptr(&interp->finalization_guards);
+    assert(count >= 0);
+    return count;
+}
+
+PyInterpreterState *
+_PyInterpreterGuard_GetInterpreter(PyInterpreterGuard *guard)
+{
+    assert(guard != NULL);
+    assert(guard->interp != NULL);
+    return guard->interp;
+}
+
+static int
+try_acquire_interp_guard(PyInterpreterState *interp, PyInterpreterGuard *guard)
+{
+    assert(interp != NULL);
+
+    uintptr_t expected;
+    do {
+        expected = _Py_atomic_load_uintptr(&interp->finalization_guards);
+        if (expected == _PyInterpreterGuard_GUARDS_NOT_ALLOWED) {
+            return -1;
+        }
+    } while (_Py_atomic_compare_exchange_uintptr(&interp->finalization_guards,
+                                                 &expected,
+                                                 expected + 1) == 0);
+    assert(_Py_atomic_load_uintptr(&interp->finalization_guards) > 0);
+    assert(_Py_atomic_load_uintptr(&interp->finalization_guards) != _PyInterpreterGuard_GUARDS_NOT_ALLOWED);
+
+    guard->interp = interp;
+    return 0;
+}
+
+PyInterpreterGuard *
+PyInterpreterGuard_FromCurrent(void)
+{
+    PyInterpreterState *interp = _PyInterpreterState_GET();
+    assert(interp != NULL);
+
+    PyInterpreterGuard *guard = PyMem_RawMalloc(sizeof(PyInterpreterGuard));
+    if (guard == NULL) {
+        PyErr_NoMemory();
+        return NULL;
+    }
+
+    if (try_acquire_interp_guard(interp, guard) < 0) {
+        PyMem_RawFree(guard);
+        PyErr_SetString(PyExc_PythonFinalizationError,
+                        "cannot acquire finalization guard anymore");
+        return NULL;
+    }
+
+    return guard;
+}
+
+void
+PyInterpreterGuard_Close(PyInterpreterGuard *guard)
+{
+    PyInterpreterState *interp = guard->interp;
+    assert(interp != NULL);
+
+    assert(_Py_atomic_load_uintptr(&interp->finalization_guards) != _PyInterpreterGuard_GUARDS_NOT_ALLOWED);
+    uintptr_t old_value = _Py_atomic_add_uintptr(&interp->finalization_guards, -1);
+    if (old_value == 1) {
+        _PyParkingLot_UnparkAll(&interp->finalization_guards);
+    }
+
+    assert(old_value > 0);
+    PyMem_RawFree(guard);
+}
+
+PyInterpreterView *
+PyInterpreterView_FromCurrent(void)
+{
+    PyInterpreterState *interp = _PyInterpreterState_GET();
+    assert(interp != NULL);
+
+    // PyInterpreterView_Close() can be called without an attached thread
+    // state, so we have to use the raw allocator.
+    PyInterpreterView *view = PyMem_RawMalloc(sizeof(PyInterpreterView));
+    if (view == NULL) {
+        PyErr_NoMemory();
+        return NULL;
+    }
+
+    view->id = interp->id;
+    return view;
+}
+
+void
+PyInterpreterView_Close(PyInterpreterView *view)
+{
+    assert(view != NULL);
+    PyMem_RawFree(view);
+}
+
+PyInterpreterGuard *
+PyInterpreterGuard_FromView(PyInterpreterView *view)
+{
+    assert(view != NULL);
+    int64_t interp_id = view->id;
+    assert(interp_id >= 0);
+
+    // This allocation has to happen before we acquire the runtime lock, because
+    // PyMem_RawMalloc() might call some weird callback (such as tracemalloc)
+    // that tries to re-entrantly acquire the lock.
+    PyInterpreterGuard *guard = PyMem_RawMalloc(sizeof(PyInterpreterGuard));
+    if (guard == NULL) {
+        return NULL;
+    }
+
+    // Interpreters cannot be deleted while we hold the runtime lock.
+    _PyRuntimeState *runtime = &_PyRuntime;
+    HEAD_LOCK(runtime);
+    PyInterpreterState *interp = interp_look_up_id(runtime, interp_id);
+    if (interp == NULL) {
+        HEAD_UNLOCK(runtime);
+        PyMem_RawFree(guard);
+        return NULL;
+    }
+
+    int result = try_acquire_interp_guard(interp, guard);
+    HEAD_UNLOCK(runtime);
+
+    if (result < 0) {
+        PyMem_RawFree(guard);
+        return NULL;
+    }
+
+    assert(guard->interp != NULL);
+    return guard;
+}
+
+PyInterpreterView *
+PyInterpreterView_FromMain(void)
+{
+    PyInterpreterView *view = PyMem_RawMalloc(sizeof(PyInterpreterView));
+    if (view == NULL) {
+        return NULL;
+    }
+
+    // The main interpreter always has an ID of zero.
+    view->id = 0;
+
+    return view;
+}
+
+static const PyThreadStateToken *_no_tstate_sentinel = (const PyThreadStateToken *)&_no_tstate_sentinel;
+#define NO_TSTATE_SENTINEL ((PyThreadStateToken *)_no_tstate_sentinel)
+
+PyThreadStateToken *
+PyThreadState_Ensure(PyInterpreterGuard *guard)
+{
+    assert(guard != NULL);
+    PyInterpreterState *interp = guard->interp;
+    assert(interp != NULL);
+    PyThreadState *attached_tstate = current_fast_get();
+    if (attached_tstate != NULL && attached_tstate->interp == interp) {
+        /* Yay! We already have an attached thread state that matches. */
+        ++attached_tstate->ensure.counter;
+        return attached_tstate;
+    }
+
+    PyThreadState *detached_gilstate = gilstate_get();
+    if (detached_gilstate != NULL && detached_gilstate->interp == interp) {
+        /* There's a detached thread state that works. */
+        assert(attached_tstate == NULL);
+        ++detached_gilstate->ensure.counter;
+        _PyThreadState_Attach(detached_gilstate);
+        return NO_TSTATE_SENTINEL;
+    }
+
+    PyThreadState *fresh_tstate = _PyThreadState_NewBound(interp,
+                                                          _PyThreadState_WHENCE_C_API);
+    if (fresh_tstate == NULL) {
+        return NULL;
+    }
+    fresh_tstate->ensure.counter = 1;
+    fresh_tstate->ensure.delete_on_release = 1;
+
+    if (attached_tstate != NULL) {
+        return (PyThreadStateToken *)PyThreadState_Swap(fresh_tstate);
+    }
+
+    _PyThreadState_Attach(fresh_tstate);
+    return NO_TSTATE_SENTINEL;
+}
+
+PyThreadStateToken *
+PyThreadState_EnsureFromView(PyInterpreterView *view)
+{
+    assert(view != NULL);
+    PyInterpreterGuard *guard = PyInterpreterGuard_FromView(view);
+    if (guard == NULL) {
+        return NULL;
+    }
+
+    PyThreadStateToken *result = (PyThreadStateToken *)PyThreadState_Ensure(guard);
+    if (result == NULL) {
+        PyInterpreterGuard_Close(guard);
+        return NULL;
+    }
+
+    PyThreadState *tstate = current_fast_get();
+    assert(tstate != NULL);
+
+    if (tstate->ensure.owned_guard != NULL) {
+        assert(tstate->ensure.owned_guard->interp == guard->interp);
+        PyInterpreterGuard_Close(guard);
+    }
+    else {
+        assert(tstate->ensure.owned_guard == NULL);
+        tstate->ensure.owned_guard = guard;
+    }
+
+    return result;
+}
+
+void
+PyThreadState_Release(PyThreadStateToken *token)
+{
+    PyThreadState *tstate = current_fast_get();
+    _Py_EnsureTstateNotNULL(tstate);
+    Py_ssize_t remaining = --tstate->ensure.counter;
+    if (remaining < 0) {
+        Py_FatalError("PyThreadState_Release() called more times than PyThreadState_Ensure()");
+    }
+
+    if (remaining != 0) {
+        // If the corresponding PyThreadState_Ensure() call used a detached
+        // thread state, we want to detach it again.
+        if (token == NO_TSTATE_SENTINEL) {
+            PyThreadState_Swap(NULL);
+        }
+        return;
+    }
+
+    PyThreadState *to_restore;
+    if (token == NO_TSTATE_SENTINEL) {
+        to_restore = NULL;
+    }
+    else {
+        to_restore = (PyThreadState *)token;
+    }
+
+    PyInterpreterGuard *owned_guard = tstate->ensure.owned_guard;
+    assert(tstate->ensure.delete_on_release == 1 || tstate->ensure.delete_on_release == 0);
+    if (tstate->ensure.delete_on_release) {
+        ++tstate->ensure.counter;
+        PyThreadState_Clear(tstate);
+        --tstate->ensure.counter;
+    }
+    else if (owned_guard != NULL) {
+        tstate->ensure.owned_guard = NULL;
+    }
+
+    PyThreadState *check_tstate = PyThreadState_Swap(to_restore);
+    (void)check_tstate;
+    assert(check_tstate == tstate);
+
+    if (tstate->ensure.delete_on_release) {
+        PyThreadState_Delete(tstate);
+    }
+
+    if (owned_guard != NULL) {
+        PyInterpreterGuard_Close(owned_guard);
+    }
+}
diff --git a/Tools/c-analyzer/cpython/ignored.tsv b/Tools/c-analyzer/cpython/ignored.tsv
index 11d58460b3975d..7af64ed017ba73 100644
--- a/Tools/c-analyzer/cpython/ignored.tsv
+++ b/Tools/c-analyzer/cpython/ignored.tsv
@@ -785,3 +785,4 @@ Objects/dictobject.c        -       PyFrozenDict_Type       -
 
 ## False positives
 Python/specialize.c    -       _Py_InitCleanup -
+Python/pystate.c       -       _no_tstate_sentinel     -
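
Note on the guard counter: the Python/pystate.c and Python/pylifecycle.c hunks above coordinate through a single atomic word, interp->finalization_guards. Acquiring a guard increments it with a compare-and-swap loop unless it already holds the "not allowed" sentinel, closing a guard decrements it, and finalization swaps 0 for the sentinel once no guards remain. The following is a minimal standalone sketch of that protocol in plain C11, not the CPython implementation. The toy_* names are hypothetical, the sentinel value UINTPTR_MAX is an assumption (the real _PyInterpreterGuard_GUARDS_NOT_ALLOWED is defined outside this part of the diff), and the parking-lot wait used by the finalizing thread is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed stand-in for _PyInterpreterGuard_GUARDS_NOT_ALLOWED. */
#define TOY_GUARDS_NOT_ALLOWED UINTPTR_MAX

/* Hypothetical stand-in for the one interpreter field the protocol uses. */
typedef struct {
    _Atomic uintptr_t guards;
} toy_interp;

/* Mirrors try_acquire_interp_guard(): returns 0 on success and -1 once
 * the counter has been sealed. */
static int
toy_guard_acquire(toy_interp *interp)
{
    uintptr_t expected = atomic_load(&interp->guards);
    do {
        if (expected == TOY_GUARDS_NOT_ALLOWED) {
            return -1;
        }
        /* On failure, the compare-exchange reloads `expected`. */
    } while (!atomic_compare_exchange_weak(&interp->guards, &expected,
                                           expected + 1));
    return 0;
}

/* Mirrors the decrement in PyInterpreterGuard_Close(), without the
 * wake-up of a parked finalizer. */
static void
toy_guard_release(toy_interp *interp)
{
    uintptr_t old_value = atomic_fetch_sub(&interp->guards, 1);
    assert(old_value > 0 && old_value != TOY_GUARDS_NOT_ALLOWED);
    (void)old_value;
}

/* Mirrors the final compare-exchange in make_pre_finalization_calls():
 * returns 1 only if no guards were held and the counter is now sealed. */
static int
toy_try_seal(toy_interp *interp)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(&interp->guards, &expected,
                                          TOY_GUARDS_NOT_ALLOWED);
}
```

In this sketch, sealing fails while any guard is outstanding and acquisition fails after a successful seal, which matches the NULL return that test_main_interpreter_view expects from PyInterpreterGuard_FromView() after Py_Finalize().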
