On 2019-11-12 23:03, Victor Stinner wrote:
Hi,
Are you OK with modifying internal C functions to pass tstate explicitly?
In short, yes, but:
- don't make things slower :)
- don't break the public API or the stable ABI
I'm a fan of explicitly passing state everywhere, rather than keeping it
in "global" variables.
Currently, surprisingly many internal functions do a PyThreadState_GET
for themselves, then call another function that does the same. That's
wasteful, but impossible to change in the public API.
Your changes (of which I only saw a very limited subset) seem to follow
a simple rule: public API functions call PyThreadState_GET, and then
call internal functions that pass it around.
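Here is a minimal, self-contained C sketch of that rule. All names in it (thread_state, get_current_tstate(), internal_call(), public_api()) are hypothetical stand-ins, not CPython's real API. The public entry point performs the thread state lookup once, and the internal helpers only ever receive it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PyThreadState (not CPython's). */
typedef struct {
    int lookups;   /* counts how often the current state was looked up */
} thread_state;

static thread_state main_tstate;

/* Stand-in for PyThreadState_GET(): the lookup we want to do only once. */
static thread_state *get_current_tstate(void)
{
    main_tstate.lookups++;
    return &main_tstate;
}

/* Internal helpers receive tstate explicitly and never look it up again. */
static int internal_step(thread_state *tstate, int x)
{
    assert(tstate != NULL);
    return x * 2;
}

static int internal_call(thread_state *tstate, int x)
{
    return internal_step(tstate, x) + 1;
}

/* Public entry point: looks up tstate exactly once, then passes it down. */
int public_api(int x)
{
    thread_state *tstate = get_current_tstate();
    return internal_call(tstate, x);
}
```

Each call to public_api() performs exactly one lookup, however deep the internal call chain gets. In the implicit style, every helper would have done its own.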
That sounds beautifully easy to explain! Later, we'll just need to
find a way to make the tstate API public (and opt-in).
The "per-interpreter None", however, is a different issue. I don't see
how that can be done without breaking the stable ABI. I still think
immortal immutable objects could be shared across interpreters.
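Below is a rough sketch of what "immortal" could mean. It assumes a reserved sentinel reference count and hypothetical obj_incref()/obj_decref() helpers, not anything in CPython 3.8. If the reference count of an immortal, immutable object is never written, a singleton like None could be shared by all interpreters without a lock, because there is nothing left to keep consistent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object header; CPython's real PyObject differs. */
typedef struct {
    intptr_t refcnt;
} object;

/* Assumption: a reserved sentinel refcount marks an object as immortal. */
#define IMMORTAL_REFCNT INTPTR_MAX

/* An immutable singleton that every interpreter could share. */
static object shared_none = { IMMORTAL_REFCNT };

static int is_immortal(const object *op)
{
    return op->refcnt == IMMORTAL_REFCNT;
}

/* Immortal objects skip the write entirely, so threads in different
   interpreters never contend for their reference count. */
static void obj_incref(object *op)
{
    if (!is_immortal(op)) {
        op->refcnt++;
    }
}

static void obj_decref(object *op)
{
    if (!is_immortal(op)) {
        op->refcnt--;
        /* a real implementation would deallocate when this reaches 0 */
    }
}
```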
--
I started to modify internal C functions to pass "tstate" explicitly
when calling C functions; "tstate" is the Python thread state (PyThreadState).
Example of C code (after my changes):
    if (_Py_EnterRecursiveCall(tstate, " while calling a Python object")) {
        return NULL;
    }
    PyObject *result = (*call)(callable, args, kwargs);
    _Py_LeaveRecursiveCall(tstate);
    return _Py_CheckFunctionResult(tstate, callable, result, NULL);
In Python 3.8, the tstate is implicit:
    if (Py_EnterRecursiveCall(" while calling a Python object")) {
        return NULL;
    }
    PyObject *result = (*call)(callable, args, kwargs);
    Py_LeaveRecursiveCall();
    return _Py_CheckFunctionResult(callable, result, NULL);
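In the first version, the tstate argument is what the recursion guard reads and updates. The following simplified sketch assumes a thread state with hypothetical recursion_depth and recursion_limit fields. CPython's real structure and error handling differ; for example, it raises RecursionError instead of printing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down thread state; field names are assumptions. */
typedef struct {
    int recursion_depth;
    int recursion_limit;
} thread_state;

/* Explicit-tstate recursion guard in the spirit of
   _Py_EnterRecursiveCall(tstate, where): returns -1 on overflow. */
static int enter_recursive_call(thread_state *tstate, const char *where)
{
    assert(tstate != NULL);
    if (++tstate->recursion_depth > tstate->recursion_limit) {
        --tstate->recursion_depth;   /* undo the failed increment */
        fprintf(stderr, "RecursionError: maximum recursion depth exceeded%s\n",
                where);
        return -1;
    }
    return 0;
}

static void leave_recursive_call(thread_state *tstate)
{
    assert(tstate != NULL);
    --tstate->recursion_depth;
}
```

Because the state arrives as a parameter, neither function needs to look up the current thread itself.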
There are different reasons to pass tstate explicitly, but my main
motivation is to rework the Python code base to move away from implicit
global state towards state passed explicitly, in order to implement
PEP 554 "Multiple Interpreters in the Stdlib". In short, the final goal
is to run multiple isolated Python interpreters in the same process, so
that pure Python code can run on multiple CPUs in parallel within a
single process (whereas multiprocessing runs multiple processes).
Currently, subinterpreters are a hack: they still share a lot of
things, and the code base is not ready to implement isolated interpreters
with one "GIL" (interpreter lock) per interpreter, or to run multiple
interpreters in parallel. Many _PyRuntimeState fields (the global
_PyRuntime variable) should be moved to PyInterpreterState (or maybe
PyThreadState) so that they become per-interpreter.
Another, simpler but more annoying, example is the Py_None and Py_True
singletons, which are globals. We cannot share these singletons between
interpreters because updating their reference counts would be a
performance bottleneck. If we added a "superglobal GIL" to keep the
Py_None reference count consistent, it would basically "serialize" all
threads rather than letting them run in parallel.
The idea of passing tstate to internal C functions is to prepare the code
to get the per-interpreter None from tstate.
tstate is basically the "root" to access all states which are per
interpreter. For example, PyInterpreterState can be read from
tstate->interp.
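This minimal sketch shows tstate acting as the "root". It assumes a hypothetical layout in which each interpreter state owns its own None and True objects. Code that holds a tstate reaches its interpreter's singleton through tstate->interp, so two interpreters never touch the same reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header. */
typedef struct { long refcnt; } object;

/* Hypothetical layout: each interpreter owns its own singletons. */
typedef struct interpreter_state {
    object none;       /* per-interpreter "None" */
    object true_obj;   /* per-interpreter "True" */
} interpreter_state;

typedef struct thread_state {
    interpreter_state *interp;   /* like tstate->interp */
} thread_state;

/* Everything per-interpreter is reached from tstate; no global is read. */
static object *get_none(thread_state *tstate)
{
    assert(tstate != NULL && tstate->interp != NULL);
    return &tstate->interp->none;
}

/* Returning a new reference only touches this interpreter's refcount. */
static object *return_none(thread_state *tstate)
{
    object *none = get_none(tstate);
    none->refcnt++;
    return none;
}
```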
Right now, tstate is only passed to a few functions, but you should
expect to see it passed to many more functions later, once more
structures are moved to PyInterpreterState.
--
On my latest merged PR 17052 ("Add _PyObject_VectorcallTstate()"),
Mark Shannon wrote: "I don't see how this could ever be faster, nor do
I see how it is more correct."
https://github.com/python/cpython/pull/17052#issuecomment-552538438
Currently, tstate is retrieved using these internal APIs:
    #define _PyRuntimeState_GetThreadState(runtime) \
        ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
    #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
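The internal lookup above boils down to a relaxed atomic load of one process-wide pointer. The following C11 sketch has that shape. Its names are hypothetical: runtime_state and runtime_get_tstate() stand in for _PyRuntime and _PyRuntimeState_GetThreadState(), and runtime_set_tstate() stands in for whatever publishes the GIL holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct thread_state { int id; } thread_state;

/* Hypothetical stand-in for _PyRuntime.gilstate.tstate_current: one
   process-wide atomic pointer to the thread state holding the GIL. */
typedef struct {
    _Atomic(thread_state *) tstate_current;
} runtime_state;

/* Static storage: zero-initialized, i.e. "no current thread state". */
static runtime_state runtime;

/* Analogue of _PyRuntimeState_GetThreadState(): a relaxed atomic load. */
static thread_state *runtime_get_tstate(runtime_state *rt)
{
    return atomic_load_explicit(&rt->tstate_current, memory_order_relaxed);
}

/* Taking or releasing the "GIL" publishes a new current thread state. */
static void runtime_set_tstate(runtime_state *rt, thread_state *ts)
{
    atomic_store_explicit(&rt->tstate_current, ts, memory_order_relaxed);
}
```

Every lookup goes through the single global `runtime`, which illustrates the first objection below.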
or using public APIs:
    PyAPI_FUNC(PyThreadState *) PyThreadState_Get(void);
    #define PyThreadState_GET() PyThreadState_Get()
I dislike _PyThreadState_GET() for 2 reasons:
* it relies on the _PyRuntime global variable: I would prefer to avoid
global variables
* it uses an atomic operation, which can become a performance issue
as more and more code requires tstate
--
An alternative would be to use PyGILState_GetThisThreadState(), which
uses a thread-local storage (TLS) variable to get the Python thread
state ("tstate") rather than the _PyRuntime atomic variable. Except that
the PyGILState API doesn't support subinterpreters yet :-(
https://bugs.python.org/issue15751 "Support subinterpreters in the GIL
state API" has been open since 2012.
Note: While the GIL is released, _PyThreadState_GET() is NULL, whereas
PyGILState_GetThisThreadState() is non-NULL.
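The difference described in that note can be sketched with C11 _Thread_local. Everything in this sketch is hypothetical and kept single-threaded for clarity; a real implementation would publish the GIL holder atomically. The thread-local slot keeps pointing at the thread's own state, while the "current tstate" global is cleared whenever the GIL is released.

```c
#include <assert.h>
#include <stddef.h>

typedef struct thread_state { int id; } thread_state;

/* Hypothetical analogue of the GILState TLS slot: each OS thread remembers
   its own thread state, whoever currently holds the GIL. */
static _Thread_local thread_state *this_thread_tstate;

/* Hypothetical global "current tstate"; NULL while the GIL is released.
   (Plain pointer for simplicity; real code would use an atomic.) */
static thread_state *gil_holder_tstate;

static void bind_thread(thread_state *ts) { this_thread_tstate = ts; }
static void take_gil(void) { gil_holder_tstate = this_thread_tstate; }
static void drop_gil(void) { gil_holder_tstate = NULL; }

/* Like PyGILState_GetThisThreadState(): a thread-local read, no atomics. */
static thread_state *get_this_thread_tstate(void)
{
    return this_thread_tstate;
}

/* Like _PyThreadState_GET(): NULL whenever the GIL is not held. */
static thread_state *get_current_tstate(void)
{
    return gil_holder_tstate;
}
```

This also suggests one reason subinterpreters are awkward for this API: the slot holds a single thread state per OS thread, while a thread may have a different thread state in each interpreter.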
--
Links:
* https://pythoncapi.readthedocs.io/runtime.html : my notes on moving
globals to per interpreter states
* https://bugs.python.org/issue36710
* https://bugs.python.org/issue38644
Victor
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RAVSH7HYHTROXSTUR3677WGTCTEO6FYF/