As you know, I'm skeptical that PEP 554 will produce benefits that are worth the effort, but let's assume for the moment that it will, and we're all 100% committed to moving all globals into the threadstate. Even given that, the motivation for this change seems a bit unclear to me.

I guess the possible goals are:

- Get rid of the "ambient" threadstate entirely
- Make accessing the threadstate faster

For the first goal, I don't think this is possible, or desirable. Obviously if we remove the GIL somehow then at a minimum we'll need to make the global threadstate a thread-local. But I think we'll always have to keep it around as a thread-local, at least, because there are situations where you simply cannot pass in the threadstate as an argument.

One example comes up when doing FFI: there are C libraries that take callbacks, and will run them later in some arbitrary thread. When wrapping these in Python, we need a way to bundle up a Python function into a C function that can be called from any thread. So, ctypes and cffi and cython all have ways to do this bundling, and they all start with some delicate dance to figure out whether or not the current thread holds the GIL, acquiring the GIL if not, then checking whether or not this thread has a Python threadstate assigned, creating it if not, etc. This is completely dependent on having the threadstate available in ambient context. If threadstates were always passed as arguments, then it would become impossible to wrap these C libraries. So we can't do that.

That said, it's fine: even if we do remove the GIL, we still won't have a *single OS thread* executing code from two different interpreters at the same time! So storing the threadstate in a thread-local is fine, and we can keep the ability to grab the threadstate at any moment, regardless of whether it was passed as an argument.

But that means the only reason for passing the threadstate around as an argument is if it's faster than looking it up. And AFAICT, no-one in this thread actually knows if that's true? You mentioned that there's an "atomic operation" there currently, but I think on x86 at least _Py_atomic_load_relaxed is literally a no-op.
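
To make the FFI callback case concrete, here is a toy C sketch. It is not CPython code: toy_tstate, toy_tstate_ensure, lib_callback and lib_run_in_other_thread are all made-up names for illustration, and the "library" is simulated with a plain pthread. It assumes a C11 compiler and POSIX threads. The point is only that the callback signature is fixed by the library, so the wrapper has nowhere to receive a threadstate argument and has to find (or create) it in thread-local storage. In real CPython the analogous public entry point is PyGILState_Ensure().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not CPython's real struct. */
typedef struct {
    int calls;  /* number of callbacks that ran on this OS thread */
} toy_tstate;

/* "Ambient" state: one slot per OS thread (C11 thread-local storage). */
static _Thread_local toy_tstate *current_tstate = NULL;
static _Thread_local toy_tstate tstate_storage;

/* Return this thread's state, creating it on first use. */
static toy_tstate *toy_tstate_ensure(void) {
    if (current_tstate == NULL) {
        tstate_storage.calls = 0;
        current_tstate = &tstate_storage;
    }
    return current_tstate;
}

/* Callback type dictated by the (hypothetical) C library:
 * there is no parameter through which a tstate could be passed. */
typedef int (*lib_callback)(int value);

/* The wrapper has to look the state up in ambient context. */
static int wrapped_callback(int value) {
    toy_tstate *ts = toy_tstate_ensure();
    assert(ts != NULL);
    ts->calls += 1;
    return value * 2;
}

/* Simulates the library running the callback in a thread it chose itself. */
struct job {
    lib_callback cb;
    int arg;
    int result;
};

static void *lib_thread_main(void *p) {
    struct job *j = p;
    j->result = j->cb(j->arg);
    return NULL;
}

static int lib_run_in_other_thread(lib_callback cb, int arg) {
    struct job j = { cb, arg, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, lib_thread_main, &j) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    return j.result;
}
```

Calling lib_run_in_other_thread(wrapped_callback, 21) runs the callback on a fresh thread that never had a state handed to it; the thread that made the call keeps its own, separate slot.
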
Larry did some experiments with the old pthreads thread-local storage API, but no-one seems to have done any measurements on the new, much-faster thread-local storage API, and no-one's done any measurements of the cost of passing around threadstates explicitly. For all we know, passing the threadstate around is actually slower than looking it up every time. And we don't even know yet whether the threadstate even will move into thread-local storage.

It seems a bit weird to start doing massive internal refactoring before measuring those things.

-n

On Tue, Nov 12, 2019 at 2:03 PM Victor Stinner <vstin...@python.org> wrote:
>
> Hi,
>
> Are you ok to modify internal C functions to pass explicitly tstate?
>
> --
>
> I started to modify internal C functions to pass explicitly "tstate"
> when calling C functions: the Python thread state (PyThreadState).
> Example of C code (after my changes):
>
>     if (_Py_EnterRecursiveCall(tstate, " while calling a Python object"))
>     {
>         return NULL;
>     }
>     PyObject *result = (*call)(callable, args, kwargs);
>     _Py_LeaveRecursiveCall(tstate);
>     return _Py_CheckFunctionResult(tstate, callable, result, NULL);
>
> In Python 3.8, the tstate is implicit:
>
>     if (Py_EnterRecursiveCall(" while calling a Python object")) {
>         return NULL;
>     }
>     PyObject *result = (*call)(callable, args, kwargs);
>     Py_LeaveRecursiveCall();
>     return _Py_CheckFunctionResult(callable, result, NULL);
>
> There are different reasons to pass explicitly tstate, but my main
> motivation is to rework Python code base to move away from implicit
> global states to states passed explicitly, to implement the PEP 554
> "Multiple Interpreters in the Stdlib". In short, the final goal is to
> run multiple isolated Python interpreters in the same process: run
> pure Python code on multiple CPUs in parallel with a single process
> (whereas multiprocessing runs multiple processes).
>
> Currently, subinterpreters are a hack: they still share a lot of
> things, the code base is not ready to implement isolated interpreters
> with one "GIL" (interpreter lock) per interpreter, and to run multiple
> interpreters in parallel. Many _PyRuntimeState fields (the global
> _PyRuntime variable) should be moved to PyInterpreterState (or maybe
> PyThreadState): per interpreter.
>
> Another simpler but more annoying example are Py_None and Py_True
> singletons which are globals. We cannot share these singletons between
> interpreters because updating their reference counter would be a
> performance bottleneck. If we put a "superglobal-GIL" to ensure that
> Py_None reference counter remains consistent, it would basically
> "serialize" all threads, rather than running them in parallel.
>
> The idea of passing tstate to internal C functions is to prepare code
> to get the per-interpreter None from tstate.
>
> tstate is basically the "root" to access all states which are per
> interpreter. For example, PyInterpreterState can be read from
> tstate->interp.
>
> Right now, tstate is only passed to a few functions, but you should
> expect to see it passed to way more functions later, once more
> structures will be moved to PyInterpreterState.
>
> --
>
> On my latest merged PR 17052 ("Add _PyObject_VectorcallTstate()"),
> Mark Shannon wrote: "I don't see how this could ever be faster, nor do
> I see how it is more correct."
> https://github.com/python/cpython/pull/17052#issuecomment-552538438
>
> Currently, tstate is retrieved using these internal APIs:
>
>     #define _PyRuntimeState_GetThreadState(runtime) \
>         ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
>     #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
>
> or using public APIs:
>
>     PyAPI_FUNC(PyThreadState *) PyThreadState_Get(void);
>     #define PyThreadState_GET() PyThreadState_Get()
>
> I dislike _PyThreadState_GET() for 2 reasons:
>
> * it relies on the _PyRuntime global variable: I would prefer to avoid
>   global variables
> * it uses an atomic operation which can become a performance issue
>   when more and more code will require tstate
>
> --
>
> An alternative would be to use PyGILState_GetThisThreadState() which
> uses a thread local state (TLS) variable to get the Python thread
> state ("tstate"), rather than the _PyRuntime atomic variable. Except that
> the PyGILState API doesn't support subinterpreters yet :-(
>
> https://bugs.python.org/issue15751 "Support subinterpreters in the GIL
> state API" has been open since 2012.
>
> Note: While the GIL is released, _PyThreadState_GET() is NULL, whereas
> PyGILState_GetThisThreadState() is non-NULL.
>
> --
>
> Links:
>
> * https://pythoncapi.readthedocs.io/runtime.html : my notes on moving
>   globals to per interpreter states
> * https://bugs.python.org/issue36710
> * https://bugs.python.org/issue38644
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Nathaniel J. Smith -- https://vorpus.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2AIGZD45Q7ZTTQYJZ6PP6XWK3JVDMZUV/