STINNER Victor <vstin...@python.org> added the comment:

One GIL per interpreter requires storing the tstate per thread. I don't see 
any other option. We need to replace the global _PyRuntime atomic variable with 
a TLS variable. I'm trying to reduce the overhead, but it's hard to beat the 
performance of an atomic variable.
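
To make the two designs concrete, here is a minimal sketch, not the actual 
CPython code. demo_tstate and all the get/set helpers are hypothetical names; 
the only assumption is a C11 compiler with <stdatomic.h> and _Thread_local.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the real CPython struct. */
typedef struct demo_tstate {
    int interp_id;
} demo_tstate;

/* Single-GIL design: one process-wide "current tstate" pointer,
 * read and written with relaxed atomic operations. */
static _Atomic(demo_tstate *) global_current = NULL;

/* Per-interpreter-GIL design: every thread owns its own pointer (C11 TLS). */
static _Thread_local demo_tstate *tls_current = NULL;

static demo_tstate *get_global(void)
{
    return atomic_load_explicit(&global_current, memory_order_relaxed);
}

static void set_global(demo_tstate *ts)
{
    atomic_store_explicit(&global_current, ts, memory_order_relaxed);
}

static demo_tstate *get_tls(void)
{
    return tls_current;
}

static void set_tls(demo_tstate *ts)
{
    tls_current = ts;
}
```

With a single GIL, only one thread runs at a time, so one shared pointer is 
enough. With one GIL per interpreter, several threads run at once and each 
needs its own "current" pointer, hence the TLS variable.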

That's also why I modified many internal C functions to pass tstate explicitly 
to their subfunctions, to avoid any possible overhead of getting tstate.

https://vstinner.github.io/cpython-pass-tstate.html
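
A rough sketch of the "pass tstate explicitly" pattern, assuming a C11 
compiler. All demo_* names are hypothetical and only illustrate the idea: the 
entry point looks up the thread state once and the internal helpers receive it 
as a parameter instead of fetching it again.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct demo_tstate {
    int recursion_depth;
} demo_tstate;

/* Per-thread "current tstate" pointer (illustration only). */
static _Thread_local demo_tstate *demo_current = NULL;

/* The lookup whose cost we want to pay only once per call. */
static demo_tstate *demo_get_current(void)
{
    return demo_current;
}

/* Internal helpers take tstate as an argument: no lookup inside. */
static int demo_enter_call(demo_tstate *tstate)
{
    return ++tstate->recursion_depth;
}

static void demo_leave_call(demo_tstate *tstate)
{
    --tstate->recursion_depth;
}

/* Entry point: a single lookup, then tstate is passed down. */
int demo_call(void)
{
    demo_tstate *tstate = demo_get_current();
    int depth = demo_enter_call(tstate);
    demo_leave_call(tstate);
    return depth;
}
```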


Pablo:
> In MacOS is quite challenging to activate LTO, so normally optimized builds 
> are only done with PGO.

Oh right, I forgot macOS. I should check how TLS is compiled on macOS. IMO two 
MOVs instead of one MOV is not a major performance bottleneck.

The best would be to avoid the pthread_getspecific() function, which is less 
efficient than a TLS variable. The glibc implementation uses an array for a few 
variables (first 32 variables?) and then a slower hash table.
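
For comparison, a small sketch of the two ways to get a per-thread value, 
assuming POSIX threads and a C11 compiler. The demo_* names are hypothetical. 
The key-based path goes through a library call on every read, while the TLS 
variable is a plain variable access that the compiler can lower to a few 
instructions.

```c
#include <pthread.h>
#include <stddef.h>

/* Path 1: POSIX thread-specific data, reached through function calls. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    /* No destructor: the stored pointer is not owned by the key. */
    (void)pthread_key_create(&demo_key, NULL);
}

int demo_set_via_key(void *value)
{
    pthread_once(&demo_key_once, demo_make_key);
    return pthread_setspecific(demo_key, value);   /* 0 on success */
}

void *demo_get_via_key(void)
{
    pthread_once(&demo_key_once, demo_make_key);
    return pthread_getspecific(demo_key);
}

/* Path 2: a C11 thread-local variable, accessed directly. */
static _Thread_local void *demo_tls_value = NULL;

void demo_set_via_tls(void *value)
{
    demo_tls_value = value;
}

void *demo_get_via_tls(void)
{
    return demo_tls_value;
}
```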


Pablo:
> Also in Windows I am not sure is possible to use LTO. Same for many other 
> platforms.

I will check how it's implemented on Windows.

We cannot use a TLS variable everywhere, since it requires C11 features which 
are not available on all platforms. Also, the implementation depends on the 
architecture.
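
One possible way to handle this is to select the TLS keyword at compile time. 
This is only a sketch under assumptions: DEMO_THREAD_LOCAL, DEMO_HAVE_TLS and 
demo_bump() are hypothetical names, and the non-TLS branch is a placeholder 
where real code would fall back to something like pthread_getspecific().

```c
/* Pick a thread-local storage keyword depending on the compiler. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_THREADS__)
#  define DEMO_THREAD_LOCAL _Thread_local          /* standard C11 */
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_THREAD_LOCAL __thread               /* GCC/Clang extension */
#elif defined(_MSC_VER)
#  define DEMO_THREAD_LOCAL __declspec(thread)     /* MSVC extension */
#endif

#ifdef DEMO_THREAD_LOCAL
#  define DEMO_HAVE_TLS 1
static DEMO_THREAD_LOCAL int demo_counter = 0;
#else
#  define DEMO_HAVE_TLS 0
/* Placeholder only: shared by all threads, so not a real fallback. */
static int demo_counter = 0;
#endif

/* Increment the counter of the calling thread and return the new value. */
int demo_bump(void)
{
    return ++demo_counter;
}
```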

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40522>
_______________________________________