Most of the work toward interpreter isolation and a per-interpreter
GIL involves moving static global variables to _PyRuntimeState or
PyInterpreterState (or module state).  Through the effort of quite a
few people, we've made good progress.  However, many globals still
remain, with the majority being objects and most of those being static
strings (e.g. _Py_Identifier), static types (including exceptions), and
singletons.

On top of that, a number of those objects are exposed in the public
C-API and even in the limited API. :(  Dealing with this specifically
is probably the trickiest thing I've had to work through in this
project.

There is one solution that would help both of the above in a nice way:
"immortal" objects.

The idea of objects that never get deallocated isn't new and has been
explored here several times.  Not that long ago I tried it out by
setting the refcount really high.  That worked.  Around the same time
Eddie Elizondo at Facebook did something similar but modified
Py_INCREF() and Py_DECREF() to keep the refcount from changing.  Our
solutions were similar but with different goals in mind.  (Facebook
wants to avoid copy-on-write in their pre-fork model.)

A while back I concluded that neither approach would work for us.  The
approach I had taken would have significant cache performance
penalties in a per-interpreter GIL world.  The approach that modifies
Py_INCREF() has a significant performance penalty due to the extra
branch on such a frequent operation.

Recently I've come back to the idea of immortal objects because it's
much simpler than the alternate (working) solution I found.  So how do
we get around that performance penalty?  Let's say it makes CPython 5%
slower.  We have some options:

* live with the full penalty
* make other changes to reduce the penalty to a more acceptable
threshold than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope

Mark Shannon has suggested some things we can do.  Also, from a
recent conversation with Dino Viehland it sounds like Eddie was able
to get his implementation to performance-neutral with a few
techniques.  So here are some things we can do to reduce or eliminate
that penalty:

* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all
objects as immortal
* mark all global objects as immortal (statics or objects in
_PyRuntimeState; this isn't needed for objects in PyInterpreterState)

What do you think?  Does this sound realistic?  Are there additional
things we can do to counter that penalty?

-eric
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7O3FUA52QGTVDC6MDAV5WXKNFEDRK5D6/
Code of Conduct: http://python.org/psf/codeofconduct/
