Author: Armin Rigo <ar...@tunes.org>
Branch: gc-del-3
Changeset: r84102:0cebe4cdc049
Date: 2016-05-01 16:03 +0200
http://bitbucket.org/pypy/pypy/changeset/0cebe4cdc049/

Log:    Update docs with the goal

diff --git a/pypy/doc/discussion/finalizer-order.rst b/pypy/doc/discussion/finalizer-order.rst
--- a/pypy/doc/discussion/finalizer-order.rst
+++ b/pypy/doc/discussion/finalizer-order.rst
@@ -1,19 +1,118 @@
-.. XXX armin, what do we do with this?
+Ordering finalizers in the MiniMark GC
+======================================
 
 
-Ordering finalizers in the SemiSpace GC
-=======================================
+RPython interface
+-----------------
 
-Goal
-----
+In RPython programs like PyPy, we need a fine-grained method of
+controlling the RPython- as well as the app-level ``__del__()``.  To
+make it possible, the RPython interface is now the following one (from
+May 2016):
 
-After a collection, the SemiSpace GC should call the finalizers on
+* RPython objects can have ``__del__()``.  These are called
+  immediately by the GC when the last reference to the object goes
+  away, like in CPython.  However (like "lightweight finalizers" used
+  to be), all ``__del__()`` methods must only contain simple enough
+  code, and this is checked.  We call these "destructors".  They can't
+  use operations that would resurrect the object, for example.
+
+* For any more advanced usage --- in particular for any app-level
+  object with a ``__del__`` --- we don't use the RPython-level
+  ``__del__()`` method.  Instead we use
+  ``rgc.FinalizerController.register_finalizer()``.  This allows us to
+  attach a finalizer method to the object, giving more control over
+  the ordering than just an RPython ``__del__()``.
+
+We try to consistently call ``__del__()`` a destructor, to distinguish
+it from a finalizer.  A finalizer runs earlier, and in topological
+order; care must be taken that the object might still be reachable at
+this point if we're clever enough.  A destructor on the other hand runs
+last; nothing can be done with the object any more.
+
+
+Destructors
+-----------
+
+A destructor is an RPython ``__del__()`` method that is called directly
+by the GC when no references to an object remain.  It is intended for
+objects that just need to free a block of raw memory or close a file.
+
+There are restrictions on the kind of code you can put in ``__del__()``,
+including all other functions called by it.  These restrictions are
+checked.  In particular you cannot access fields containing GC objects;
+and if you call an external C function, it must be a "safe" function
+(e.g. not releasing the GIL; use ``releasegil=False`` in
+``rffi.llexternal()``).
+
+If there are several objects with destructors that die during the same
+GC cycle, they are called in a completely random order --- but that
+should not matter because destructors cannot do much anyway.
+
+
+Register_finalizer
+------------------
+
+The interface for full finalizers is made with PyPy in mind, but should
+be generally useful.
+
+The idea is that you subclass the ``rgc.FinalizerController`` class:
+
+* You must give a class-level attribute ``base_class``, which is the
+  base class of all instances with a finalizer.  (If you need
+  finalizers on several unrelated classes, you need several unrelated
+  ``FinalizerController`` subclasses.)
+
+* You override the ``finalizer_trigger()`` method; see below.
+
+Then you create one global (or space-specific) instance of this
+subclass; call it ``fin``.  At runtime, you call
+``fin.register_finalizer(obj)`` for every instance ``obj`` that needs
+a finalizer.  Each ``obj`` must be an instance of ``fin.base_class``,
+but not every such instance needs to have a finalizer registered;
+typically we try to register a finalizer on as few objects as possible
+(e.g. only if it is an object which has an app-level ``__del__()``
+method).
+
+After a major collection, the GC finds all objects ``obj`` on which a
+finalizer was registered and which are unreachable, and marks them as
+reachable again, as well as all objects they depend on.  It then picks
+a topological ordering (breaking cycles randomly, if any) and enqueues
+the objects and their registered finalizer functions in that order, in
+a queue specific to the prebuilt ``fin`` instance.  Finally, when the
+major collection is done, it calls ``fin.finalizer_trigger()``.
+
+This method ``finalizer_trigger()`` can either do some work directly,
+or delay it to be done later (e.g. between two bytecodes).  If it does
+work directly, note that it cannot (directly or indirectly) cause the
+GIL to be released.
+
+To find the queued items, call ``fin.next_dead()`` repeatedly.  It
+returns the next queued item, or ``None`` when the queue is empty.
+
+It is not allowed to use several ``FinalizerController``
+instances for objects of the same class.  Calling
+``fin.register_finalizer(obj)`` several times for the same ``obj`` is
+fine (and will only register it once).
+
+
+Ordering of finalizers
+----------------------
+
+After a collection, the MiniMark GC should call the finalizers on
 *some* of the objects that have one and that have become unreachable.
 Basically, if there is a reference chain from an object a to an object b
 then it should not call the finalizer for b immediately, but just keep b
 alive and try again to call its finalizer after the next collection.
 
-This basic idea fails when there are cycles.  It's not a good idea to
+(Note that this creates rare but annoying issues as soon as the program
+creates chains of objects with finalizers faster than major collections
+occur, and major collections are infrequent.  In August 2013 we tried
+instead to call all finalizers of all objects found unreachable at a major
+collection.  That branch, ``gc-del``, was never merged.  It is still
+unclear what the real consequences would be on programs in the wild.)
+
+The basic idea fails in the presence of cycles.  It's not a good idea to
 keep the objects alive forever or to never call any of the finalizers.
 The model we came up with is that in this case, we could just call the
 finalizer of one of the objects in the cycle -- but only, of course, if
@@ -33,6 +132,7 @@
         detach the finalizer (so that it's not called more than once)
         call the finalizer
 
+
 Algorithm
 ---------
 
@@ -136,28 +236,8 @@
 that doesn't change the state of an object, we don't follow its children
 recursively.
 
-In practice, in the SemiSpace, Generation and Hybrid GCs, we can encode
-the 4 states with a single extra bit in the header:
-
-      =====  =============  ========  ====================
-      state  is_forwarded?  bit set?  bit set in the copy?
-      =====  =============  ========  ====================
-        0      no             no        n/a
-        1      no             yes       n/a
-        2      yes            yes       yes
-        3      yes          whatever    no
-      =====  =============  ========  ====================
-
-So the loop above that does the transition from state 1 to state 2 is
-really just a copy(x) followed by scan_copied().  We must also clear the
-bit in the copy at the end, to clean up before the next collection
-(which means recursively bumping the state from 2 to 3 in the final
-loop).
-
-In the MiniMark GC, the objects don't move (apart from when they are
-copied out of the nursery), but we use the flag GCFLAG_VISITED to mark
-objects that survive, so we can also have a single extra bit for
-finalizers:
+In practice, in the MiniMark GCs, we can encode
+the 4 states with a combination of two bits in the header:
 
       =====  ==============  ============================
       state  GCFLAG_VISITED  GCFLAG_FINALIZATION_ORDERING
@@ -167,3 +247,8 @@
         2        yes             yes
         3        yes             no
       =====  ==============  ============================
+
+So the loop above that does the transition from state 1 to state 2 is
+really just a recursive visit.  We must also clear the
+FINALIZATION_ORDERING bit at the end (state 2 to state 3) to clean up
+before the next collection.
diff --git a/rpython/doc/rpython.rst b/rpython/doc/rpython.rst
--- a/rpython/doc/rpython.rst
+++ b/rpython/doc/rpython.rst
@@ -191,6 +191,12 @@
   ``__setitem__`` for slicing isn't supported. Additionally, using negative
   indices for slicing is still not support, even when using ``__getslice__``.
 
+  Note that from May 2016 the destructor ``__del__`` must only contain
+  `simple operations`__; for any kind of more complex destructor, see
+  ``rpython.rlib.rgc.register_finalizer()``.
+
+.. __: garbage_collection.html
+
 This layout makes the number of types to take care about quite limited.
 
 
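The ``FinalizerController`` protocol described in the finalizer-order.rst
hunk above (subclass, set ``base_class``, override ``finalizer_trigger()``,
drain the queue with ``next_dead()``) can be illustrated with a toy model in
plain Python.  This is only a sketch under loud assumptions: it does not use
``rpython.rlib.rgc`` at all, the class ``ToyFinalizerController`` and the
method ``simulate_major_collection()`` are hypothetical stand-ins for what
the real GC does, and the "dead" objects are passed in by hand in an order
assumed to be already topologically sorted.

```python
class ToyFinalizerController(object):
    """Toy stand-in for the interface described in the docs (not rgc)."""
    base_class = object

    def __init__(self):
        self._registered = []   # objects that have a finalizer registered
        self._queue = []        # dead objects, in finalization order

    def register_finalizer(self, obj):
        assert isinstance(obj, self.base_class)
        # Registering the same object several times registers it only once.
        if not any(o is obj for o in self._registered):
            self._registered.append(obj)

    def simulate_major_collection(self, dead_in_order):
        # Hypothetical helper: plays the role of the GC.  It enqueues the
        # registered objects among `dead_in_order`, then fires the trigger.
        for obj in dead_in_order:
            for i, o in enumerate(self._registered):
                if o is obj:
                    del self._registered[i]
                    self._queue.append(obj)
                    break
        self.finalizer_trigger()

    def next_dead(self):
        # Returns the next queued object, or None when the queue is empty.
        if self._queue:
            return self._queue.pop(0)
        return None

    def finalizer_trigger(self):
        raise NotImplementedError


class W_Root(object):
    def __init__(self, name):
        self.name = name


class ToyController(ToyFinalizerController):
    base_class = W_Root

    def __init__(self):
        ToyFinalizerController.__init__(self)
        self.finalized = []

    def finalizer_trigger(self):
        # Do the work directly: drain the queue until next_dead() is None.
        while True:
            obj = self.next_dead()
            if obj is None:
                break
            self.finalized.append(obj.name)


fin = ToyController()
a, b, c = W_Root("a"), W_Root("b"), W_Root("c")
fin.register_finalizer(a)
fin.register_finalizer(a)      # second call has no extra effect
fin.register_finalizer(b)      # c never gets a finalizer
fin.simulate_major_collection([a, b, c])
```

In this toy run only ``a`` and ``b`` reach ``finalizer_trigger()``, each once;
``c`` is ignored because no finalizer was registered on it.  A trigger that
defers the work (for example until between two bytecodes) would set a flag
here instead of draining the queue immediately.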