Author: Armin Rigo <ar...@tunes.org>
Branch: 
Changeset: r52:45987a0fee0b
Date: 2013-05-29 19:48 +0200
http://bitbucket.org/pypy/stmgc/changeset/45987a0fee0b/

Log:    Starting a rewrite of doc-stmgc.txt, very high-level so far, with
        changes that describe the new implementation plan.

diff --git a/c3/doc-stmgc.txt b/c3/doc-stmgc.txt
--- a/c3/doc-stmgc.txt
+++ b/c3/doc-stmgc.txt
@@ -2,23 +2,98 @@
 Details of the interactions between STM and the GC
 ==================================================
 
-Below, an "object" means really one version of an object, as allocated
-in memory.  What is from the higher level the "same" object may exist in
-several versions (we say "higher level object" to mean this).  Usually
-these versions are objects in a linked list, chained together with the
-h_revision field in the objects' headers.
+In this document we say "high-level object" to mean an object from the
+point of the user of the library, as opposed to an "object copy", which
+occupies the space of one allocated piece of memory.  One high-level
+object can exist in several copies simultaneously.  This concept of
+"copy" should not be confused with a "revision", which stands for a
+globally consistent copy of all objects.  One revision is the result of
+one transaction.  A program usually has one revision per thread in
+progress, plus any number of older committed revisions.  The committed
+revisions are globally ordered.
 
-Each object is either young or old.  All new objects are allocated
-young.  They become old at the next minor collection.  In the common
-case, objects are allocated in the nursery, and during the next minor
-collection, they are moved outside (if they survive).  The nursery
-contains only young objects, but a few objects outside might be young
-too (e.g. objects too large for the nursery).  (In fact we found out in
-PyPy that it's a good idea to create objects young even if they are
-outside the nursery; otherwise, a program that creates a lot of
-medium-sized objects will quickly exhaust the memory and trigger a lot
-of major collections.)  For the rest of this document we'll ignore young
-objects outside the nursery.
+The object copies exist in one of three main states: they can be
+"private", "protected" or "public".  A copy is private when it belongs
+to the transaction in progress.  When that transaction commits, it
+becomes protected, and remains so as long as it is accessed only by the
+same thread.  A copy becomes public only when another thread requests
+access to it (or, more precisely, "steals" access to it).  Once public,
+a copy is immutable in memory.
+
+From the point of view of the generational GC, each copy is either young
+or old.  All new copies are allocated young.  They become old at the
+next minor collection.  In the common case, copies are allocated in the
+nursery, and during the next minor collection, if they survive, they are
+moved outside.  The nursery contains only young copies, but a few copies
+outside might be young too (e.g. copies of objects too large for the
+nursery).  (In fact we found out in PyPy that it's a good idea to create
+objects young even if they are outside the nursery; otherwise, a program
+that creates a lot of medium-sized objects will quickly exhaust the
+memory and trigger a lot of major collections.)  For the rest of this
+document we'll ignore young copies outside the nursery.
+
+An object that was never seen by a different thread has got at most two
+copies: the first, protected, is the copy at the latest committed
+revision; and the other, private, is the current copy.  If the
+transaction aborts we can forget the private copy and reuse the previous
+protected copy.  If the transaction commits we forget the previous
+protected copy instead; then at this point all private objects become
+protected.  If the object is modified again in the near future, we reuse
+the memory that was occupied by the previous copy to store the next
+private copy.  As a result, each of these two spaces in memory can be
+young or old.  When the GC runs, if any one of these two copies is
+young, only the other copy is kept.  Similarly during major collections,
+only one copy is kept.  So objects no longer modified will eventually
+consume only one space.
+
+The way to share data between threads goes via prebuilt objects, which
+are always public: it is their existence that gives the starting point
+for threads to see each other's objects.  This involves three different
+steps.
+
+1. A thread tries to write to a public object.  This is done by
+allocating a fresh private copy of the public object.  Then writes go to
+the private copy.  If the transaction commits, the private copy becomes
+protected, and the public object is made to point to it (with
+multithread care).  From now on, any access to the public object from
+the same thread will work on the protected object or its future private
+copy.  Any access from a different thread will trigger "stealing", as
+explained next.
+
+2. A thread tries to access a public object but finds that another
+thread has committed changes to it (hereafter called the "foreign
+thread").  Then we "steal" the object.  It is a read-only operation
+performed by peeking on the foreign thread's data.  The operation
+involves making a duplicate of the original copy if it was in the
+foreign thread's nursery, so that no thread ever reads another thread's
+nursery outside of "stealing".  The stolen copy, or the original
+protected copy if it was not young, is then marked as public.  From now
+on nobody is allocated to change the content of this copy, and it
+becomes the current public copy.
+
+3. A subtle but important point about making a public copy is about all
+references stored in the object: if they point to other protected
+objects, then we cannot simply keep them as they are in the public copy.
+In that case, we have to replace these references with pointers to
+public "stubs".  A stub consists of only the header of the object.  It
+is set up in the same way as in point 1 above: it plays the role of an
+"older" public copy of a protected object (although it is not actually
+older of course).  If "we", the thread that just stole the object, then
+try to follow one of the references, we will access one of these stubs,
+and go back to point 1: we will need to steal it again.
+
+
+
+
+
+
+
+------------
+
+
+
+
+
 
 Independently, each object can be private or non-private (we used to say
 local or global).  The private objects are the ones belonging to the
_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
http://mail.python.org/mailman/listinfo/pypy-commit

Reply via email to