Author: Armin Rigo <ar...@tunes.org>
Branch: 
Changeset: r61:82b9992a758c
Date: 2013-06-03 15:57 +0200
http://bitbucket.org/pypy/stmgc/changeset/82b9992a758c/

Log:    In-progress

diff --git a/c3/doc-objects.txt b/c3/doc-objects.txt
new file mode 100644
--- /dev/null
+++ b/c3/doc-objects.txt
@@ -0,0 +1,76 @@
+
+
+
+Object copies state transitions (changes of state of the *same* copy)
+---------------------------------------------------------------------
+
+
+
+  Private freshly created
+             \                Private, with backup
+              \                  ^       \    ^
+               \                /         \   |
+         commit \       modify /    commit |  |
+                 \            /            |  | modify
+                  V          /             |  |
+           Protected, no backup            V  |
+                ^    ^              Protected, with backup
+               /     |       gc       |
+       commit /      `----------------'
+             /
+            /
+    Private copy of
+    a public obj
+
+
+
+     Protected backup copy
+                  \
+                   \
+           stealing \         commit of newer version
+                     \         ,-----------------.
+                      V        |                 V
+             Up-to-date public copy       Outdated public copy
+
+
+
+Kind of object copy                                    h_revision
+-------------------------------------------------------------------
+
+Private objects:
+- freshly created                                         PRN
+- converted from a protected obj                          PRN
+- private copy of a public obj                            PRN
+
+Protected objects:
+- converted from fresh private obj                     (old PRN)
+- converted from a private obj with backup           ptr to backup
+- backup copy of a private obj                    original h_revision
+- backup copy still attached to a protected               GT
+- original obj after GC killed the backup                 GT
+
+Public objects:
+- prebuilt object, never modified                         -1
+- other public object, never modified                     GT
+- outdated, has a protected copy               ptr to prot/priv | 2
+- outdated, target stolen                   ptr to next public copy
+
+Public stubs:
+- from stealing: like outdated public objects
+- from major GC: like outdated public objects with target stolen
+
+
+PRN = Private revision number (negative odd number)
+GT = Global time (positive odd number)
+
+
+
+Off-line data stored in the thread-local structure
+--------------------------------------------------
+
+- the PRN (private revision number): odd, negative, changes for every
+  transaction that commits
+
+- list of pairs (private converted from protected, backup copy)
+
+- dict {public obj: private copy}
diff --git a/c3/doc-stmgc.txt b/c3/doc-stmgc.txt
--- a/c3/doc-stmgc.txt
+++ b/c3/doc-stmgc.txt
@@ -12,24 +12,26 @@
 from multiple threads is possible, and handled correctly (that's the
 whole point), but a relatively rare case.
 
-So each object is classified as "public" or "protected".  New objects
-are protected until they are read by a different thread.  The point is
-to use very different mechanisms for public and for protected objects.
-Public objects are visible by all threads, but read-only in memory; to
-change them, a copy must be made, and the changes written to the copy
-(the "redolog" approach to STM).  Protected objects, on the other hand,
-are modified in-place, with (if necessary) a copy of them being made
-only for the purpose of a possible abort of the transaction (the
-"undolog" approach).
+So each object is classified as "public", "protected", or "private".
+Objects are created private, and later become protected, and stay so as
+long as they are not read by a different thread.  The point is to use
+very different mechanisms for public and for non-public objects.  Public
+objects are visible by all threads, but read-only in memory; to change
+them, a copy must be made, and the changes written to the copy (the
+"redolog" approach to STM).  Non-public objects, on the other hand, are
+modified in-place, with (if necessary) a copy of them being made only
+for the purpose of a possible abort of the transaction (the "undolog"
+approach).
 
 This is combined with a generational GC similar to PyPy's --- but here,
 each thread gets its own nursery and does its own "minor collections",
 independently of the others.
 
-Objects start as protected, and when another thread tries to follow a
-pointer to them, then it is that other thread's job to carefully "steal"
-the object and turn it public (possibly making a copy of it if needed,
-e.g. if it was still a young object living in the original nursery).
+The idea of protected objects is that when another thread tries to
+follow a pointer to them, then it is that other thread's job to
+carefully "steal" the object and turn it public (possibly making a copy
+of it if needed, e.g. if it was still a young object living in the
+original nursery).
 
 The same object can exist temporarily in multiple versions: any number
 of public copies; at most one active protected copy; and optionally one
@@ -62,14 +64,14 @@
 committed revisions are globally ordered.  This is the order that the
 multithreaded program appears to have followed serially.
 
-The object copies exist in one of two main states: they can be
-"protected" or "public".  A copy is also called "private" when it was
-modified by the transaction in progress; this copy is always protected
-and invisible to other threads.  When that transaction commits, all
-private copies become protected, and remain so as long as it is accessed
-only by the same thread.  A copy becomes public only when another thread
-requests access to it (or, more precisely, "steals" access to it).  Once
-public, a copy is immutable in memory.
+The object copies exist in one of three main states: they can be
+"public", "protected" or "private".  A private copy is a copy that was
+created or modified by the transaction in progress; this copy is always
+invisible to other threads.  When that transaction commits, all private
+copies become protected.  They remain protected as long as they are only
+accessed by the same thread.  A copy becomes public only when another
+thread requests access to it (or, more precisely, "steals" access to
+it).  Once public, a copy is immutable in memory.
 
 From the point of view of the generational GC, each copy is either young
 or old.  All new copies are allocated young.  They become old at the
@@ -83,17 +85,15 @@
 memory and trigger a lot of major collections.)  For the rest of this
 document we'll ignore young copies outside the nursery.
 
-An object that was never seen by a different thread has got either one
-of two copies, both protected: the "main" one, used by the thread, which
-may be private or not depending on whether the object was modified in
-the current transaction; and, if the object is private but older than
-the current transaction, then it has got a backup copy whose purpose is
-to record the state that the object had at the start of the current
-transaction.
+An object that was never seen by a different thread has got at most one
+private copy (if it was created or modified by the current transaction)
+and one protected copy (if it is older than the current transaction).
+If it has two copies, then the private one is the regular copy, and the
+other copy works as a backup copy that remembers the state that the
+object had at the start of the current transaction.
 
-If an object is committed and then no longer modified for long enough,
-the next (minor or major) GC will free the space that was used by the
-backup copy.
+If an object has got a backup copy but isn't modified any more, the next
+(minor or major) GC collection will free the backup copy.
 
 The way to share data between threads goes via prebuilt objects, which
 are always public: it is their existence that gives the starting point
@@ -103,7 +103,7 @@
 1. A thread tries to write to a public object.  This is done by
 allocating a fresh private copy of the public object.  Then writes go to
 the private copy.  If the transaction commits, the private copy becomes
-simply protected, and the public object is made to point to it (with
+protected, and the public object is made to point to it (with
 multithread care).  From now on, any access to the public object from
 the same thread will work on the protected copy.  Any access from a
 different thread will trigger "stealing", as explained next.
@@ -141,9 +141,8 @@
 -------------------
 
 This design is made to optimize the hopefully common case: objects we
-handle are mostly private or protected, or if they are public, they are
-mostly read-only.  We can design in consequence the following three
-points:
+handle are mostly protected, or if they are public, they are mostly
+read-only.  We can design in consequence the following three points:
 
 1. the extra data stored in the objects (GC flags, and one extra word
 called `h_revision`).
@@ -162,12 +161,12 @@
 the header of the copy.  But the recording in the read set is a bit more
 annoying.  We need to maintain a thread-local *set* of all accessed
 objects, but we don't care about the order or recording the occasional
-duplicate.  Moreover we don't need to record the private objects; but we
-do need all other protected objects, as well as public objects.  The
-best approach is probably to have a quick check "is it definitely
-recorded already?" inline, and do the call if the check fails.  It needs
-careful design to be done in only a few CPU instructions, but it should
-be possible.
+duplicate.  Moreover we don't need to record the private copies of
+objects; but we do need all the protected and public objects.  The best
+approach is probably to have a quick check "is it definitely recorded
+already?" inline, and do the call if the check fails.  It needs careful
+design to be done in only a few CPU instructions, but it should be
+possible.
 
 (Code like this compiles to 4 instructions in the fast path:
 
@@ -179,16 +178,16 @@
 )
 
 The case of the write barrier is similar to the first half of the read
-barrier, but differs in the check we need to do.  We need to do a call
-if the object is not already private.  For performance reasons, "being
-private" is not directly a flag in the object, because when a
-transaction commits, we don't want to have to walk all private objects
-to change this flag.  Instead, private objects have a precise negative
-odd number in their `h_revision` field, called the "local revision
-identifier".  When a transaction commits, we change the value of the
-local revision identifier, and all previously-private objects become
-automatically protected.  So the write barrier fast-path checks if the
-`h_revision` is equal from the local revision identifier.
+barrier.  We need to do a call to the slow path if the object is not
+already private.  For performance reasons, "being private" is not
+directly a flag in the object, because when a transaction commits, we
+don't want to have to walk all private copies to change this flag.
+Instead, private copies have a precise negative odd number in their
+`h_revision` field, called the "private revision identifier".  When a
+transaction commits, we change the value of the private revision
+identifier, and all previously-private objects become automatically
+protected.  So the write barrier fast-path checks if the `h_revision` is
+equal to the private revision identifier.
 
 
 The extendable timestamp model
@@ -268,29 +267,10 @@
 transactions efficient.)
 
 
-Details of protected objects
-----------------------------
+Object copies in detail
+-----------------------
 
-As described above, each thread has a "local revision identifier".  It
-is a negative odd number that changes whenever it commits a transaction.
-The important point for the write barrier is that on any object copy,
-`h_revision` must be equal to the local revision identifier if and only
-if the copy is private.  A newly allocated object is always private.
-Once the transaction commits it becomes merely protected.  Its
-`h_revision` field doesn't change (but the thread's local revision
-identifier does).  If later the write barrier triggers on it, we make a
-backup copy of the object and copy the content of the primary copy to
-it.  We also set `h_revision` in the primary copy to point to the
-backup copy: as long as `h_revision` is different from the local
-revision identifier, its exact value is otherwise not used.  In this way
-we can keep using the same backup copy in each future transaction
-that needs to write to the object.
-
-The backup copy is used in two cases.  One is if the transaction aborts;
-then we copy the content back over the regular protected copy.  The
-other case is if the object is stolen.  In that case, if the object has
-an active backup copy, we must steal this one, because the regular
-protected copy is actually private at that point in time.
+See doc-objects.txt.
 
 
 Minor and major collections
@@ -322,5 +302,5 @@
 references in existing objects to point to either the real copy or the
 stub.  This is probably a bit involved: we might have to get the current
 revision numbers of all threads, and theoretically compact each interval
-of number down to only one number, but still keep one active revision
+of numbers down to only one number, but still keep one active revision
 number per thread.
_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
http://mail.python.org/mailman/listinfo/pypy-commit

Reply via email to