Author: Armin Rigo <[email protected]>
Branch:
Changeset: r1125:3db91dec36e2
Date: 2014-03-31 09:33 +0200
http://bitbucket.org/pypy/stmgc/changeset/3db91dec36e2/
Log: Update the text to describe the N+1 segments
diff --git a/c7/README.txt b/c7/README.txt
--- a/c7/README.txt
+++ b/c7/README.txt
@@ -57,7 +57,7 @@
We have a small, fixed number of big pieces of memory called "segments".
Each segment has enough (virtual) address space for all the objects that
the program needs. This is actually allocated from a single big mmap()
-so that pages can be exchanged between segments with remap_file_pages().
+so that pages can be shared between segments with remap_file_pages().
We call N the number of segments. Actual threads are not limited in
number; they grab one segment in order to run GC-manipulating code, and
release it afterwards. This is similar to what occurs with the GIL,
@@ -81,20 +81,26 @@
--- much like the OS does after a fork() for pages modified by one or
the other process.
-In more details: the first page of addresses in each thread-local region
-(4096 bytes) is made non-accessible, to detect errors of accessing the
-NULL pointer. The second page is reserved for thread-local data. The
-rest is divided into 1/16 for thread-local read markers, followed by
-15/16 for the real objects. We initially use remap_file_pages() on this
-15/16 range. The read markers are described below.
+In more detail: we actually get N + 1 consecutive segments, and segment
+number 0 is reserved to contain the globally committed state of the
+objects. The segments actually used by threads are numbered from 1 to
+N. The first page of addresses in each segment is made non-accessible,
+to detect errors of accessing the NULL pointer. The second page is
+reserved for thread-local data. The rest is divided into 1/16 for
+thread-local read markers, followed by 15/16 for the real objects. The
+read markers are described below. We use remap_file_pages() on this
+15/16 range: every page in this range can be either remapped to the same
+page from segment 0 ("shared", the initial state), or remapped back to
+itself ("private").
-Each transaction records the objects that it changed. These are
-necessarily within unshared pages. When we want to commit a
-transaction, we ask for a safe-point (suspending the other threads in a
-known state), and then we copy again the modified objects into the other
-version(s) of that data. The point is that, from another thread's point
-of view, the memory didn't appear to change unexpectedly, but only when
-waiting in a safe-point.
+Each transaction records the objects that it changed, and makes sure
+that the corresponding pages are "private" in this segment. When we
+want to commit a transaction, we ask for a safe-point (suspending the
+other threads in a known state), and then we copy the modified objects
+into the shared pages, as well as into the other segments if they are
+also backed by private pages. The point is that, from another thread's
+point of view, the memory didn't appear to change unexpectedly, but only
+when waiting in a safe-point.
Moreover, we detect read-write conflicts when trying to commit. To do
this, each transaction needs to track in their own (private) read
@@ -105,11 +111,13 @@
requiring an abort (which it will do when trying to leave the
safe-point).
-On the other hand, write-write conflicts are detected eagerly, which is
-necessary to avoid that all segments contain a modified version of the
-object and no segment is left with the original version. It is done
-with a compare-and-swap into an array of write locks (only the first
-time a given old object is modified by a given transaction).
+On the other hand, write-write conflicts are detected eagerly. This is
+done with a compare-and-swap into an array of write locks (only the
+first time a given old object is modified by a given transaction).
+Eager detection was required in a previous version. It no longer is,
+but is kept until measurements show whether it is a good or bad idea;
+the alternative is to simply let conflicting writes proceed and detect
+the situation at commit time only.
Object creation and GC
@@ -127,7 +135,7 @@
objects that are also outside the nursery.
- pages need to be unshared when they contain old objects that are then
- modified.
+ modified (and only in this case).
- we need a write barrier to detect the changes done to any non-nursery
object (the first time only). This is just a flag check. Then the
@@ -139,13 +147,15 @@
to be synchronized, but ideally the threads should then proceed
to do a parallel GC (i.e. mark in all threads in parallel, and
then sweep in all threads in parallel, with one arbitrary thread
- taking on the additional coordination role needed).
+ taking on the additional coordination role needed). But we'll think
+ about it when it becomes a problem.
- the major collections should be triggered by the amount of really-used
- memory, which means: counting the unshared pages as N pages. Major
- collection should then re-share the pages as much as possible. This is
- the essential part that guarantees that old, no-longer-modified
- bunches of objects are eventually present in only one copy in memory,
- in shared pages --- while at the same time bounding the number of
- calls to remap_file_pages() for each page at N-1 per major collection
- cycle.
+ memory, which means: counting each actual copy of a private page
+ independently, but shared pages as one. Major collection will then
+ re-share the pages as much as possible. This is the essential part
+ that guarantees that old, no-longer-modified bunches of objects are
+ eventually present in only one copy in memory, in shared pages ---
+ while at the same time bounding the number of calls to
+ remap_file_pages() at two for each private page (one to privatize, one
+ to re-share) for a complete major collection cycle.
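
The bookkeeping described in the updated README text can be sketched in C.
This is only an illustrative model under stated assumptions, not the stmgc
implementation: every constant, name and helper below is hypothetical, the
1/16 split is assumed to be rounded to whole pages, the mapping from an
object to a write-lock index is left to the caller, and the actual
remap_file_pages() calls are omitted so that only the shared/private state
is tracked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; the README only fixes the ratios, not the numbers. */
#define PAGE_SIZE          4096UL
#define PAGES_PER_SEG      16384UL
#define NB_SEGMENTS        4            /* "N"; segment 0 comes on top of it */
#define NB_WRITE_LOCKS     1024

/* Assumed layout of one segment, in pages:
 *   page 0                        inaccessible (catches NULL accesses)
 *   page 1                        thread-local data
 *   pages 2 .. FIRST_OBJECT_PAGE  read markers (about 1/16)
 *   FIRST_OBJECT_PAGE .. end      real objects (about 15/16)            */
#define FIRST_OBJECT_PAGE  (PAGES_PER_SEG / 16)

typedef enum { PAGE_SHARED = 0, PAGE_PRIVATE = 1 } page_state_t;

/* Zero-initialized, so every page starts "shared" (the initial state).
 * Row 0 is unused: segment 0 always holds the committed pages itself.  */
static page_state_t page_state[NB_SEGMENTS + 1][PAGES_PER_SEG];

/* 0 means free, otherwise the number of the segment holding the lock.  */
static _Atomic unsigned char write_locks[NB_WRITE_LOCKS];

/* Byte offset of a segment inside the single big mapping. */
static size_t segment_base(unsigned seg)
{
    assert(seg <= NB_SEGMENTS);
    return (size_t)seg * PAGES_PER_SEG * PAGE_SIZE;
}

/* Which segment's memory a thread in `seg` really sees for `page`. */
static unsigned backing_segment(unsigned seg, size_t page)
{
    assert(seg >= 1 && seg <= NB_SEGMENTS);
    assert(page >= FIRST_OBJECT_PAGE && page < PAGES_PER_SEG);
    return page_state[seg][page] == PAGE_PRIVATE ? seg : 0;
}

/* Record that `page` is remapped back to itself in `seg`. */
static void privatize_page(unsigned seg, size_t page)
{
    assert(backing_segment(seg, page) == 0);
    page_state[seg][page] = PAGE_PRIVATE;
}

/* Eager write-write detection: the first writer wins the lock with a
 * compare-and-swap. Meant to be called once per object and transaction. */
static bool try_write_lock(size_t lock_idx, unsigned seg)
{
    unsigned char expected = 0;
    assert(lock_idx < NB_WRITE_LOCKS && seg >= 1 && seg <= NB_SEGMENTS);
    return atomic_compare_exchange_strong(&write_locks[lock_idx],
                                          &expected, (unsigned char)seg);
}

/* At commit, a modified object on `page` is copied into segment 0 and
 * into every other segment that has its own private copy of the page.
 * Fills targets[0..N] and returns how many copies are needed.          */
static unsigned commit_targets(unsigned committer, size_t page,
                               bool targets[NB_SEGMENTS + 1])
{
    unsigned seg, count = 1;
    assert(backing_segment(committer, page) == committer);
    targets[0] = true;
    for (seg = 1; seg <= NB_SEGMENTS; seg++) {
        targets[seg] = (seg != committer &&
                        page_state[seg][page] == PAGE_PRIVATE);
        if (targets[seg])
            count++;
    }
    return count;
}

/* Memory really used by `page`, as counted for triggering a major
 * collection: one shared copy plus one per private copy.               */
static unsigned physical_copies(size_t page)
{
    unsigned seg, copies = 1;
    for (seg = 1; seg <= NB_SEGMENTS; seg++)
        if (page_state[seg][page] == PAGE_PRIVATE)
            copies++;
    return copies;
}
```

In this model, a thread running in segment 1 would first call
try_write_lock(), then privatize_page() for the page holding the object,
and at commit time use commit_targets() to find where the new version has
to be copied. Re-sharing during a major collection would amount to setting
the page's state back to PAGE_SHARED in every segment.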