Author: Armin Rigo <[email protected]> Branch: extradoc Changeset: r5077:e25ed3d1866c Date: 2013-10-14 20:08 +0200 http://bitbucket.org/pypy/extradoc/changeset/e25ed3d1866c/
Log: Nitty gritty details: starting. May become too long diff --git a/blog/draft/incremental-gc.rst b/blog/draft/incremental-gc.rst --- a/blog/draft/incremental-gc.rst +++ b/blog/draft/incremental-gc.rst @@ -49,4 +49,69 @@ Nitty gritty details ==================== +This was done as a patch to "minimark", our current GC, and called +"incminimark" for now. The former is a generational stop-the-world GC. +New objects are allocated "young", i.e. in the nursery, a special zone +of a few MB of memory. When it is full, a "minor collection" step moves +the surviving objects out of the nursery. This can be done quickly (a +few millisecond at most) because we only need to walk through the young +objects that survive --- usually a small fraction of all young objects. +From time to time, this minor collection is followed by a "major +collection": in that step, we walk *all* objects to classify which ones +are still alive and which ones are now dead (*marking*) and free the +memory occupied by the dead ones (*speeding*). +This "major collection" is what gives the long GC pauses. To fix this +problem we made the GC incremental: instead of running one complete +major collection, we split its work into a variable number of pieces +and run each piece after every minor collection for a while, until there +are no more pieces. The pieces are each doing a fraction of marking, or +a fraction of sweeping. + +The main issue is that splitting the major collections means that the +main program is actually running between the pieces, and so can change +the pointers in the objects to point to other objects. This is not +a problem for sweeping: dead objects will remain dead whatever the main +program does. However, it is a problem for marking. Let us see why. + +In terms of the incremental GC literature, objects are either "white", +"gray" or "black". They start as "white", become "gray" when they are +found to be alive, and become "black" when they have been fully +traversed --- at which point the objects that it points to have +themselves been marked gray, or maybe are already black. The gray +objects are the "frontier" between the black objects that we have found +to be reachable, and the white objects that represent the unknown part +of the world. When there are no more gray objects, the process is +finished: all remaining white objects are unreachable and can be freed +(by the following sweeping phase). + +In this model, the important part is that a black object can never point +to a white object: if the latter remains white until the end, it will be +freed, which is incorrect because the black object itself can still be +reached. + +The trick we used in PyPy is to consider minor collections as part of +the whole, rather than focus only on major collections. The existing +minimark GC had always used a "write barrier" to do its job, like any +generational GC. This write barrier is used to detect when an old +object (outside the nursery) is modified to point to a young object +(inside the nursery), which is essential information for minor +collections. Actually, although this was the goal, the actual write +barrier code was simpler: it just recorded all old objects into which we +wrote *any* pointer --- to a young or old object. It is actually a +performance improvement, because we don't need to check over and over +again if the written pointer points to a young object or not. + +This *unmodified* write barrier works for incminimark too. Imagine that +we are in the middle of the marking phase, running the main program. +The write barrier will record all old objects that are being modified. +Then at the next minor collection, all surviving young objects will be +moved out of the nursery. At this point, as we're about to continue +running the major collection's marking phase, we simply add to the list +of pending gray objects all the objects that we consider --- both the +objects listed as "old objects that are being modified", and the objects +that we just moved out of the nursery. A fraction of the former list +are turned back from the black to the gray color. This technique +implements nicely, if indirectly, what is called a "backward write +barrier" in the literature: the backwardness is about the color that +occasionally progresses backward from black to gray. _______________________________________________ pypy-commit mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-commit
