Author: Armin Rigo <[email protected]>
Branch: cpyext-gc-support
Changeset: r80240:f94e515eb625
Date: 2015-10-15 17:14 +0200
http://bitbucket.org/pypy/pypy/changeset/f94e515eb625/

Log:    More thoughts...

diff --git a/pypy/doc/discussion/rawrefcount.rst b/pypy/doc/discussion/rawrefcount.rst
--- a/pypy/doc/discussion/rawrefcount.rst
+++ b/pypy/doc/discussion/rawrefcount.rst
@@ -10,103 +10,139 @@
 ob_pypy_link.  The ob_refcnt is the reference counter as used on
 CPython.  If the PyObject structure is linked to a live PyPy object,
 its current address is stored in ob_pypy_link and ob_refcnt is bumped
-by the constant REFCNT_FROM_PYPY_OBJECT.
+by either the constant REFCNT_FROM_PYPY, or the constant
+REFCNT_FROM_PYPY_DIRECT (== REFCNT_FROM_PYPY + SOME_HUGE_VALUE).
 
-rawrefcount_create_link_from_pypy(p, ob)
+Most PyPy objects exist outside cpyext, and conversely in cpyext it is
+possible that a lot of PyObjects exist without being seen by the rest
+of PyPy.  At the interface, however, we can "link" a PyPy object and a
+PyObject.  There are two kinds of link:
+
+rawrefcount.create_link_pypy(p, ob)
 
     Makes a link between an exising object gcref 'p' and a newly
-    allocated PyObject structure 'ob'.  Both must not be linked so far.
-    This adds REFCNT_FROM_PYPY_OBJECT to ob->ob_refcnt.
+    allocated PyObject structure 'ob'.  ob->ob_refcnt must be
+    initialized to either REFCNT_FROM_PYPY, or
+    REFCNT_FROM_PYPY_DIRECT.  (The second case is an optimization:
+    when the GC finds the PyPy object and PyObject no longer
+    referenced, it can just free() the PyObject.)
 
-rawrefcount_create_link_to_pypy(p, ob)
+rawrefcount.create_link_pyobj(p, ob)
 
     Makes a link from an existing PyObject structure 'ob' to a newly
-    allocated W_CPyExtPlaceHolderObject 'p'.  The 'p' should have a
-    back-reference field pointing to 'ob'.  This also adds
-    REFCNT_FROM_PYPY_OBJECT to ob->ob_refcnt.
+    allocated W_CPyExtPlaceHolderObject 'p'.  You must also add
+    REFCNT_FROM_PYPY to ob->ob_refcnt.  For cases where the PyObject
+    contains all the data, and the PyPy object is just a proxy.  The
+    W_CPyExtPlaceHolderObject should have only a field that contains
+    the address of the PyObject, but that's outside the scope of the
+    GC.
 
-rawrefcount_from_obj(p)
+rawrefcount.from_obj(p)
 
     If there is a link from object 'p', and 'p' is not a
     W_CPyExtPlaceHolderObject, returns the corresponding 'ob'.
     Otherwise, returns NULL.
 
-rawrefcount_to_obj(ob)
+rawrefcount.to_obj(Class, ob)
 
-    Returns ob->ob_pypy_link, cast to a GCREF.
+    Returns ob->ob_pypy_link, cast to an instance of 'Class'.
 
 
 Collection logic
 ----------------
 
-Objects existing purely on the C side have ob->ob_from_pypy == NULL;
+Objects existing purely on the C side have ob->ob_pypy_link == 0;
 these are purely reference counted.  On the other hand, if
-ob->ob_from_pypy != NULL, then ob->ob_refcnt is at least
-REFCNT_FROM_PYPY_OBJECT and the object is part of a "link".
+ob->ob_pypy_link != 0, then ob->ob_refcnt is at least REFCNT_FROM_PYPY
+and the object is part of a "link".
 
 The idea is that links whose 'p' is not reachable from other PyPy
-objects *and* whose 'ob->ob_refcnt' is REFCNT_FROM_PYPY_OBJECT are the
-ones who die.  But it is more messy because links created with
-rawrefcount_create_link_to_pypy() need to have a deallocator called,
+objects *and* whose 'ob->ob_refcnt' is REFCNT_FROM_PYPY or
+REFCNT_FROM_PYPY_DIRECT are the ones who die.  But it is more messy
+because PyObjects still (usually) need to have a tp_dealloc called,
 and this cannot occur immediately (and can do random things like
 accessing other references this object points to, or resurrecting the
 object).
 
-Let P = list of links created with rawrefcount_create_link_from_pypy()
-and O = list of links created with rawrefcount_create_link_to_pypy().
+Let P = list of links created with rawrefcount.create_link_pypy()
+and O = list of links created with rawrefcount.create_link_pyobj().
 The PyPy objects in the list O are all W_CPyExtPlaceHolderObject: all
-the data is in the PyObjects, and all references are regular
-CPython-like reference counts.  It is the opposite with the P links:
-all references are regular PyPy references from the 'p' object, and
-the 'ob' is trivial.
+the data is in the PyObjects, and all references (if any) are regular
+CPython-like reference counts.
 
-So, after the collection we do this about P links:
+So, during the collection we do this about P links:
 
     for (p, ob) in P:
-        if ob->ob_refcnt != REFCNT_FROM_PYPY_OBJECT:
+        if ob->ob_refcnt != REFCNT_FROM_PYPY
+               and ob->ob_refcnt != REFCNT_FROM_PYPY_DIRECT:
             mark 'p' as surviving, as well as all its dependencies
 
-    for (p, ob) in P:
-        if p is not surviving:
-            unlink p and ob, free ob
+At the end of the collection, the P and O links are both handled like
+this:
 
-Afterwards, the O links are handled like this:
-
-    for (p, ob) in O:
-        # p is trivial: it cannot point to other PyPy objects
+    for (p, ob) in P + O:
         if p is not surviving:
             unlink p and ob
-            ob->ob_refcnt -= REFCNT_FROM_PYPY_OBJECT
-            if ob->ob_refcnt == 0:
-                invoke _Py_Dealloc(ob) later, outside the GC
+            if ob->ob_refcnt == REFCNT_FROM_PYPY_DIRECT:
+                free(ob)
+            else:
+                ob->ob_refcnt -= REFCNT_FROM_PYPY
+                if ob->ob_refcnt == 0:
+                    invoke _Py_Dealloc(ob) later, outside the GC
 
 
 GC Implementation
 -----------------
 
-We need two P lists and two O lists, for young or old objects.  All
-four lists can actually be linked lists of 'ob', using yet another
-field 'ob_pypy_next'; or they can be regular AddressLists (unsure
-about the overhead of this extra field for all PyObjects -- even ones
-not linked to PyPy objects).
+We need two copies of both the P list and O list, for young or old
+objects.  All four lists can be regular AddressLists of 'ob' objects.
 
 We also need an AddressDict mapping 'p' to 'ob' for all links in the P
-list.  This dict contains both young and old 'p'; we simply write a
-new entry when the object moves.  As a result it can contain some
-extra garbage entries after some minor collections.  It is cleaned up
-by being rebuilt at the next major collection.  We never walk all
-items of that dict; we only walk the two explicit P lists.
+list, and update it when PyPy objects move.
 
 
 Further notes
 -------------
 
-For small immutable types like <int> and <float>, we can actually
-create a PyIntObject as a complete copy of the W_IntObject whenever
-asked, and not record any link.  Is it cheaper?  Unclear.
+For objects that are opaque in CPython, like <dict>, we always create
+a PyPy object, and then when needed we make an empty PyObject and
+attach it with create_link_pypy()/REFCNT_FROM_PYPY_DIRECT.
 
-A few special types need to be reflected both as PyPy objects and
-PyObjects.  For now we assume that these are large and mostly
-immutable, like <type> objects.  They should be linked in some mixture
-of the P list and the O list.  Likely, the P list with an extra flag
-that says "_Py_Dealloc must be invoked".
+For <int> and <float> objects, the corresponding PyObjects contain a
+"long" or "double" field too.  We link them with create_link_pypy()
+and we can use REFCNT_FROM_PYPY_DIRECT too: 'tp_dealloc' doesn't
+need to be called, and instead just calling free() is fine.
+
+For <type> objects, we need both a PyPy and a PyObject side.  These
+are made with create_link_pypy()/REFCNT_FROM_PYPY.
+
+For custom PyXxxObjects allocated from the C extension module, we
+need create_link_pyobj().
+
+For <str> or <unicode> objects coming from PyPy, we use
+create_link_pypy()/REFCNT_FROM_PYPY_DIRECT with a PyObject
+preallocated with the size of the string.  We copy the string
+lazily into that area if PyString_AS_STRING() is called.
+
+For <str>, <unicode>, <tuple> or <list> objects in the C extension
+module, we first allocate it as only a PyObject, which supports
+mutation of the data from C, like CPython.  When it is exported to
+PyPy we could make a W_CPyExtPlaceHolderObject with
+create_link_pyobj().
+
+For <tuple> objects coming from PyPy, if they are not specialized,
+then the PyPy side holds a regular reference to the items.  Then we
+can allocate a PyTupleObject and store in it borrowed PyObject
+pointers to the items.  Such a case is created with
+create_link_pypy()/REFCNT_FROM_PYPY_DIRECT.  If it is specialized,
+then it doesn't work because the items are created just-in-time on the
+PyPy side.  In this case, the PyTupleObject needs to hold real
+references to the PyObject items, and we use create_link_pypy()/
+REFCNT_FROM_PYPY.  In all cases, we have a C array of PyObjects
+that we can return from PySequence_Fast_ITEMS.
+
+For <list> objects coming from PyPy, we can use a cpyext list
+strategy.  The list turns into a PyListObject, as if it had been
+allocated from C in the first place.  The special strategy can hold
+(only) a direct reference to the PyListObject, and we can use
+create_link_pyobj().  PySequence_Fast_ITEMS then works for lists too.
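
The "Collection logic" pseudocode in the new version of the file can be tried out with a small Python toy model. This is only a sketch under stated assumptions, not the GC implementation: the numeric values of the constants, the PyObj class, and the helpers create_link(), mark_from_p_links(), finish_collection() and demo() are all hypothetical names made up for illustration. PyPy objects are modelled as plain strings, and "surviving" is a set that a real GC would fill by tracing.

```python
# Toy model of the collection logic described in the diff above.
# The numeric values are placeholders chosen for illustration only.
REFCNT_FROM_PYPY = 1 << 20
SOME_HUGE_VALUE = 1 << 40
REFCNT_FROM_PYPY_DIRECT = REFCNT_FROM_PYPY + SOME_HUGE_VALUE


class PyObj:
    """Stand-in for a PyObject struct: only the two fields discussed."""

    def __init__(self, name, ob_refcnt):
        self.name = name
        self.ob_refcnt = ob_refcnt
        self.ob_pypy_link = 0      # 0 means "not linked to a PyPy object"


def create_link(p, ob, links):
    """Record a (p, ob) link; the caller has already set up ob_refcnt."""
    ob.ob_pypy_link = p
    links.append((p, ob))


def mark_from_p_links(P, surviving):
    """During collection: extra C references on 'ob' keep 'p' alive."""
    for p, ob in P:
        if ob.ob_refcnt not in (REFCNT_FROM_PYPY, REFCNT_FROM_PYPY_DIRECT):
            surviving.add(p)       # a real GC would also trace p's dependencies


def finish_collection(P, O, surviving):
    """End of collection: unlink dead pairs, return (freed, to_dealloc)."""
    freed, to_dealloc = [], []
    for p, ob in P + O:
        if p in surviving:
            continue
        ob.ob_pypy_link = 0                    # unlink p and ob
        if ob.ob_refcnt == REFCNT_FROM_PYPY_DIRECT:
            freed.append(ob.name)              # models a plain free(ob)
        else:
            ob.ob_refcnt -= REFCNT_FROM_PYPY
            if ob.ob_refcnt == 0:
                to_dealloc.append(ob.name)     # models a deferred _Py_Dealloc(ob)
    return freed, to_dealloc


def demo():
    P, O, surviving = [], [], set()
    a = PyObj("a", REFCNT_FROM_PYPY_DIRECT)    # P link, no C references left
    b = PyObj("b", REFCNT_FROM_PYPY + 1)       # P link, one C reference left
    c = PyObj("c", REFCNT_FROM_PYPY)           # O link, no C references left
    create_link("pa", a, P)
    create_link("pb", b, P)
    create_link("pc", c, O)
    mark_from_p_links(P, surviving)            # only "pb" is kept alive
    return finish_collection(P, O, surviving), b
```

In this model demo() frees "a" directly, schedules "c" for deallocation, and leaves "b" linked with its refcount untouched, because the extra C reference caused "pb" to be marked as surviving.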