-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The attached document is the result of thinking hard about the issues
involved in getting ZODB to work without its C extensions.  I think that
the current state of Python (new style objects, slots, weakrefs) allows
us to keep an efficient circular buffer of persistent objects without
requiring C.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIykYN+gerLs4ltQ4RAh9WAJ9qjwj3/Bip3dQSfNFIcXusBcfj3QCgqa3G
GJ3aPqzyEWqxwKXIaGifs2A=
=gXrT
-----END PGP SIGNATURE-----
Reimplementing Pickle Cache in Python
=====================================

Notes
-----

- According to the long introductory comment in ``cPickleCache.c``, the
  cache objects hold two primary data structures:

  - ``data`` is a dict mapping OID to persistent objects.
  
    o Persistent classes (e.g., ``ZClasses`` ) are stored here as 
      normal "strong" references.
      
    o Ghosts and reified persistent objects are stored here using "weak"
     references (i.e.,  they do not bump the refcount on their target).

  - ``ring_home`` is the a special node which acts as the head of a
    doubly-linked circular list of persistent objects, held in LRU order.
    Persistent objects themselves hold the 'next' and 'prev' pointers as
    part of their C struct.


Proposed Changes
----------------

- Move the persistent classes over into a separate dict,
  ``persistent_classes``.  Once loaded, *never* ghostify them via LRU
  (only on invalidations).  Change any code which expects to find them in
  ``data`` accordingly (e.g., ``klass_items``).

- Make the ``data`` dict an instance of ``weakref.WeakValueDictionary``.

- Define a new type, ``RingNode``, as follows::

    class RingNode(object):
        # 32 byte fixed size wrapper.
        __slots__ = ('object', 'next', 'prev')
        def __init__(self, object, next=None, prev=None):
            self.object = object
            self.next = next
            self.prev = prev

- Have the pickle cache hold on to a "marker" instance of this classs
  (e.g., with ``object`` set to ``None``) as the head of the ring.

- When adding a persistent object to the ring, create an instance of
  ``RingNode`` to wrap it, and add as the ``prev`` of the head node.

``IPickleCache`` API
--------------------

::

    from zope.interface import Attribute
    from zope.interface import Interface
    class IPickleCache(Interface):
        """ API of the cache for a ZODB connection.
        """
        def __getitem__(oid):
            """ -> the persistent object for OID.

            o Raise KeyError if not found.
            """

        def __setitem__(oid, value):
            """ Save the persistent object under OID.

            o 'oid' must be a string.

            o XXX? Raise KeyError on duplicate
            """

        def __delitem__(oid):
            """ Remove the persistent object for OID.

            o Raise KeyError if not found.
            """

        def get(oid, default=None):
            """ -> the persistent object for OID.

            o Return 'default' if not found.
            """

        def __len__():
            """ -> the number of OIDs in the cache.
            """

        def items():
            """-> a sequence of tuples (oid, value) for cached objects.

            o Only includes items in 'data' (no p-classes).
            """

        def ringlen():
            """ -> the number of persistent objects in the ring.

            o Only includes items in the ring (no ghosts or p-classes).
            """

        def lru_items():
            """ -> a sequence of tuples (oid, value) for cached objects.

            o Tuples will be in LRU order.

            o Only includes items in the ring (no ghosts or p-classes).
            """

        def klass_items():
            """-> a sequence of tuples (oid, value) for cached p-classes.

            o Only includes persistent classes.
            """

        def incrgc():
            """ Perform an incremental garbage collection sweep.

            o Reduce number of non-ghosts to 'cache_size', if possible.
            
            o Ghostify in LRU order.

            o Skip dirty or sticky objects.

            o Quit once we get down to 'cache_size'.
            """

        def full_sweep():
            """ Perform a full garbage collection sweep.

            o Reduce number of non-ghosts to 0, if possible.

            o Ghostify all non-sticky / non-changed objecs.
            """

        def minimize()
            """ Alias for 'full_sweep'.

            o XXX?
            """

        def reify(to_reify):
            """ Reify the indicated objects.

            o If 'to_reify' is a string, treat it as an OID.

            o Otherwise, iterate over it as a sequence of OIDs.

            o For each OID, if present in 'data' and in GHOST state:

              o Call '_p_unghostify' on the object.

              o Add it to the ring.

            o If any OID is present but not in GHOST state, skip it.

            o Raise KeyErrory if any OID is not present.
            """

        def invalidate(to_invalidate):
            """ Invalidate the indicated objects.

            o If 'to_invalidate' is a string, treat it as an OID.

            o Otherwise, iterate over it as a sequence of OIDs.

            o Any OID corresponding to a p-class will cause the corresponding
              p-class to be removed from the cache.

            o For all other OIDs, ghostify the corrsponding object and 
              remove it from the ring.
            """

        def debug_info():
            """ XXX
            """


        cache_size = Attribute(u'Target size of the cache')
        cache_drain_resistance = Attribute(u'Factor for draining cache below '
                                           u'target size')
        cache_non_ghost_count = Attribute(u'Number of non-ghosts in the cache '
                                          u'(XXX how is it different from '
                                          u'ringlen?')
        cache_data = property(lambda self: dict(self.data.items()))
        cache_klass_count = property(lambda self: len(self.persistent_classes))



Example Implementation
----------------------

(Locking elided)::

    import weakref

    class PickleCache(object):
        def __init__(self, jar, target_size):
            self.jar = jar
            self.target_size = target_size
            self.drain_resistance = 0
            self.non_ghost_count = 0
            self.persistent_classes = {}
            self.data = weakref.WeakValueDictionary()
            self.ring = RingNode(None)

        # IPickleCache API
        def __len__(self):
            return (len(self.persistent_classes) +
                    len(self.data))

        def __getitem__(self, oid):
            value = self.data.get(oid)
            if value is not None:
                return value
            return self.persistent_classes[oid]

        def __setitem__(self, oid, value):
            if not isinstance(oid, str):
                raise TypeError
            # XXX
            if oid in self.persistent_classes or oid in self.data:
                raise KeyError('Duplicate OID: %s' % oid)
            if type(value) is type:
                self.persistent_classes[oid] = value
            else:
                self.data[oid] = value
                if value._p_state != GHOST:
                    self.non_ghost_count += 1
                    mru = self.ring.prev
                    self.ring.prev = node = RingNode(value, self.ring, mru)
                    mru.next = node

        def __delitem__(self, oid):
            if not isinstance(oid, str):
                raise TypeError
            if oid in self.persistent_classes:
                del self.persistent_classes[oid]
            else:
                value = self.data.pop(oid)
                node = self.ring.next
                while node is not self.ring:
                    if node.object is value:
                        node.prev.next, node.next.prev = node.next, node.prev
                        self.non_ghost_count -= 1
                        break
                    node = node.next

        def get(self, oid, default=None):
            value = self.data.get(oid, self)
            if value is not self:
                return value
            return self.persistent_classes.get(oid, default)
            
        def ringlen(self):
            result = 0
            node = self.ring.next
            while node is not self.ring:
                result += 1
                node = node.next
            return result

        def items(self):
            return self.data.items()

        def lru_items(self):
            result = []
            node = self.ring.next
            while node is not self.ring:
                result.append((node.object._p_oid, node.object))
            return result

        def klass_items(self):
            return self.persistent_classes.items()

        def incrgc(self, ignored):
            target = self.target_size
            if self.drain_resistance >= 1:
                size = self.non_ghost_count
                target2 = size - 1 - (size / self.drain_resistance)
                if target2 < target:
                    target = target2
            self._sweep(target)

        def full_sweep(self, target=None):
            self._sweep(0)

        minimize = full_sweep

        def reify(self, oid):
            pass

        def invalidate(self, to_invalidate):
            if isintance(to_invalidate, str):
                self._invalidate(to_invalidate)
            else:
                for oid in to_invalidate:
                    self._invalidate(oid)

        def debug_info(self):
            result = []
            for oid, klass in self.persistent_classes.items():
                result.append((oid,
                               len(gc.getreferents(value),
                               type(value).__name__,
                               value._p_state,
                              ))
            for oid, value in self.data.items():
                result.append((oid,
                               len(gc.getreferents(value),
                               type(value).__name__,
                              ))

        cache_size = property(lambda self: self.target_size)
        cache_drain_resistance = property(lambda self: self.drain_resistance)
        cache_non_ghost_count = property(lambda self: self.non_ghost_count)
        cache_data = property(lambda self: dict(self.data.items()))
        cache_klass_count = property(lambda self: len(self.persistent_classes))

        # Helpers
        def _sweep(self, target):
            # lock
            node = self.ring.next
            while node is not self.ring and self.non_ghost_count > target:
                if node.object._p_state not in (STICKY, DIRTY):
                    node.prev.next, node.next.prev = node.next, node.prev
                    node.object = None
                    self.non_ghost_count -= 1
                node = node.next

        def _invalidate(self, oid):
            value = self.data.get(oid)
            if value is not None and value._p_state != GHOST:
                # value._p_invalidate() # NOOOO, we'll do it ourselves.
                value._p_ghostify() # TBD
                node = self.ring.next
                while node is not self.ring:
                    if node.object is value:
                        node.prev.next, node.next.prev = node.next, node.prev
                        break
            elif oid in self.persistent_classes:
                del self.persistent_classes[oid]


Required Changes to Persistent
------------------------------

- Add a ``_p_state`` which can be queried from Python.

- Add a ``_p_ghostify`` method:  clears ``__dict__`` and sets state to GHOST.

- Add a ``_p_unghostify`` method:  loads ``__dict__`` and sets state to CLEAN.
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev

Reply via email to