-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The attached document is the result of thinking hard about the issues involved in getting ZODB to work without its C extensions. I think that the current state of Python (new style objects, slots, weakrefs) allows us to keep an efficient circular buffer of persistent objects without requiring C.
Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 [EMAIL PROTECTED] Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIykYN+gerLs4ltQ4RAh9WAJ9qjwj3/Bip3dQSfNFIcXusBcfj3QCgqa3G GJ3aPqzyEWqxwKXIaGifs2A= =gXrT -----END PGP SIGNATURE-----
Reimplementing Pickle Cache in Python ===================================== Notes ----- - According to the long introductory comment in ``cPickleCache.c``, the cache objects hold two primary data structures: - ``data`` is a dict mapping OID to persistent objects. o Persistent classes (e.g., ``ZClasses`` ) are stored here as normal "strong" references. o Ghosts and reified persistent objects are stored here using "weak" references (i.e., they do not bump the refcount on their target). - ``ring_home`` is the a special node which acts as the head of a doubly-linked circular list of persistent objects, held in LRU order. Persistent objects themselves hold the 'next' and 'prev' pointers as part of their C struct. Proposed Changes ---------------- - Move the persistent classes over into a separate dict, ``persistent_classes``. Once loaded, *never* ghostify them via LRU (only on invalidations). Change any code which expects to find them in ``data`` accordingly (e.g., ``klass_items``). - Make the ``data`` dict an instance of ``weakref.WeakValueDictionary``. - Define a new type, ``RingNode``, as follows:: class RingNode(object): # 32 byte fixed size wrapper. __slots__ = ('object', 'next', 'prev') def __init__(self, object, next=None, prev=None): self.object = object self.next = next self.prev = prev - Have the pickle cache hold on to a "marker" instance of this classs (e.g., with ``object`` set to ``None``) as the head of the ring. - When adding a persistent object to the ring, create an instance of ``RingNode`` to wrap it, and add as the ``prev`` of the head node. ``IPickleCache`` API -------------------- :: from zope.interface import Attribute from zope.interface import Interface class IPickleCache(Interface): """ API of the cache for a ZODB connection. """ def __getitem__(oid): """ -> the persistent object for OID. o Raise KeyError if not found. """ def __setitem__(oid, value): """ Save the persistent object under OID. o 'oid' must be a string. o XXX? Raise KeyError on duplicate """ def __delitem__(oid): """ Remove the persistent object for OID. o Raise KeyError if not found. """ def get(oid, default=None): """ -> the persistent object for OID. o Return 'default' if not found. """ def __len__(): """ -> the number of OIDs in the cache. """ def items(): """-> a sequence of tuples (oid, value) for cached objects. o Only includes items in 'data' (no p-classes). """ def ringlen(): """ -> the number of persistent objects in the ring. o Only includes items in the ring (no ghosts or p-classes). """ def lru_items(): """ -> a sequence of tuples (oid, value) for cached objects. o Tuples will be in LRU order. o Only includes items in the ring (no ghosts or p-classes). """ def klass_items(): """-> a sequence of tuples (oid, value) for cached p-classes. o Only includes persistent classes. """ def incrgc(): """ Perform an incremental garbage collection sweep. o Reduce number of non-ghosts to 'cache_size', if possible. o Ghostify in LRU order. o Skip dirty or sticky objects. o Quit once we get down to 'cache_size'. """ def full_sweep(): """ Perform a full garbage collection sweep. o Reduce number of non-ghosts to 0, if possible. o Ghostify all non-sticky / non-changed objecs. """ def minimize() """ Alias for 'full_sweep'. o XXX? """ def reify(to_reify): """ Reify the indicated objects. o If 'to_reify' is a string, treat it as an OID. o Otherwise, iterate over it as a sequence of OIDs. o For each OID, if present in 'data' and in GHOST state: o Call '_p_unghostify' on the object. o Add it to the ring. o If any OID is present but not in GHOST state, skip it. o Raise KeyErrory if any OID is not present. """ def invalidate(to_invalidate): """ Invalidate the indicated objects. o If 'to_invalidate' is a string, treat it as an OID. o Otherwise, iterate over it as a sequence of OIDs. o Any OID corresponding to a p-class will cause the corresponding p-class to be removed from the cache. o For all other OIDs, ghostify the corrsponding object and remove it from the ring. """ def debug_info(): """ XXX """ cache_size = Attribute(u'Target size of the cache') cache_drain_resistance = Attribute(u'Factor for draining cache below ' u'target size') cache_non_ghost_count = Attribute(u'Number of non-ghosts in the cache ' u'(XXX how is it different from ' u'ringlen?') cache_data = property(lambda self: dict(self.data.items())) cache_klass_count = property(lambda self: len(self.persistent_classes)) Example Implementation ---------------------- (Locking elided):: import weakref class PickleCache(object): def __init__(self, jar, target_size): self.jar = jar self.target_size = target_size self.drain_resistance = 0 self.non_ghost_count = 0 self.persistent_classes = {} self.data = weakref.WeakValueDictionary() self.ring = RingNode(None) # IPickleCache API def __len__(self): return (len(self.persistent_classes) + len(self.data)) def __getitem__(self, oid): value = self.data.get(oid) if value is not None: return value return self.persistent_classes[oid] def __setitem__(self, oid, value): if not isinstance(oid, str): raise TypeError # XXX if oid in self.persistent_classes or oid in self.data: raise KeyError('Duplicate OID: %s' % oid) if type(value) is type: self.persistent_classes[oid] = value else: self.data[oid] = value if value._p_state != GHOST: self.non_ghost_count += 1 mru = self.ring.prev self.ring.prev = node = RingNode(value, self.ring, mru) mru.next = node def __delitem__(self, oid): if not isinstance(oid, str): raise TypeError if oid in self.persistent_classes: del self.persistent_classes[oid] else: value = self.data.pop(oid) node = self.ring.next while node is not self.ring: if node.object is value: node.prev.next, node.next.prev = node.next, node.prev self.non_ghost_count -= 1 break node = node.next def get(self, oid, default=None): value = self.data.get(oid, self) if value is not self: return value return self.persistent_classes.get(oid, default) def ringlen(self): result = 0 node = self.ring.next while node is not self.ring: result += 1 node = node.next return result def items(self): return self.data.items() def lru_items(self): result = [] node = self.ring.next while node is not self.ring: result.append((node.object._p_oid, node.object)) return result def klass_items(self): return self.persistent_classes.items() def incrgc(self, ignored): target = self.target_size if self.drain_resistance >= 1: size = self.non_ghost_count target2 = size - 1 - (size / self.drain_resistance) if target2 < target: target = target2 self._sweep(target) def full_sweep(self, target=None): self._sweep(0) minimize = full_sweep def reify(self, oid): pass def invalidate(self, to_invalidate): if isintance(to_invalidate, str): self._invalidate(to_invalidate) else: for oid in to_invalidate: self._invalidate(oid) def debug_info(self): result = [] for oid, klass in self.persistent_classes.items(): result.append((oid, len(gc.getreferents(value), type(value).__name__, value._p_state, )) for oid, value in self.data.items(): result.append((oid, len(gc.getreferents(value), type(value).__name__, )) cache_size = property(lambda self: self.target_size) cache_drain_resistance = property(lambda self: self.drain_resistance) cache_non_ghost_count = property(lambda self: self.non_ghost_count) cache_data = property(lambda self: dict(self.data.items())) cache_klass_count = property(lambda self: len(self.persistent_classes)) # Helpers def _sweep(self, target): # lock node = self.ring.next while node is not self.ring and self.non_ghost_count > target: if node.object._p_state not in (STICKY, DIRTY): node.prev.next, node.next.prev = node.next, node.prev node.object = None self.non_ghost_count -= 1 node = node.next def _invalidate(self, oid): value = self.data.get(oid) if value is not None and value._p_state != GHOST: # value._p_invalidate() # NOOOO, we'll do it ourselves. value._p_ghostify() # TBD node = self.ring.next while node is not self.ring: if node.object is value: node.prev.next, node.next.prev = node.next, node.prev break elif oid in self.persistent_classes: del self.persistent_classes[oid] Required Changes to Persistent ------------------------------ - Add a ``_p_state`` which can be queried from Python. - Add a ``_p_ghostify`` method: clears ``__dict__`` and sets state to GHOST. - Add a ``_p_unghostify`` method: loads ``__dict__`` and sets state to CLEAN.
_______________________________________________ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev