Hello ZODB dev,

I was recently trying to GC a large multi-database setup for the first time 
using zc.zodbdgc. The process wouldn't complete (or really even get started) 
because of an IndexError raised in `zc.zodbdgc.getrefs` (__init__.py line 
287). As I traced through it, it began to look like `cPickle.Unpickler.noload` 
fails on multi-database persistent ids (which in ZODB are list objects), 
producing an empty list instead of the [ref type, args] list documented in 
`ZODB.serialize`. This makes it impossible to GC a multi-database correctly.
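
For reference, here is a rough sketch of the persistent id formats as I read 
them in the `ZODB.serialize` docstring: a bare oid, an `(oid, class)` tuple, 
and `[reference_type, args]` lists for weak ('w') and cross-database ('n', 
'm') references. It's written for Python 3 (oids as `bytes`); `ref_target` is 
a hypothetical helper, not zc.zodbdgc's code. It shows what a GC has to pull 
out of each form, and it refuses to guess on a truncated list.

```python
def ref_target(pid, current_db):
    """Return (database_name, oid) for a ZODB-style persistent id.

    Hypothetical helper sketching the formats described in ZODB.serialize;
    class metadata is ignored because a GC only needs the target object.
    """
    if isinstance(pid, bytes):
        # Simple reference: a bare oid in the current database.
        return current_db, pid
    if isinstance(pid, tuple):
        # (oid, class metadata): an object in the current database.
        return current_db, pid[0]
    if isinstance(pid, list):
        if len(pid) != 2:
            # e.g. the empty list noload produced: fail instead of skipping.
            raise ValueError("malformed extended reference: %r" % (pid,))
        ref_type, args = pid
        if ref_type == 'w':
            # Weak reference: (oid,) or (oid, database_name).
            db = args[1] if len(args) > 1 else current_db
            return db, args[0]
        if ref_type in ('n', 'm'):
            # Cross-database: (database_name, oid[, class metadata]).
            return args[0], args[1]
        raise ValueError("unknown reference type: %r" % (ref_type,))
    raise TypeError("unexpected persistent id: %r" % (pid,))
```

Applied to the `load` results further down, the 'm' entries resolve to their 
database names and oids; applied to the `noload` results, the empty lists 
raise `ValueError` rather than being silently dropped.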

Has anyone else seen this, or am I just doing something wrong? We worked around 
the problem by using `load` instead of `noload`, but I wonder whether there's a 
better way.


I'm working under Python 2.7.6 and 2.7.3 with ZODB 4.0.0, zc.zodbdgc 0.6.1 and 
eventually zodbpickle 0.5.2. Most of my results were repeated on both Mac OS X 
and Linux.

After hitting the IndexError, I began debugging. Once it became clear that the 
`persistent_load` callback was simply being passed the wrong persistent ids 
(empty lists instead of complete multi-db refs), I tried swapping in zodbpickle 
for the stock cPickle, with the same result. Here's some code demonstrating the 
problem:

This pickle data came straight out of ZODB, captured during a debug session of 
zc.zodbdgc. It contains three persistent ids: two cross-database references and 
one to an object in the same database:

    >>> p = 

This code is copied from `getrefs` in zc.zodbdgc. It's supposed to find all the 
persistent refs and put them in the `refs` list:

    >>> import cPickle
    >>> import cStringIO
    >>> refs = []
    >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
    >>> u.persistent_load = refs
    >>> u.noload()  # first pickle in the record: class metadata
    >>> u.noload()  # second pickle: the object's state

But if we look at `refs`, we see that the two cross-database refs come back as 
empty lists rather than their correct values:

    >>> refs
    [[], [], ('\x00\x00\x00\x00\x00\x00\x00\x10', None)]
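
A guess at the cause, which I haven't checked against the cPickle or zodbpickle 
C source: a list is pickled as EMPTY_LIST followed by APPEND/APPENDS opcodes, so 
if `noload` creates the empty list but throws away the appended items, 
`persistent_load` would receive exactly `[]`. The `(oid, class)` tuple is built 
by a TUPLE opcode instead, which would fit it surviving with `None` where the 
class should be. Python 3's pickle has no `noload`, so this sketch only uses 
`pickletools.genops` to show the opcode order for a list persistent id; `Ref`, 
`RefPickler` and protocol 3 are my own choices for illustration.

```python
import io
import pickle
import pickletools


class Ref:
    """Hypothetical stand-in for a persistent object carrying its pid."""

    def __init__(self, pid):
        self.pid = pid


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit the stored pid for Ref instances; pickle everything else normally.
        return obj.pid if isinstance(obj, Ref) else None


def dump_state(state, protocol=3):
    buf = io.BytesIO()
    RefPickler(buf, protocol).dump(state)
    return buf.getvalue()


def opcode_names(data):
    # pickletools.genops yields (opcode, arg, position) triples.
    return [op.name for op, arg, pos in pickletools.genops(data)]


multi_db_pid = ['m', ('Users_1_Prod', b'\x00' * 7 + b'\x01', None)]
data = dump_state({'user': Ref(multi_db_pid)})
names = opcode_names(data)
start, end = names.index('EMPTY_LIST'), names.index('BINPERSID')
# The pid list is created empty, filled by APPENDS, and only then consumed
# by BINPERSID, so an unpickler that skips APPENDS would pass [] along.
print(names[start:end + 1])
```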

If instead we use `load` to read the state, we get the correct references:

    >>> refs = []
    >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
    >>> u.persistent_load = refs
    >>> u.noload()  # class metadata, as before
    >>> u.load()    # state, fully loaded this time

    >>> refs
    [['m', ('Users_1_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01', <class 
     ['m', ('Users_2_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01', <class 
     ('\x00\x00\x00\x00\x00\x00\x00\x10', <class 'zope.site.folder.Folder'>)]

The results are the same using zodbpickle or using an actual callback function 
instead of the append-directly-to-list shortcut. 

If we avoid the IndexError by checking the size of the list first, we miss all 
the cross-db references, so the GC becomes too aggressive and can remove objects 
that are still referenced from another database. But using `load` is slower and 
requires every referenced class to be importable. If anyone has run into this 
before or has other suggestions, I'd appreciate hearing them.
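
One possible middle ground, sketched in Python 3 and untested against real ZODB 
records: use `load`, but override `find_class` so class references become 
lightweight placeholders instead of imports (on Python 2, cPickle's analogous 
hook is the Unpickler's `find_global` attribute). That would remove the need 
for every referenced class to be importable, though not the extra cost of 
`load`, and a state pickle that reconstructs instances of non-persistent 
application classes would need placeholders that tolerate being constructed. 
`ClassStub`, `RefCollector` and `collect_refs` are hypothetical names, and the 
record at the bottom is synthetic, not ZODB data.

```python
import decimal
import io
import pickle


class ClassStub:
    """Placeholder recorded instead of importing module.name."""

    def __init__(self, module, name):
        self.module = module
        self.name = name

    def __repr__(self):
        return '<stub %s.%s>' % (self.module, self.name)


class RefCollector(pickle.Unpickler):
    """Collects persistent ids with load() without importing any classes."""

    def __init__(self, data):
        super().__init__(io.BytesIO(data))
        self.refs = []

    def find_class(self, module, name):
        # Return a stub rather than importing; a GC needs oids, not classes.
        return ClassStub(module, name)

    def persistent_load(self, pid):
        # With load(), list pids arrive fully built, e.g. ['m', (db, oid, cls)].
        self.refs.append(pid)
        return None


def collect_refs(record):
    """Return the persistent ids found in a record made of two pickles."""
    u = RefCollector(record)
    u.load()  # first pickle: class metadata
    u.load()  # second pickle: object state
    return u.refs


# Usage with a synthetic two-pickle record.
class _Ref:
    """Hypothetical stand-in for a persistent object carrying its pid."""

    def __init__(self, pid):
        self.pid = pid


class _RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        return obj.pid if isinstance(obj, _Ref) else None


buf = io.BytesIO()
pickler = _RefPickler(buf, 3)
pickler.dump(decimal.Decimal)  # stands in for the class metadata
pickler.clear_memo()
pickler.dump({
    'a': _Ref(['m', ('Users_1_Prod', b'\x00' * 7 + b'\x01', decimal.Decimal)]),
    'b': _Ref((b'\x00' * 7 + b'\x10', decimal.Decimal)),
})
refs = collect_refs(buf.getvalue())
print(refs)
```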


For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
