I believe I identified one cause for catastrophic thread-race problems here (sharing a Connection across multiple threads, or for any other reason sharing an in-memory persistent object across non-serialized threads), even when objects are never modified. It doesn't look shallow to me.
The C code for materializing an object's state (unghostify() in cPersistence.c) contains this pair of lines in ZODB 3.2:

    self->state = cPersistent_CHANGED_STATE;
    r = callmethod1(self->jar, py_setstate, (PyObject*)self);

That's pretty mysterious as-is. A comment before that in ZODB 3.3 tries to clarify:

    /* set state to CHANGED while setstate() call is in progress
       to prevent a recursive call to _PyPersist_Load(). */

Alas, that's also a massive (yet tiny <wink>) timing hole when sharing an in-memory persistent object across threads: setstate can call back into Python (if for no other reason, possibly to look up the "__setstate__" attribute on self), and the GIL can be released then, so other threads can run while setstate is in progress. At this point, self's memory state is still missing, but self->state claims it's legitimate. So if another thread tries to access the same in-memory object while the thread running setstate is in any piece of Python code, it will be fooled into thinking the state has already been materialized, and read up nonsense bytes.

In particular, the `len` and `size` members of a ghost Bucket are both 0, and that obviously explains the sporadic

    AssertionError: Bucket length < 1

_check() failures. It's also the cause -- although it's subtler -- of the

    RuntimeError: the bucket being iterated changed size

failures: when BTreeItems_seek() sees a bucket for which the current iteration cursor position within the bucket is >= the current len of the bucket, the only conclusion it can draw is that someone removed items from the bucket since the *last* time BTreeItems_seek() was called; or, IOW, someone changed the bucket's size during iteration. The cursor is initialized to 0, and self->len is also 0 for a ghost bucket, so cursor >= self->len the first time BTreeItems_seek() is called, and we get a misleading message as a result.
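The race can be simulated in pure Python. This is a minimal sketch, not ZODB code: the Bucket class, the state constants, and the two Events that force the bad interleaving are all invented for illustration. The Events stand in for the GIL handoff that can occur while setstate() is executing Python code.

```python
import threading

GHOST, UPTODATE, CHANGED = "ghost", "uptodate", "changed"

class Bucket:
    def __init__(self):
        self.state = GHOST
        self.len = 0          # a ghost Bucket's len is 0, as in the C struct
        self.keys = []

    def unghostify(self, in_setstate, reader_done):
        if self.state == GHOST:
            # The bug: the state flag is flipped *before* the data is
            # materialized, so concurrent readers think the object is loaded.
            self.state = CHANGED
            in_setstate.set()      # simulate the GIL being released ...
            reader_done.wait()     # ... while setstate() runs Python code
            self.keys = [1, 2, 3]  # state finally materialized
            self.len = 3
            self.state = UPTODATE

def reader(b, in_setstate, reader_done, results):
    in_setstate.wait()
    # state != GHOST, so this thread skips materialization and trusts len.
    if b.state != GHOST:
        results.append(b.len)      # sees 0: the "Bucket length < 1" garbage
    reader_done.set()

bucket = Bucket()
in_setstate, reader_done = threading.Event(), threading.Event()
results = []
t = threading.Thread(target=reader,
                     args=(bucket, in_setstate, reader_done, results))
t.start()
bucket.unghostify(in_setstate, reader_done)
t.join()
print(results)     # [0] -- the reader saw a 0 length on a "loaded" object
print(bucket.len)  # 3  -- the real length, materialized too late
```

The Events make the interleaving deterministic here; in the real bug the scheduling is up to the OS, which is why the failures were sporadic.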
It's never a misleading message given a non-insane BTree, because it's impossible for a (non-ghost) bucket that's part of a BTree to have self->len == 0 (an empty bucket is always physically removed from a BTree internally, as soon as it becomes empty), and BTreeItems_seek() is correctly doing the "if this bucket is a ghost, materialize its state" dance before accessing self->len. self->state is lying about whether it's a ghost here, and garbage-in garbage-out follows.

Of course worse things than exceptions can happen: reading up a pile of NUL bytes when real data is expected can lead to anything (e.g., segfaults aren't surprising given this).

Note that this also explains why the problems went away when I introduced a global mutex to strictly serialize persistent object access across the threads. It really is necessary to serialize "even just reads" if you're sharing in-memory objects among threads, if there's any chance that one of them is a ghost. There may be other reasons too; I can't promise that strictly serializing all thread access reliably prevents all possible problems here (although it seems likely to me, it's not an intended use, and the analysis in this msg is all that's been done wrt it).

Note that the old email message Martin referenced:

    http://mail.python.org/pipermail/python-list/2001-February/030675.html

said that's what its author did:

    In our application, we serialize our access to the set of Persistent
    objects amongst all threads (effectively using a global lock on the
    root object that is obtained whenever a thread is going to be playing
    with state data), and any work a thread does on those objects is
    committed before the lock is released.

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
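P.S. The serialization discipline described above (one global lock held around every touch of shared persistent objects, with work finished before release) can be sketched in a few lines. This is an illustration only, not ZODB API: zodb_lock and the shared dict are hypothetical stand-ins for the root-object lock and the shared persistent state.

```python
import threading

zodb_lock = threading.Lock()  # hypothetical global lock guarding all access

shared = {"counter": 0}       # stand-in for a shared persistent object

def worker():
    for _ in range(1000):
        with zodb_lock:                    # "even just reads" go under the lock
            value = shared["counter"]
            shared["counter"] = value + 1  # work done before the lock is released

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["counter"])  # 4000 -- fully serialized access, no lost updates
```

Because every thread acquires the same lock before touching the shared state, no thread can ever observe another thread's half-materialized object, which is the whole point of the discipline.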