> I've been debugging session problems for two days, and I feel it's time to
> write down what I've observed and ask for other eyes to look at it (Chris
> McDonough has been working on this too). This is all on Zope 2.9 trunk
> BTW (ZODB 3.6.0b5 and Zope 2.9's tempstorage) with python 2.4.2.
Most people at Zope Corp are off the rest of the year, so don't expect much.
I don't know anything about tempstorage myself, but since I'm on vacation
too that doesn't much matter ;-)
> What I observed was an unnatural number of repeated ConflictError (by
> that, I mean "write" conflicts) followed by more and more
> ReadConflictErrors as soon as you go beyond the time
> CONFLICT_CACHE_MAXAGE of TemporaryStorage.
> To simplify debugging, I've boosted that constant and I only debug the
> write conflict errors.
> The first write conflict happens when a BTree can't resolve a conflict.
> The transaction is then aborted.
> At this point the same thing should happen as (correctly) happens for
> FileStorage: the connection's _flush_invalidations should get called and
> should reset the _txn_time of the connection to None, so that the modified
> oids (including the BTree's), when invalidated, reset the _txn_time to
> their serial. Then, on the next conflict, _setstate_noncurrent calls
> loadBefore with that serial.
> But apparently the _flush_invalidations() of the connection is never
> called. So _txn_time is never bumped into the future (which in turn means
> the next write conflict will try to load exactly the same serials as
> before and fail again, etc.).
> This seems to happen because:
> 1. the connection has _synch set to True: it has registered itself as a
> synchronizer, and expects its afterCompletion to be called when (among
> others) the transaction is aborted, and the afterCompletion is calling
> _flush_invalidations.
> 2. the synchronizer (the connection itself) has been lost from the
> transaction's _synchronizers WeakSet for some reason (garbage collected I
> guess). It was there in earlier transactions, but it's not there at the
> time it's needed.
> If someone can make sense of this...
> Actually I don't know why the connection (=synchronizer) could be gone
> from the transaction's _synchronizers WeakSet but still be in the DB's
> connection pool WeakSet. I guess here lies the problem.
That's a great question. It doesn't seem possible that it's gc (unless
there's a relevant deep weakref gc bug remaining in Python, which I think is
unlikely).
A Transaction never removes anything from its ._synchronizers set.
However, Transaction.__init__() gets its ._synchronizers set from the
transaction manager that creates the transaction, and the
TransactionManager._synchs set is deliberately mutable: a Transaction
"sees" (in its ._synchronizers set) any changes made to the corresponding
TransactionManager._synchs set (these "two" sets are the same object, just
with different names).
While a transaction manager never removes a synchronizer from its ._synchs
set on its own initiative, anyone can call
ITransactionManager.unregisterSynch(s) to force removal of synchronizer `s`.
Then `s` will vanish both from TransactionManager._synchs and
Transaction._synchronizers (again, they're really the same set object).
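To make the aliasing concrete, here is a minimal sketch using toy stand-ins. ToySynch, ToyTransaction and ToyTransactionManager are hypothetical classes written for this illustration (in current Python, not the 2.4 used in the thread); they only mimic the attribute and method names mentioned above and are not the real transaction package code.

```python
import weakref


class ToySynch:
    """Stands in for a Connection acting as a synchronizer."""

    def __init__(self):
        self.completions = 0

    def afterCompletion(self, txn):
        self.completions += 1


class ToyTransaction:
    def __init__(self, synchronizers):
        # No copy is made: this is the manager's own set object.
        self._synchronizers = synchronizers

    def abort(self):
        # Only synchronizers still in the shared set get notified.
        for s in list(self._synchronizers):
            s.afterCompletion(self)


class ToyTransactionManager:
    def __init__(self):
        self._synchs = weakref.WeakSet()

    def registerSynch(self, synch):
        self._synchs.add(synch)

    def unregisterSynch(self, synch):
        self._synchs.discard(synch)

    def begin(self):
        return ToyTransaction(self._synchs)


tm = ToyTransactionManager()
cn = ToySynch()
tm.registerSynch(cn)

txn = tm.begin()
same_object = txn._synchronizers is tm._synchs
seen_before = cn in txn._synchronizers

tm.unregisterSynch(cn)   # what closing the connection would end up doing
seen_after = cn in txn._synchronizers

txn.abort()              # cn.afterCompletion() is never reached
```

In this toy, unregistering `cn` while `txn` is still live is enough to make the abort skip `cn.afterCompletion()`, which matches the symptom described: the synchronizer is simply absent at the moment it is needed.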
In ZODB, the only caller of unregisterSynch() is Connection.close().
So that's the plain obvious way for this to happen: someone called
cn.close() on the Connection `cn` in question. Are you sure that's not all
there is to it? Closing the connection would remove `cn` from the
transaction manager's ._synchs, and from the transaction's ._synchronizers.
It would _not_ remove `cn` from the DB's connection pool's .all, though.
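If it isn't obvious who is calling close(), one possible way to find out is to wrap the method so that every call records where it came from. The sketch below does this against a hypothetical ToyConnection class, not the real ZODB Connection; trace_close, close_callers and suspicious_code are names invented for the example, and the same wrapping idea would have to be adapted to the real class.

```python
import traceback


class ToyConnection:
    """Hypothetical stand-in; only the close() method matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


close_callers = []   # one formatted stack per close() call


def trace_close(cls):
    """Wrap cls.close so each call records the stack of its caller."""
    original = cls.close

    def close(self, *args, **kw):
        # Keep only the few frames nearest to the call site.
        close_callers.append("".join(traceback.format_stack(limit=5)))
        return original(self, *args, **kw)

    cls.close = close
    return original


def suspicious_code(cn):
    cn.close()


trace_close(ToyConnection)
cn = ToyConnection()
suspicious_code(cn)
```

After the run, `close_callers[0]` holds a stack that names `suspicious_code`, which is the kind of evidence that would confirm or rule out the "someone closed it" explanation.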
There are only two ways a Connection `cn` can ever get out of .all:
1. There are no strong references to `cn` remaining, so gc reclaims `cn`.
2. `cn` isn't currently in use, hasn't been in use for so long that
it's bubbled to the front of the .available queue, and enough
other Connections get closed that len(.available) exceeds pool_size.
Then the "oldest" excess available connections are explicitly removed
from both .available and .all.
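The two ways out of .all can be sketched with a toy pool. ToyPool and ToyConnection are hypothetical classes for illustration only; they borrow the names .all, .available and pool_size from the description above, and assume .all holds weak references and .available is ordered oldest first.

```python
import gc
import weakref


class ToyConnection:
    def __init__(self, name):
        self.name = name


class ToyPool:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.all = weakref.WeakSet()   # every connection the pool knows about
        self.available = []            # idle connections, oldest first

    def push(self, cn):
        """Register a brand-new connection and make it available."""
        self.all.add(cn)
        self.repush(cn)

    def repush(self, cn):
        """Return a closed connection, then trim excess idle connections."""
        self.available.append(cn)
        while len(self.available) > self.pool_size:
            oldest = self.available.pop(0)
            self.all.discard(oldest)   # way 2: explicit removal


pool = ToyPool(pool_size=2)
a, b, c = ToyConnection("a"), ToyConnection("b"), ToyConnection("c")
for conn in (a, b, c):
    pool.push(conn)
idle_names = [x.name for x in pool.available]

# Way 1: a connection that is in use (so not in .available) loses its
# last strong reference, and gc drops it from the weak .all set.
in_use = ToyConnection("d")
pool.all.add(in_use)
count_before = len(pool.all)
del in_use
gc.collect()
count_after = len(pool.all)
```

Here "a" is evicted once the third connection is returned, because it is the oldest idle connection and the idle list exceeds pool_size; "d" disappears only because nothing references it any more.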
> Also, I don't know why we don't observe this for FileStorage, maybe
> something has a hard reference on it somewhere?
It doesn't sound likely to me that hard references are relevant -- but then
I really don't know anything about tempstorage.
HTH, and good luck in either case ;-)