[Florent Guillaume]
> I've been debugging session problems for two days, I feel it's time to
> write down what I've observed and ask for other eyes to look at it (Chris
> McDonough has been working on this too). This is all on Zope 2.9 trunk
> BTW (ZODB 3.6.0b5 and Zope 2.9's tempstorage) with python 2.4.2.

Most people at Zope Corp are off the rest of the year, so don't expect
much. I don't know anything about tempstorage myself, but since I'm on
vacation too that doesn't much matter ;-)

> What I observed was an unnatural number of repeated ConflictErrors (by
> that, I mean "write" conflicts) followed by more and more
> ReadConflictErrors as soon as you go beyond the time
> CONFLICT_CACHE_MAXAGE of TemporaryStorage.
>
> To simplify debugging, I've boosted that constant and I only debug the
> write conflict errors.
>
> The first write conflict happens when a BTree can't resolve a conflict.
> The transaction is then aborted.
>
> Here, what happens correctly for FileStorage should happen: the
> connection's _flush_invalidations should get called and it should reset
> the _txn_time of the connection to None so that the modified oids
> (including the BTree's), when invalidated, reset the _txn_time to their
> serial. So that on the next conflict, _setstate_noncurrent calls
> loadBefore with that serial.
>
> But apparently the _flush_invalidations() of the connection is never
> called. So _txn_time is never bumped into the future (which in turn means
> the next write conflict will try to load exactly the same serials as
> before and fail again, etc.).
>
> This seems to happen because:
>
> 1. the connection has _synch set to True: it has registered itself as a
> synchronizer, and expects its afterCompletion to be called when (among
> others) the transaction is aborted, and the afterCompletion is calling
> _flush_invalidations,
>
> 2. the synchronizer (the connection itself) has been lost from the
> transaction's _synchronizers WeakSet for some reason (garbage collected I
> guess). It was there in earlier transactions, but it's not there at the
> time it's needed.
>
> If someone can make sense of this...
>
> Actually I don't know why the connection (=synchronizer) could be gone
> from the transaction's _synchronizers WeakSet but still be in the DB's
> connection pool WeakSet.
> I guess here lies the problem.

That's a great question. It doesn't seem possible that it's gc (unless
there's a relevant deep weakref gc bug remaining in Python, which I think
is unlikely).

A Transaction never removes anything from its ._synchronizers set.
However, Transaction.__init__() gets its ._synchronizers set from the
transaction manager that creates the transaction, and the
TransactionManager._synchs set is deliberately mutable: a Transaction
"sees" (in its ._synchronizers set) any changes made to the corresponding
TransactionManager._synchs set (these "two" sets are the same object, just
with different names).

While a transaction manager never removes a synchronizer from its ._synchs
set on its own initiative, anyone can call
ITransactionManager.unregisterSynch(s) to force removal of synchronizer
`s`. Then `s` will vanish both from TransactionManager._synchs and
Transaction._synchronizers (again, they're really the same set object).

In ZODB, the only caller of unregisterSynch() is Connection.close(). So
that's the plain obvious way for this to happen: someone called cn.close()
on the Connection `cn` in question. Are you sure that's not all there is
to it?

Closing the connection would remove `cn` from the transaction manager's
._synchs, and from the transaction's ._synchronizers. It would _not_
remove `cn` from the DB's connection pool's .all, though. There are only
two ways a Connection `cn` can ever get out of .all:

1. There are no strong references to `cn` remaining, so gc reclaims `cn`.

or

2. `cn` isn't currently in use, hasn't been in use for so long that it's
   bubbled to the front of the .available queue, and enough other
   Connections get closed that len(.available) exceeds pool_size. Then
   the "oldest" excess available connections are explicitly removed from
   both .available and .all.

> Also, I don't know why we don't observe this for FileStorage, maybe
> something has a hard reference on it somewhere?

It doesn't sound likely to me that hard references are relevant -- but
then I really don't know anything about tempstorage.

HTH, and good luck in either case ;-)
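
To make the shared-set point concrete, here is a minimal sketch using toy
stand-in classes. It is not ZODB or transaction-package code: ToyTransaction,
ToyTransactionManager, ToyConnection, the `flushed` counter and the `pool_all`
set are all hypothetical names invented for illustration. It only assumes
what is described above: the transaction holds the very same set object as
its manager, and close() is what unregisters the connection.

```python
import weakref


class ToyTransaction:
    """Hypothetical stand-in: borrows the manager's synchronizer set."""

    def __init__(self, synchronizers):
        # Same set object as the manager's _synchs, not a copy.
        self._synchronizers = synchronizers

    def abort(self):
        # Only synchronizers still registered right now get notified.
        for synch in list(self._synchronizers):
            synch.afterCompletion(self)


class ToyTransactionManager:
    """Hypothetical stand-in for a transaction manager."""

    def __init__(self):
        self._synchs = weakref.WeakSet()

    def registerSynch(self, synch):
        self._synchs.add(synch)

    def unregisterSynch(self, synch):
        self._synchs.discard(synch)

    def begin(self):
        return ToyTransaction(self._synchs)


class ToyConnection:
    """Hypothetical stand-in for a connection acting as a synchronizer."""

    def __init__(self, manager, pool_all):
        self._manager = manager
        self.flushed = 0  # counts stand-in "_flush_invalidations" calls
        manager.registerSynch(self)
        pool_all.add(self)  # the pool tracks it independently

    def afterCompletion(self, txn):
        self.flushed += 1

    def close(self):
        # Unregisters from the manager; does not touch the pool's set.
        self._manager.unregisterSynch(self)


manager = ToyTransactionManager()
pool_all = weakref.WeakSet()
cn = ToyConnection(manager, pool_all)

txn = manager.begin()
was_registered = cn in txn._synchronizers

cn.close()
still_registered = cn in txn._synchronizers
still_in_pool = cn in pool_all

txn.abort()  # cn is no longer a synchronizer, so it is not notified
```

In this toy model, closing the connection while a transaction is live makes
it vanish from that transaction's synchronizer set while it stays in the
pool's set, so the abort never reaches its afterCompletion. Whether that is
what is actually happening in the tempstorage case is exactly the open
question.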