Re: [ZODB-Dev] Re: POSKeyError in zodb-3.6.0
Chris Bainbridge wrote at 2006-11-15 18:14 +:
> ...
>Another interesting thing; if I add time.sleep(1) to the end of the
>while loop, then the problem goes away. Possibly there is some kind of
>cache race condition, where the ZEO server sends invalidations
>immediately after the client has committed?

The effect of invalidations is synchronized. Invalidations become
effective only at transaction boundaries and when a connection is opened.

--
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
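To make the point about transaction boundaries concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not ZODB's implementation: `ToyConnection`, its methods, and the shared `store` dict are all hypothetical names invented for this illustration.

```python
class ToyConnection:
    """Toy model only: this is not ZODB's Connection class."""

    def __init__(self, store):
        self._store = store          # shared dict standing in for the server
        self._cache = dict(store)    # this connection's view of the data
        self._pending = set()        # invalidated keys that are not applied yet

    def invalidate(self, key):
        # Called when another client commits: the key is queued, not applied.
        self._pending.add(key)

    def read(self, key):
        # Reads always come from the connection's own view.
        return self._cache[key]

    def begin(self):
        # Transaction boundary: queued invalidations take effect here.
        for key in self._pending:
            self._cache[key] = self._store[key]
        self._pending.clear()


store = {"x": 1}
conn = ToyConnection(store)

store["x"] = 2                      # another client commits a new value
conn.invalidate("x")                # the invalidation arrives mid-transaction
mid_transaction = conn.read("x")    # the view has not changed yet

conn.begin()                        # boundary: the invalidation is applied
after_boundary = conn.read("x")
```

In this model the value read in the middle of the transaction stays stable, and the new value only becomes visible after `begin()`.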
Re: [ZODB-Dev] Re: POSKeyError in zodb-3.6.0
Ok, I have the asyncore loop in, I've added explicit transaction begin and aborts, and cleaned up the test case a bit:

    import thread
    import asyncore
    import random
    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB
    from persistent.list import PersistentList
    from ZODB.POSException import ConflictError
    import transaction

    storage = ClientStorage(('bw64node01', 12345))
    db = DB(storage)
    conn = db.open()
    root = conn.root()
    conn.sync()
    thread.start_new_thread(asyncore.loop,())

    if 'test' not in root:
        try:
            transaction.begin()
            root['test'] = PersistentList([0,1])
            transaction.commit()
        except ConflictError:
            transaction.abort()

    g = root['test']
    y = PersistentList()
    while 1:
        try:
            transaction.begin()
            g[g.index(random.choice(g))] = y
            #g[g.index(random.choice(g))] = PersistentList()
            transaction.commit()
        except ConflictError:
            transaction.abort()

Now, when 4 or so instances are run in parallel, this will fail with POSKeyError corruption of the ZODB database. However, if you uncomment the commented out line, it's fine. Maybe I'm missing something - why can't I create a PersistentList outside of the transaction, and then add multiple entries inside root['test'] pointing to it?

Another interesting thing; if I add time.sleep(1) to the end of the while loop, then the problem goes away. Possibly there is some kind of cache race condition, where the ZEO server sends invalidations immediately after the client has committed?
Re: [ZODB-Dev] Re: POSKeyError in zodb-3.6.0
Chris Bainbridge wrote:
> Hi Alan,
>
>> - You cant just catch ConflictError and pass
>
> I do conn.sync() at the top of the loop which is supposed to abort the
> connection and re-sync the objects with the zeo server.

Urm, sounds like you're looking for transaction.abort().

Also, be aware of the weirdness that can occur if you run ZEO clients without an asyncore loop. These can lead you to need to call .sync()...

>> - I think you can catch a ReadConflictError and *retry* that is ok.

Eep, in this day and age you shouldn't be seeing any of these ;-)

>> - But a ConflictError needs to be *retried* manually in your client code.

Yup, abort the transaction and try again...

> afaik, this may be better coding style, but isn't actually required,
> since doesn't each commit implicitly begin a new transaction?

Urm, the abort and possibly the .sync are absolutely necessary to get all the objects back into a sane, consistent state...

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
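The "abort the transaction and try again" advice can be sketched as a small retry helper. This is only an illustration under stated assumptions: `ConflictError` below is a local stand-in so the sketch runs without ZODB installed, and `run_with_retry`, `max_attempts`, and `flaky_commit` are hypothetical names, not part of any library.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError so the sketch runs alone."""


def run_with_retry(work, begin, commit, abort, max_attempts=5):
    """Run work() inside a transaction, aborting and retrying on conflicts.

    Returns the number of the attempt that committed successfully.
    """
    for attempt in range(1, max_attempts + 1):
        begin()                      # explicit start of each transaction
        try:
            work()
            commit()
            return attempt
        except ConflictError:
            abort()                  # discard changes before the next try
    raise ConflictError("gave up after %d attempts" % max_attempts)


# Demonstration with fake transaction hooks: the first two commits conflict.
calls = []
state = {"failures_left": 2}

def flaky_commit():
    if state["failures_left"] > 0:
        state["failures_left"] -= 1
        raise ConflictError("simulated conflict")
    calls.append("commit")

attempts = run_with_retry(
    work=lambda: calls.append("work"),
    begin=lambda: calls.append("begin"),
    commit=flaky_commit,
    abort=lambda: calls.append("abort"),
)
```

With a real database one would presumably pass `transaction.begin`, `transaction.commit`, and `transaction.abort` as the hooks and catch the real `ConflictError`. Bounding the number of attempts avoids spinning forever when the conflict is not transient.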
[ZODB-Dev] Re: POSKeyError in zodb-3.6.0
Hi Alan,

Thanks for the advice. I'm using multiple processes, one on each host in a cluster. The extra thread is only used to run the asyncore loop, which allows ZODB to receive asynchronous notifications.

I've been playing around with your suggestions, and found that if I don't run the extra asyncore thread, and replace conn.sync() with explicit calls to begin and end each transaction, then the test case will run without errors. However, if any process receives a SIGTERM signal, then the bug will occur and the database becomes corrupt.

Unfortunately this doesn't solve the problem, since in my real app removing the asyncore loop just makes the bug take longer to show up. I've found a workaround though: if instead of modifying the main list I do list[i].__setstate__(y.__getstate__()), so that the code modifies the objects rather than the PersistentList, then the bug doesn't occur.

> - You can't just catch ConflictError and pass

I do conn.sync() at the top of the loop which is supposed to abort the connection and re-sync the objects with the zeo server.

> - I think you can catch a ReadConflictError and *retry* that is ok.
> - But a ConflictError needs to be *retried* manually in your client code.
> If you catch a ConflictError you need to abort the transaction.
> You should be explicit about *beginning* transactions after ending
> previous transaction.

afaik, this may be better coding style, but isn't actually required, since doesn't each commit implicitly begin a new transaction?
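The difference between the failing variant (storing a reference to y in the list) and the workaround (copying y's state into the existing element) can be shown with plain Python objects. This is a sketch only: `Node` is a hypothetical stand-in for a persistent object, and nothing here touches ZODB or explains why the bug occurs.

```python
class Node:
    """Plain-Python stand-in for a persistent object (not a ZODB class)."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def __getstate__(self):
        # Return a copy of the state so the two objects do not share it.
        return {"items": list(self.items)}

    def __setstate__(self, state):
        self.items = list(state["items"])


container = [Node([0]), Node([1])]
original_first = container[0]
y = Node([42])

# Variant from the test case: the slot now refers to y itself, so the
# container changes and y can end up referenced from several slots.
replaced = list(container)
replaced[0] = y

# Workaround from the message: the container is left alone, and the
# existing element takes over a copy of y's state.
container[0].__setstate__(y.__getstate__())
```

After the workaround, `container[0]` is still the original object, only its contents have changed; in the first variant the slot holds `y` itself.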
[ZODB-Dev] Re: POSKeyError in zodb-3.6.0
Hi,

This issue results in a corrupted database. Can anyone confirm that they can reproduce this with the test case I provided, so that I can eliminate any potential problems with my setup as being the cause?

Thanks,
Chris
[ZODB-Dev] Re: POSKeyError in zodb-3.6.0
The bug I'm getting on the client side is two or more clients simultaneously reporting:

    Traceback (most recent call last):
      File "/home/chrb/test_bad.py", line 45, in ?
        i = g.index(random.choice(g))
      File "/usr/lib/python2.4/UserList.py", line 78, in index
        def index(self, item, *args): return self.data.index(item, *args)
      File "/usr/lib/python2.4/UserList.py", line 17, in __eq__
        def __eq__(self, other): return self.data == self.__cast(other)
      File "/usr/lib/python2.4/site-packages/ZODB/Connection.py", line 732, in setstate
        self._setstate(obj)
      File "/usr/lib/python2.4/site-packages/ZODB/Connection.py", line 768, in _setstate
        p, serial = self._storage.load(obj._p_oid, self._version)
      File "/usr/lib/python2.4/site-packages/ZEO/ClientStorage.py", line 746, in load
        return self.loadEx(oid, version)[:2]
      File "/usr/lib/python2.4/site-packages/ZEO/ClientStorage.py", line 769, in loadEx
        data, tid, ver = self._server.loadEx(oid, version)
      File "/usr/lib/python2.4/site-packages/ZEO/ServerStub.py", line 192, in loadEx
        return self.rpc.call("loadEx", oid, version)
      File "/usr/lib/python2.4/site-packages/ZEO/zrpc/connection.py", line 536, in call
        raise inst # error raised by server
    ZODB.POSException.POSKeyError: 0x01f5

This seems to be triggered by the call to PersistentList.index. If I change the random select line to:

    i = random.randint(0, len(g)-1)

then I no longer see this error. Presumably this just means that the access pattern for index() is sufficient to trigger this bug, rather than index itself being the problem.
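The different access patterns of the two variants can be illustrated without ZODB. In the traceback, index() ends up in `__eq__`, which forces the compared objects to be loaded. The sketch below uses a hypothetical `Tracked` class that records when it takes part in a comparison; it models only the access pattern, not the loading itself.

```python
import random


class Tracked:
    """Hypothetical element that records when it takes part in a comparison."""

    def __init__(self, name):
        self.name = name
        self.compared = False

    def __eq__(self, other):
        # Both sides of a comparison are inspected, so mark both.
        self.compared = True
        if isinstance(other, Tracked):
            other.compared = True
            return self.name == other.name
        return NotImplemented


def touched(items):
    """Names of the elements that took part in at least one comparison."""
    return [item.name for item in items if item.compared]


# Variant 1: look an element up with index(), as in the failing test case.
items = [Tracked(n) for n in "abcd"]
target = items[3]
i = items.index(target)           # compares earlier elements with the target
touched_by_index = touched(items)

# Variant 2: pick a position directly, as in the modified line.
fresh = [Tracked(n) for n in "abcd"]
j = random.randint(0, len(fresh) - 1)
touched_by_randint = touched(fresh)
```

Searching for the last element compares every earlier element against it, while picking an index with randint() compares nothing at all, which fits the observation that only the access pattern differs.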
[ZODB-Dev] Re: POSKeyError
Dieter Maurer wrote:
> Tim Peters wrote at 2005-3-30 08:39 -0500:
>
>>...
>>[Dieter Maurer]
>>
>>>The last packing bug was some time in the past. With current Zope
>>>versions, there is no known packing bug.
>>
>>It's not that packing introduces new problems, it's that packing isn't an
>>error recovery procedure: if the POSKeyErrors persist after the pack, then
>>packing will have destroyed some amount of original evidence forever,
>>potentially making it that much harder for someone to figure out how to
>>repair the POSKeyErrors.
>
>
> Thus, the person who fears his storage gets too huge makes
> a backup copy (for analysis) and then packs the production storage.
>
> I do not advocate packing as an error recovery procedure.
> It is just that isolated POSKeyErrors need not prevent
> packing.
>

OK, so I lost a little fear of packing, did a backup, and packed my database. Lo and behold, the POSKeyErrors are gone.

I do still have a dozen "refers to invalid object:" coming out of fsrefs.py. Are these errors waiting to happen? Or nothing to be concerned about?

I was tempted to play around with the killthem script (http://mindlace.net/src/zodb/killthem.py), but that seems to rely on zopectl which is a Zope 2.7 thing, right? I am still stuck on 2.6.2 until I can get rid of these damn errors.

I do feel like my database is somewhat healthier. Hope it isn't a false sense of security.
RE: [ZODB-Dev] Re: POSKeyError
Tim Peters wrote at 2005-3-30 08:39 -0500:
>...
>[Dieter Maurer]
>> The last packing bug was some time in the past. With current Zope
>> versions, there is no known packing bug.
>
>It's not that packing introduces new problems, it's that packing isn't an
>error recovery procedure: if the POSKeyErrors persist after the pack, then
>packing will have destroyed some amount of original evidence forever,
>potentially making it that much harder for someone to figure out how to
>repair the POSKeyErrors.

Thus, the person who fears his storage gets too huge makes a backup copy (for analysis) and then packs the production storage.

I do not advocate packing as an error recovery procedure. It is just that isolated POSKeyErrors need not prevent packing.

--
Dieter
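The "backup copy first, then pack" order can be sketched as a small helper that copies the storage file aside before anything destructive happens. This is a minimal sketch under assumptions: the function name `backup_before_pack`, the `backups` directory, and the `.bak` naming are hypothetical, and it assumes no process is writing to the storage file while it is copied. The pack itself is deliberately left out.

```python
import os
import shutil
import tempfile
import time


def backup_before_pack(storage_path, backup_dir):
    """Copy the storage file aside and return the path of the copy.

    Assumes nothing is writing to storage_path during the copy.
    """
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = "%s.%s.bak" % (os.path.basename(storage_path), stamp)
    target = os.path.join(backup_dir, name)
    shutil.copy2(storage_path, target)   # keeps timestamps for later analysis
    return target


# Demonstration on a throwaway file; this is not a real FileStorage.
workdir = tempfile.mkdtemp()
fake_storage = os.path.join(workdir, "Data.fs")
with open(fake_storage, "wb") as f:
    f.write(b"not a real FileStorage")

backup_path = backup_before_pack(fake_storage, os.path.join(workdir, "backups"))
# Only after the copy exists would one go on to pack the production storage.
```

Keeping the unpacked copy preserves the evidence Tim mentions, so the POSKeyErrors can still be analysed even if packing discards the records involved.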