[ZODB-Dev] what could cause slowness checking if databases are alive?
Hi All, This is all on Zope 2.9.8 for both the ZEO server and the app server clients. I have a customer with 2 app servers behind a hardware loadbalancer. The load balancer uses a hash of a url on each app server to determine if the app server is alive. Likewise, if the app server doesn't respond to that status url, or takes too long to respond, the load balancer drops the app server out of the pool. Each app server has 1 zope app process that's proxied through an apache instance. The status page is generated by the following external method: import socket from ZEO.ClientStorage import ClientStorage template = This is app server: %s Connected as follows: %s def status(self): connections = [] databases = self.Control_Panel.Database for name in sorted(databases.getDatabaseNames()): storage = databases[name]._getDB()._storage if not isinstance(storage,ClientStorage): continue connections.append('%s: %r'%(name,storage._addr)) return template % ( socket.gethostname(), '\n'.join(connections) ) ...so as to check that the ZEO server connections are all fine. However, the code that prints the storage name and address seems to cause slow responses every so often. Sadly, they're so slow the load balancer drops the app server out of the pool. This is bad when it happens to both app servers :-S We added logging to the apache instances to show how long it took to serve the status requests. It looks like the status url normally takes around 3-4ms to serve, however some requests are taking longer (1.7s and even up to 70s on occasion) Any idea what might be causing this slowness, or how I can find out what might be causing this slowness? cheers, Chris -- Simplistix - Content Management, Batch Processing Python Consulting - http://www.simplistix.co.uk ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] zodb design document
hi there! is there a zodb design document? what i'm interested in are the following things: - some basic description how the zodb works (f.ex. i don't understand what the difference between a serial and a transaction id is)? - looking at basestorage and the methods that have to be implemented, is there documentation what the reimplementation in a concrete subclass has to do exactly? (ordering of things, desired side effects etc.)? why do i ask? i'd like to create a RadosStorage zodb backend ceph's underlying object storage RADOS (see http://ceph.newdream.net/blog/category/rados/ ) thanks in advance for your replies! best regards, jürgen herrmann -- XLhost.de - eXperts in Linux hosting ® XLhost.de GmbH Jürgen Herrmann, Geschäftsführer Boelckestrasse 21, 93051 Regensburg, Germany Geschäftsführer: Volker Geith, Jürgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)800 XLHOSTDE [0800 95467833] Fax: +49 (0)800 95467830 WEB: http://www.XLhost.de IRC: #xlh...@irc.quakenet.org ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] zodb design document
On Wed, Aug 12, 2009 at 10:33 AM, Jürgen Herrmannjuergen.herrm...@xlhost.de wrote: hi there! is there a zodb design document? There's: http://www.zope.org/Documentation/Developer/Models/ZODB But it is pretty old and out of date. what i'm interested in are the following things: - some basic description how the zodb works (f.ex. i don't understand what the difference between a serial and a transaction id is)? There is no difference. I thought there could be when I originally designed ZODB, but it turns out to be good enough to use the tid as the serial. - looking at basestorage and the methods that have to be implemented, is there documentation what the reimplementation in a concrete subclass has to do exactly? (ordering of things, desired side effects etc.)? You should look at the storage interfaces in ZODB.interfaces and in ZEO.interfaces. You should have a firm understanding of how transactions work in general. Our transaction system is based on time-stamps. You should also ask questions here. why do i ask? i'd like to create a RadosStorage zodb backend ceph's underlying object storage RADOS (see http://ceph.newdream.net/blog/category/rados/ ) That's interesting. You might want to look at DirectoryStorage, http://dirstorage.sourceforge.net/, which implements the storage APIs (at least as they existed a few years ago) on top if a file system. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
I begin to use RelStorage in a production site with Plone 2.5. Everything was running without failures since 01 august 2009. But today after a failure in tpc_abort, all instances conneceted to MySQL can't acquire commit lock. Follow tpc_abort traceback: 2009-08-12T14:12:08 ERROR txn.1115806016 Error in tpc_abort() on manager MultiObjectResourceAdapter for ZODB.DB.TransactionalUndo object at 0x210f6790 at 654293328 Traceback (most recent call last): File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 533, in _cleanup rm.tpc_abort(self) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 628, in tpc_abort self.manager.tpc_abort(txn) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/ZODB/BaseStorage.py, line 194, in tpc_abort self._abort() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 710, in _abort self._rollback_load_connection() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 166, in _rollback_load_connection self._load_conn.rollback() OperationalError: (2006, 'MySQL server has gone away') And after, all instances report this exeption: 2009-08-12T14:21:53 ERROR Zope.SiteErrorLog http://adm.agecom.ba.gov.br/login_form Traceback (innermost last): Module ZPublisher.Publish, line 121, in publish Module Zope2.App.startup, line 240, in commit Module transaction._manager, line 96, in commit Module transaction._transaction, line 395, in commit Module transaction._transaction, line 498, in _commitResources Module ZODB.Connection, line 730, in tpc_vote Module relstorage.relstorage, line 675, in tpc_vote Module relstorage.relstorage, line 659, in _vote Module relstorage.relstorage, line 566, in _prepare_tid Module relstorage.adapters.mysql, line 506, in start_commit Module relstorage.adapters.mysql, line 672, in _hold_commit_lock StorageError: Unable to acquire commit lock I solve the problem restarting all instances, and the site became operational again, but I have some questions: This can be a bug or there is any problem in my enviroment/application? There is another solution to release commit lock without restart all instances? Cheers, -- Rudá Porto Filgueiras http://python-blog.blogspot.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
On Wed, August 12, 2009 22:23, Rudá Porto Filgueiras wrote: I begin to use RelStorage in a production site with Plone 2.5. Everything was running without failures since 01 august 2009. But today after a failure in tpc_abort, all instances conneceted to MySQL can't acquire commit lock. Follow tpc_abort traceback: 2009-08-12T14:12:08 ERROR txn.1115806016 Error in tpc_abort() on manager MultiObjectResourceAdapter for ZODB.DB.TransactionalUndo object at 0x210f6790 at 654293328 Traceback (most recent call last): File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 533, in _cleanup rm.tpc_abort(self) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 628, in tpc_abort self.manager.tpc_abort(txn) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/ZODB/BaseStorage.py, line 194, in tpc_abort self._abort() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 710, in _abort self._rollback_load_connection() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 166, in _rollback_load_connection self._load_conn.rollback() OperationalError: (2006, 'MySQL server has gone away') And after, all instances report this exeption: 2009-08-12T14:21:53 ERROR Zope.SiteErrorLog http://adm.agecom.ba.gov.br/login_form Traceback (innermost last): Module ZPublisher.Publish, line 121, in publish Module Zope2.App.startup, line 240, in commit Module transaction._manager, line 96, in commit Module transaction._transaction, line 395, in commit Module transaction._transaction, line 498, in _commitResources Module ZODB.Connection, line 730, in tpc_vote Module relstorage.relstorage, line 675, in tpc_vote Module relstorage.relstorage, line 659, in _vote Module relstorage.relstorage, line 566, in _prepare_tid Module relstorage.adapters.mysql, line 506, in start_commit Module relstorage.adapters.mysql, line 672, in _hold_commit_lock StorageError: Unable to acquire commit lock I solve the problem restarting all instances, and the site became operational again, but I have some questions: This can be a bug or there is any problem in my enviroment/application? There is another solution to release commit lock without restart all instances? Cheers, -- Rudá Porto Filgueiras http://python-blog.blogspot.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev i just came back from reading relstorage code (research work for radosstorage) and the lock is actually held on the mysql server. my guess is that the connection drop you experience earlier left the lock in place on the mysql-server. obviously the mysql-server did not notice you connection dying, otherwise it would have released the lock,see http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock probably restarting mysql would have solved your issue?! regards, jürgen -- XLhost.de - eXperts in Linux hosting ® XLhost.de GmbH Jürgen Herrmann, Geschäftsführer Boelckestrasse 21, 93051 Regensburg, Germany Geschäftsführer: Volker Geith, Jürgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)800 XLHOSTDE [0800 95467833] Fax: +49 (0)800 95467830 WEB: http://www.XLhost.de IRC: #xlh...@irc.quakenet.org ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
On Wed, Aug 12, 2009 at 5:35 PM, Jürgen Herrmannjuergen.herrm...@xlhost.de wrote: On Wed, August 12, 2009 22:23, Rudá Porto Filgueiras wrote: I begin to use RelStorage in a production site with Plone 2.5. Everything was running without failures since 01 august 2009. But today after a failure in tpc_abort, all instances conneceted to MySQL can't acquire commit lock. Follow tpc_abort traceback: 2009-08-12T14:12:08 ERROR txn.1115806016 Error in tpc_abort() on manager MultiObjectResourceAdapter for ZODB.DB.TransactionalUndo object at 0x210f6790 at 654293328 Traceback (most recent call last): File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 533, in _cleanup rm.tpc_abort(self) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/transaction/_transaction.py, line 628, in tpc_abort self.manager.tpc_abort(txn) File /usr/local/zope/agecom-virtual/eggs/ZODB3-3.7.3_polling-py2.4-linux-x86_64.egg/ZODB/BaseStorage.py, line 194, in tpc_abort self._abort() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 710, in _abort self._rollback_load_connection() File /usr/local/zope/agecom-virtual/eggs/RelStorage-1.2.0b2-py2.4.egg/relstorage/relstorage.py, line 166, in _rollback_load_connection self._load_conn.rollback() OperationalError: (2006, 'MySQL server has gone away') And after, all instances report this exeption: 2009-08-12T14:21:53 ERROR Zope.SiteErrorLog http://adm.agecom.ba.gov.br/login_form Traceback (innermost last): Module ZPublisher.Publish, line 121, in publish Module Zope2.App.startup, line 240, in commit Module transaction._manager, line 96, in commit Module transaction._transaction, line 395, in commit Module transaction._transaction, line 498, in _commitResources Module ZODB.Connection, line 730, in tpc_vote Module relstorage.relstorage, line 675, in tpc_vote Module relstorage.relstorage, line 659, in _vote Module relstorage.relstorage, line 566, in _prepare_tid Module relstorage.adapters.mysql, line 506, in start_commit Module relstorage.adapters.mysql, line 672, in _hold_commit_lock StorageError: Unable to acquire commit lock I solve the problem restarting all instances, and the site became operational again, but I have some questions: This can be a bug or there is any problem in my enviroment/application? There is another solution to release commit lock without restart all instances? Cheers, -- Rudá Porto Filgueiras http://python-blog.blogspot.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev i just came back from reading relstorage code (research work for radosstorage) and the lock is actually held on the mysql server. my guess is that the connection drop you experience earlier left the lock in place on the mysql-server. obviously the mysql-server did not notice you connection dying, otherwise it would have released the lock,see http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock But the MySQL is hosted in the same machine, it's not a network failure. :-( I supose this query can solve the problem too: SELECT RELEASE_LOCK(CONCAT(DATABASE(), '.commit')) probably restarting mysql would have solved your issue?! I think so, but I can't test it. :-( But the question remain, why the database connection was not safely closed when tcp_abort fail? I guess MySQL adapter.close raise a close_exception and it's not logged. But, it's not the first time this issue happen, I saw the same problem but it's not easy to reproduce. How to detect that a connection die in the midle of transaction and has left the commit lock *LOCKED*? If it can be detect and FLAGED, adapter.close or load_connection should take care to execute MySQL RELEASE_LOCK? regards, jürgen -- XLhost.de - eXperts in Linux hosting ® XLhost.de GmbH Jürgen Herrmann, Geschäftsführer Boelckestrasse 21, 93051 Regensburg, Germany Geschäftsführer: Volker Geith, Jürgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)800 XLHOSTDE [0800 95467833] Fax: +49 (0)800 95467830 WEB: http://www.XLhost.de IRC: #xlh...@irc.quakenet.org -- Rudá Porto Filgueiras http://python-blog.blogspot.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
Rudá Porto Filgueiras wrote: (...) Module relstorage.adapters.mysql, line 506, in start_commit Module relstorage.adapters.mysql, line 672, in _hold_commit_lock StorageError: Unable to acquire commit lock I solve the problem restarting all instances, and the site became operational again, but I have some questions: This can be a bug or there is any problem in my enviroment/application? There is another solution to release commit lock without restart all instances? Perhaps some instance was taking a very long time to finish a transaction commit and you didn't notice it. RelStorage does everything it can to minimize the amount of time the commit lock is held (it uses a strategy similar to ZEO), but applications are ultimately in control of how long it takes to commit a transaction. A concurrent pack might provide another explanation. Have you ever packed the database? This could also indicate a bug in MySQL. According to the documentation of get_lock(), all locks will be released when connections terminate, but maybe you ran into a MySQL bug that causes locks to stick around. To me, the most plausible explanation is a MySQL bug, since the other hypotheses don't explain why one of the connections terminated prematurely. If I were you, I would try the same application with PostgreSQL instead of MySQL. If the bug persists, then at least we know it's not a MySQL bug. :-) But the question remain, why the database connection was not safely closed when tcp_abort fail? The error message occurring in tpc_abort was OperationalError: (2006, 'MySQL server has gone away'), suggesting that the database connection was *already closed*. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
On Wed, Aug 12, 2009 at 8:20 PM, Shane Hathawaysh...@hathawaymix.org wrote: Rudá Porto Filgueiras wrote: (...) Module relstorage.adapters.mysql, line 506, in start_commit Module relstorage.adapters.mysql, line 672, in _hold_commit_lock StorageError: Unable to acquire commit lock I solve the problem restarting all instances, and the site became operational again, but I have some questions: This can be a bug or there is any problem in my enviroment/application? There is another solution to release commit lock without restart all instances? Perhaps some instance was taking a very long time to finish a transaction commit and you didn't notice it. RelStorage does everything it can to minimize the amount of time the commit lock is held (it uses a strategy similar to ZEO), but applications are ultimately in control of how long it takes to commit a transaction. A concurrent pack might provide another explanation. Have you ever packed the database? No, there is no pack operation running, only people using Plone to put content. This could also indicate a bug in MySQL. According to the documentation of get_lock(), all locks will be released when connections terminate, but maybe you ran into a MySQL bug that causes locks to stick around. Yeah, it sticks for ever. :-( Maybe some tunning on inactive connection time in MySQL should help? To me, the most plausible explanation is a MySQL bug, since the other hypotheses don't explain why one of the connections terminated prematurely. I suspect it's MySQL related, becouse it happen with Plone 2.5 and Plone 3.0, RelStorage 1.1.3 and 1.2b2. If I were you, I would try the same application with PostgreSQL instead of MySQL. If the bug persists, then at least we know it's not a MySQL bug. :-) I first try to use PostgreSQL, but the time to convert ZODB FileStorage was much larger than MySQL, then I decide to use MySQL. But the question remain, why the database connection was not safely closed when tcp_abort fail? The error message occurring in tpc_abort was OperationalError: (2006, 'MySQL server has gone away'), suggesting that the database connection was *already closed*. Yes, it was closed in the midle of transaction, but if you are sure there is nothing to do with RelStorage and there is no possible or workaround to the RelStoratge MySQL adapter to lead with MySQL bugs or resistent do to network failures and recover from this deadlock situation (or it will an ugly hack), I will move to PostgreSQL. I also began with MySQL 5.0 from CentOS 5.0 distribution and after a change to MySQL 5.4.1 and this failure occur on both. Cheers, Shane -- Rudá Porto Filgueiras http://python-blog.blogspot.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock
Rudá Porto Filgueiras wrote: Yes, it was closed in the midle of transaction, but if you are sure there is nothing to do with RelStorage and there is no possible or workaround to the RelStoratge MySQL adapter to lead with MySQL bugs or resistent do to network failures and recover from this deadlock situation (or it will an ugly hack), I will move to PostgreSQL. At this point I can't even think of an ugly hack that would solve this, since I can't understand the state MySQL is getting into. I assume that calling RELEASE_LOCK in MySQL has no effect. I also began with MySQL 5.0 from CentOS 5.0 distribution and after a change to MySQL 5.4.1 and this failure occur on both. Did you mean CentOS 5.2? If not, I would upgrade. :-) Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev