RESOLUTION: Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Thanks to Jim, Theune, Dieter and all others who weighed in on this thread.

The problem: ZEO clients would lock up randomly, requiring a restart.

The hints: Lots of 'Connection timed out' and 'No route to host' errors in the ZEO server log files.

The solution: The machine the ZEO server was running on had iptables enabled. Even though it was allowing the ZEO server port 8100, it was filtering on all interfaces. I simply changed the rules so that port 8100 is not filtered at all on the internal network interface. ZEO listens specifically on the internal IP, port 8100. The system has been stable for several days, without the ZEO server logs containing any connection errors.

I had never had this problem before because, well, we use firewalls on our customers' external interfaces, and our internal interfaces never have filtering. This particular sysadmin was running iptables on all public/private interfaces, locking the machine down as much as possible.

Thanks again guys!

--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856

___
For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zodb-dev
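A quick way to see how a ZEO port looks from a client machine is a one-shot TCP probe. The sketch below is not from the thread. The function names and the hint texts are made up for illustration, and the link between each errno and a firewall behaviour is a common pattern rather than a diagnosis: silently dropped packets tend to show up as a timeout, while a reject rule or a routing problem tends to show up as 'No route to host'.

```python
import errno
import socket

# Hint texts are this sketch's own wording. They are keyed by the errno names
# behind the two log messages in this thread (110 and 113 are the Linux values).
HINTS = {
    errno.ETIMEDOUT: "timed out: packets may be silently dropped on the way",
    errno.EHOSTUNREACH: "no route to host: routing problem or traffic rejected",
    errno.ECONNREFUSED: "connection refused: host reachable, port not listening",
}


def classify(err_number):
    """Map an errno value to a human-readable hint."""
    return HINTS.get(err_number, "unclassified error %r" % (err_number,))


def probe(host, port, timeout=5.0):
    """Try a single TCP connection and return a short status string."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # The connect attempt itself ran into our own timeout.
        return classify(errno.ETIMEDOUT)
    except OSError as exc:
        return classify(exc.errno)
    conn.close()
    return "ok"
```

Run from one of the client boxes, something like `probe("gamma-gw.audioholics.com", 8100)` (the server address used in the client configuration in this thread) would return "ok" when the port is reachable. A single successful probe does not rule out a firewall that drops established connections later on.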
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote at 2007-4-11 11:31 -0500:
> ... ZEO lockups ...

PeterZ <[EMAIL PROTECTED]> reported very similar problems today in "[EMAIL PROTECTED]". He, too, gets:

>   File "/opt/zope/Python-2.4.3/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

Maybe you have something in common (the same software version or hardware part) that causes these problems?

Apart from that, I have seen two reasons for ZEO lockups:

  * a firewall between the ZEO clients and the ZEO server that dropped connections without informing the connection endpoints

  * ZEO clients that access the same storage (in the same ZEO) via two different connections (this leads to a commit deadlock).

--
Dieter
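On the first of Dieter's two causes: when a firewall drops a connection without telling either end, an idle endpoint has no way to notice on its own. TCP keepalive is one general mechanism for surfacing that. The sketch below is an illustration only. It is not something proposed in the thread and not a ZEO setting, and the helper name and timing values are assumptions.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (Linux has all three);
    # any that this platform's socket module lacks are skipped.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


# Demonstrate on an unconnected socket; no network traffic is generated.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With keepalive enabled, a peer that has become unreachable eventually causes a pending read to fail with an error instead of waiting indefinitely.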
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 12, 2007, at 1:17 PM, Alan Runyan wrote:

> I find the 'No route to host' disturbing although these have not
> happened over the past 24 hours. This has:
>
> 2007-04-12T00:17:45 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:54881) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')
>
> -- which is frustrating. as i understand zeoserver is taking too long
> to communicate to zeoclient and zeoclient times out.

I don't know what the source of the "Connection timed out" error is. The error at this point in the code is very puzzling. There has been a select call, and select returned indicating that the socket was ready to be read. There should be no delay at all. That's the whole point of using an asynchronous network library.

> shouldn't it retry / reconnect?

The server is closing the connection with the client, which should cause the client to reconnect. Do you see any log messages on the clients at times corresponding to these server log messages?

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]   Python Powered!
CTO                 (540) 361-1714             http://www.python.org
Zope Corporation    http://www.zope.com        http://www.zope.org
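One reading of the traceback, offered as an assumption rather than anything stated in the thread: select marks a socket readable whenever a recv call would return without blocking. That covers arrived data, end of stream and, on Linux as far as I know, an error the kernel has already recorded for the connection. In the last case recv raises that stored error immediately, so the message says "timed out" even though the recv call itself never waited. The pending-error case is hard to reproduce locally, so the sketch below demonstrates the same readiness rule with the end-of-stream case.

```python
import select
import socket

# A connected pair of sockets standing in for a ZEO server and client.
server_side, client_side = socket.socketpair()
client_side.close()  # the peer goes away without ever sending data

# select still reports the remaining socket as readable ...
readable, _, _ = select.select([server_side], [], [], 1.0)
is_ready = server_side in readable

# ... because recv can complete at once: it returns the end-of-stream marker.
data = server_side.recv(8192)
server_side.close()
```

Here `is_ready` is true although no data ever arrived, and `recv` returns an empty bytes object. "Ready to be read" therefore means "recv will not block", not "data is waiting".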
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
>> Storage: 1
>> Server started: Wed Apr 11 10:56:50 2007
>> Clients: 10
>> Clients verifying: 0
>> Active transactions: -1
>
> Huh? You're owing the system a transaction. However, by looking at the
> code briefly, this might happen if tpc_abort() and _abort() kind of
> overlap. And you did have two aborts at that point in time.

Sounds like a bug/race that needs to be looked into in ZEO.

> Ah. The different clients might be because you have two storages and
> your ZEO clients are configured so that they do not all connect to
> exactly the same storages? Or they are but they weren't able to. (See
> hardware/network problems.)

They are both defined in zope.conf. All 12 clients were restarted last night. Just now I'm seeing:

Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 12
Clients verifying: 0
Active transactions: -1
Commits: 92
Aborts: 2
Loads: 498120
Stores: 2279
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 11
Clients verifying: 0
Active transactions: 0
Commits: 51
Aborts: 0
Loads: 225080
Stores: 6408
Conflicts: 0
Conflicts resolved: 167

> Something that came to my mind that might block the ZEO server for a
> long time are hard disk failures. Check your dmesg log. However, the
> network errors you see in various places really need to be tracked down.

Nothing in dmesg.

I find the 'No route to host' disturbing although these have not happened over the past 24 hours. This has:

2007-04-12T00:17:45 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:54881) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')

-- which is frustrating. As I understand it, the ZEO server is taking too long to communicate with the ZEO client, and the ZEO client times out. Shouldn't it retry / reconnect?

alan
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Morning,

On Wednesday, 11.04.2007, at 11:31 -0500, Alan Runyan wrote:
> ETHERNET CARD:
> 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708
> Gigabit Ethernet (rev 11)
>
> - I have turned down transaction timeout to 15 seconds.
>
> - I currently have 11 ZEO Clients up; but showing below (seems
> strange) sometimes the clients are 6 or 7..
>
> - Have not confirmed
> ZEO MONITOR OUTPUT:
>
> Trying 127.0.0.1...
> Connected to localhost.localdomain (127.0.0.1).
> Escape character is '^]'.
> ZEO monitor server version 3.6.2
> Wed Apr 11 12:30:31 2007
>
> Storage: 1
> Server started: Wed Apr 11 10:56:50 2007
> Clients: 10
> Clients verifying: 0
> Active transactions: -1

Huh? You're owing the system a transaction. However, by looking at the code briefly, this might happen if tpc_abort() and _abort() kind of overlap. And you did have two aborts at that point in time.

> Commits: 24
> Aborts: 2
> Loads: 95941
> Stores: 523
> Conflicts: 0
> Conflicts resolved: 20
>
> Storage: 2
> Server started: Wed Apr 11 10:56:50 2007
> Clients: 8
> Clients verifying: 0
> Active transactions: 0
> Commits: 9
> Aborts: 0
> Loads: 40836
> Stores: 714
> Conflicts: 0
> Conflicts resolved: 167

Ah. The different clients might be because you have two storages and your ZEO clients are configured so that they do not all connect to exactly the same storages? Or they are but they weren't able to. (See hardware/network problems.)

Something that came to my mind that might block the ZEO server for a long time is a hard disk failure. Check your dmesg log. However, the network errors you see in various places really need to be tracked down.

Christian

--
gocept gmbh & co. kg - forsterstraße 29 - 06112 halle/saale - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 - fax +49 345 122 9889 1 - zope and plone consulting and development
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
ETHERNET CARD:
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)

- I have turned down transaction timeout to 15 seconds.

- I currently have 11 ZEO clients up, but the monitor output below (seems strange) sometimes shows only 6 or 7 clients.

- Have not confirmed

ZEO MONITOR OUTPUT:

Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
ZEO monitor server version 3.6.2
Wed Apr 11 12:30:31 2007

Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 10
Clients verifying: 0
Active transactions: -1
Commits: 24
Aborts: 2
Loads: 95941
Stores: 523
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 8
Clients verifying: 0
Active transactions: 0
Commits: 9
Aborts: 0
Loads: 40836
Stores: 714
Conflicts: 0
Conflicts resolved: 167

--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856
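Monitor dumps like the one above are easier to compare over time once they are parsed. The following is a minimal sketch, assuming the monitor prints one "Name: value" field per line. The function name `parse_monitor` is made up for this example, and the sample text is the output quoted in this message.

```python
MONITOR_OUTPUT = """\
ZEO monitor server version 3.6.2
Wed Apr 11 12:30:31 2007

Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 10
Clients verifying: 0
Active transactions: -1
Commits: 24
Aborts: 2
Loads: 95941
Stores: 523
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 8
Clients verifying: 0
Active transactions: 0
Commits: 9
Aborts: 0
Loads: 40836
Stores: 714
Conflicts: 0
Conflicts resolved: 167
"""


def parse_monitor(text):
    """Return {storage_name: {field: value}} for each 'Storage:' block."""
    storages = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # banner or blank line
        key, value = key.strip(), value.strip()
        if key == "Storage":
            current = storages.setdefault(value, {})
        elif current is not None:
            # Counters are integers; anything else (dates) stays a string.
            try:
                current[key] = int(value)
            except ValueError:
                current[key] = value
    return storages


stats = parse_monitor(MONITOR_OUTPUT)
# Storages whose transaction counter has gone negative.
suspicious = [name for name, s in stats.items() if s["Active transactions"] < 0]
# Connected clients per storage, to spot storages that are missing clients.
client_counts = {name: s["Clients"] for name, s in stats.items()}
```

For the sample above, `suspicious` holds only storage "1" and `client_counts` shows 10 clients on storage 1 against 8 on storage 2, the two oddities discussed in this thread.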
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Are you, by chance, using Intel E1000 NICs? See "Fighting the hardware" in Jodok's write-up:
http://www.lovelysystems.com/batlogg/2007/03/30/the-decathlon-of-computer-science/

Stefan

On 10. Apr 2007, at 20:19, Alan Runyan wrote:

> I am seeing something *very* strange in zeo.log:
>
> 2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')
>
> but several more of these:
>
> 2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

--
Anything that, in happening, causes itself to happen again, happens again. --Douglas Adams
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On 10 Apr 2007, at 20:19, Alan Runyan wrote:

>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

With errors like this I would concentrate on the network first (hardware as well as network settings, routes, etc.) and make 100% sure you don't have anything strange happening on that end.

jens
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On 10 Apr 2007, at 21:39, Chris Withers wrote:

> I'd look at the switches and maybe even the nics and cables :-S

And if you are doing that, also check your network to make sure all IP addresses are unique.

> (..but I'm not a sysadmin.)

And neither am I!

--r

Russ Ferriday - Topia Systems - Open Source content management with Plone and Zope
[EMAIL PROTECTED] - office: +44 2076 1777588 - mobile: +44 7789 338868 - skype: ferriday
a member of Zea Partners
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

>>>     data = self.socket.recv(buffer_size)
>>> error: (113, 'No route to host')
>>
>> That *is* very odd, anything other than pound being used for load
>> balancing or traffic shaping?
>
> This has to be a major problem maker in the system. Pound is simply
> round robin connections to pool of Zope.

I'd look at the switches and maybe even the nics and cables :-S

(..but I'm not a sysadmin.)

>>>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>>     data = self.socket.recv(buffer_size)
>>> error: (110, 'Connection timed out')
>>
>> Storage server too busy :-(
>
> the storage server can be "too busy" when its only reading?

'Course - too many pickles getting sucked down...

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
> I've not had anything but bad experiences with pound myself, lvs seems
> a much more preferable alternative...

We have not had such negative experiences with pound.

>>     data = self.socket.recv(buffer_size)
>> error: (113, 'No route to host')
>
> That *is* very odd, anything other than pound being used for load
> balancing or traffic shaping?

This has to be a major problem maker in the system. Pound simply distributes connections round-robin to a pool of Zope instances.

>>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>     data = self.socket.recv(buffer_size)
>> error: (110, 'Connection timed out')
>
> Storage server too busy :-(

The storage server can be "too busy" when it's only reading?

cheers
alan
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

> We have 10 ZEO clients that are for public consumption "READ ONLY".
> We have a separate ZEO client that is writing that is on a separate box.

I'd put money on the client doing the writing causing problems. That or client side cache thrash caused by zcatalog or similar ;-)

> The ZEO Server was consistently at 2% CPU.

What was the network and disk i/o like?

> getting through the cache and back to pound which was then load

I've not had anything but bad experiences with pound myself, lvs seems a much more preferable alternative...

> The customer was posting content throughout the slashdot. The problem
> was that when the clients would update they would end up through 500s

not sure what that last bit means...

>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

That *is* very odd, anything other than pound being used for load balancing or traffic shaping?

>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

Storage server too busy :-(

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 10, 2007, at 2:19 PM, Alan Runyan wrote:
...
> For Jim: We did not adjust the transaction timeout. Would that have
> helped in the case of READ's?

Possibly, I'm not sure and I don't have time now to dig. It might be worth trying, however:

> The customer was posting content throughout the slashdot. The problem
> was that when the clients would update they would end up through 500s
>
> I am seeing something *very* strange in zeo.log:
>
> 2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')
>
> but several more of these:
>
> 2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

Obviously, both of these are very suspicious.

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]   Python Powered!
CTO                 (540) 361-1714             http://www.python.org
Zope Corporation    http://www.zope.com        http://www.zope.org
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

> The website got slashdotted this morning
[...]
> Just FYI: Varnish didn't go over 3% CPU during the traffic surge;
> over 200 req/second.

Off topic: 200 requests a second seems a bit light for a slashdotting, any more details you can divulge there?

--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
> How does data get into the ZEO storage then?

We have 10 ZEO clients that are for public consumption "READ ONLY". We have a separate ZEO client that is writing that is on a separate box.

The website got slashdotted this morning and we had 4 ZEO clients go out. They were waiting on the ZEO server for many minutes (basically hung and reporting 500s back to browsers).

4/10/07 - delta ah2 lock up - http://paste.plone.org/13919
4/10/07 - epsilon ah2 lock up - http://paste.plone.org/13920
4/10/07 - delta ah1 lock up - http://paste.plone.org/13936
4/10/07 - epsilon ah3 lock up - http://paste.plone.org/13935

Just FYI: Varnish didn't go over 3% CPU during the traffic surge; over 200 req/second. The ZEO Server was consistently at 2% CPU. Lots of traffic was getting through the cache and back to pound which was then load balancing to the ZEO clients.

For Jim: We did not adjust the transaction timeout. Would that have helped in the case of READ's?

The customer was posting content throughout the slashdot. The problem was that when the clients would update they would end up through 500s

I am seeing something *very* strange in zeo.log:

2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (113, 'No route to host')

but several more of these:

2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

>> Do you have anything that is committing very large transactions?
>
> No. In fact; these clients could be running in read only mode. As far
> as I'm concerned.

How does data get into the ZEO storage then?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
> > Try setting the transaction timeout. Say, maybe, 10 seconds.
>
> Do you have anything that is committing very large transactions?

No. In fact, as far as I'm concerned, these clients could be running in read-only mode.
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 9, 2007, at 5:05 PM, Jim Fulton wrote:

> On Apr 9, 2007, at 4:50 PM, Alan Runyan wrote:
>
>> Hi Jim, yes, still having lockups. We have a list of
>> deadlockdebugger info associated with each lockup. The client
>> configuration as at the beginning of this thread. The zeo server
>> configuration is as follows:
>>
>> # ZEO configuration file
>> %define INSTANCE /usr/local/zeo-ah
>> address 8100
>> read-only false
>> invalidation-queue-size 100
>> # pid-filename $INSTANCE/var/ZEO.pid
>> # monitor-address PORT
>> # transaction-timeout SECONDS
>
> Try setting the transaction timeout. Say, maybe, 10 seconds.

Do you have anything that is committing very large transactions?

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]   Python Powered!
CTO                 (540) 361-1714             http://www.python.org
Zope Corporation    http://www.zope.com        http://www.zope.org
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 9, 2007, at 4:50 PM, Alan Runyan wrote:

> Hi Jim, yes, still having lockups. We have a list of
> deadlockdebugger info associated with each lockup. The client
> configuration as at the beginning of this thread. The zeo server
> configuration is as follows:
>
> # ZEO configuration file
> %define INSTANCE /usr/local/zeo-ah
> address 8100
> read-only false
> invalidation-queue-size 100
> # pid-filename $INSTANCE/var/ZEO.pid
> # monitor-address PORT
> # transaction-timeout SECONDS

Try setting the transaction timeout. Say, maybe, 10 seconds.

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]   Python Powered!
CTO                 (540) 361-1714             http://www.python.org
Zope Corporation    http://www.zope.com        http://www.zope.org
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Hi Jim, yes, still having lockups. We have a list of deadlockdebugger info associated with each lockup. The client configuration is as at the beginning of this thread. The ZEO server configuration is as follows:

# ZEO configuration file
%define INSTANCE /usr/local/zeo-ah

address 8100
read-only false
invalidation-queue-size 100
# pid-filename $INSTANCE/var/ZEO.pid
# monitor-address PORT
# transaction-timeout SECONDS

path $INSTANCE/var/Data.fs

path $INSTANCE/var/Catalog.fs

level info
path $INSTANCE/log/zeo.log

program $INSTANCE/bin/runzeo
socket-name $INSTANCE/etc/zeo.zdsock
daemon true
forever false
backoff-limit 10
exit-codes 0, 2
directory $INSTANCE
default-to-interactive true
# user zope
python /usr/local/python/bin/python
zdrun /usr/local/zope-2.9.6/lib/python/zdaemon/zdrun.py
# This logfile should match the one in the zeo.conf file.
# It is used by zdctl's logtail command, zdrun/zdctl doesn't write it.
logfile $INSTANCE/log/zeo.log

Any insight would be appreciated.

cheers
alan
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 3, 2007, at 4:13 PM, Alan Runyan wrote:

> Hi guys.
>
> Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4
>
> Having lots of lockups.

Still?

...

> typical client zeo configuration:

May we see the ZEO config?

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]   Python Powered!
CTO                 (540) 361-1714             http://www.python.org
Zope Corporation    http://www.zope.com        http://www.zope.org
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Hi,

On Tuesday, 03.04.2007, at 15:13 -0500, Alan Runyan wrote:
> Hi guys.
>
> Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4
>
> Having lots of lockups. Have approximately 12 zeo clients on 2
> machines connecting to a single zeo server. All on local network.
> Disks and network is monitored by competent hosting company. All
> looks healthy except zeo communication.
>
> Linux 2.6.9-42.0.8.ELsmp
>
> some feedback from manage_debug_threads:
>
> http://paste.plone.org/13821
> http://paste.plone.org/13822

Those two are currently loading an object in thread 2. Thread 1 tries to load an object as well. However, ClientStorage only allows one load at a time on each client, so the second thread is sitting there waiting for the load of the first thread to finish.

> http://paste.plone.org/13823

This one looks healthy. One thread is loading from the ZEO server, the other doesn't have anything to do.

Can you figure out whether those loads ever finish? Do you see any network traffic on the ZEO port from the server (i.e. is data transferred)?

Christian

--
gocept gmbh & co. kg - forsterstraße 29 - 06112 halle/saale - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 - fax +49 345 122 9889 1 - zope and plone consulting and development
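The one-load-at-a-time behaviour Christian describes explains why a single stalled load can freeze every thread of a client. The toy model below is not ZEO's code. `ToyClientStorage` and its single lock are stand-ins invented for illustration, with an Event playing the role of the server's reply.

```python
import threading
import time


class ToyClientStorage:
    """Toy model: loads are serialized behind one lock."""

    def __init__(self):
        self._load_lock = threading.Lock()
        self.completed = []

    def load(self, oid, server_reply):
        with self._load_lock:
            server_reply.wait()  # stands in for waiting on the ZEO server
            self.completed.append(oid)


storage = ToyClientStorage()
slow_reply = threading.Event()  # a reply that has not arrived yet
fast_reply = threading.Event()
fast_reply.set()                # this reply is already available

first = threading.Thread(target=storage.load, args=("oid-1", slow_reply))
second = threading.Thread(target=storage.load, args=("oid-2", fast_reply))

first.start()
while not storage._load_lock.locked():  # wait until the first load holds the lock
    time.sleep(0.001)
second.start()
second.join(timeout=0.2)
second_was_blocked = second.is_alive()  # still waiting behind the first load

slow_reply.set()  # the stalled reply finally arrives
first.join()
second.join()
```

The second load could have finished at once, yet it stays blocked until the first one gets its reply. If that reply never comes, for example because a firewall dropped the connection silently, every later load in the same client waits forever, which looks exactly like a lockup.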
[ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Hi guys.

Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4

Having lots of lockups. Have approximately 12 ZEO clients on 2 machines connecting to a single ZEO server. All on local network. Disks and network are monitored by a competent hosting company. All looks healthy except ZEO communication.

Linux 2.6.9-42.0.8.ELsmp

some feedback from manage_debug_threads:

http://paste.plone.org/13821
http://paste.plone.org/13822
http://paste.plone.org/13823

typical client zeo configuration:

/var/zope/ah0/zope.conf (included zope.conf)

%define HTTP_SERVER 7580
%define ICP_SERVER 7680
%define INSTANCE /var/zope/ah0
%define MAIN_NAME epsilon0
%define CATALOG_NAME epsilonA
%include /var/zope/etc/zope.conf

/var/zope/etc/zope.conf (main zope.conf)

mount-point /
# ZODB cache, in number of objects
cache-size 2
server gamma-gw.audioholics.com:8100
storage 1
name $MAIN_NAME
var $INSTANCE/var
# ZEO client cache, in bytes
cache-size 1024MB
# Uncomment to have a persistent disk cache
#client zeo1

mount-point /audioholics/portal_catalog
container-class Products.CMFPlone.CatalogTool.CatalogTool
cache-size 5
cache-size 1024MB
server gamma-gw.audioholics.com:8100
storage 2
name $CATALOG_NAME
var $INSTANCE/var

any ideas?

alan