RESOLUTION: Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-15 Thread Alan Runyan

Thanks to Jim, Theune, Dieter and all others who weighed in on this thread.

The problem: ZEO clients would lock up at random and require a restart.

The hints: lots of 'Connection timed out' and 'No route to host' errors
in the ZEO server log files.

The solution: the machine the ZEO server was running on had iptables
enabled.  Even though the rules allowed the ZEO server port 8100, it was
filtering on all interfaces.  I simply changed the rules so that port
8100 is not filtered at all on the internal network interface.  ZEO
listens specifically on the internal IP, port 8100.

The system has been stable for several days, without the ZEO server
logs containing any connection errors.  I had never had this problem
before because... well... we use firewalls for our customers (on the
external interfaces; our internal interfaces never have filtering), and
this particular sysadmin was running iptables on all public and private
interfaces, locking the machine down as much as possible.
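A quick first check for this kind of filtering is to open a TCP connection to the ZEO port from each client host and look at the errno that comes back. The helper below is a minimal, hypothetical sketch (it is not part of ZEO). The errno names are the standard Linux ones; the note that an iptables REJECT using `icmp-host-prohibited` surfaces as 'No route to host' is an assumption about this kind of setup, not something confirmed in the thread.

```python
import errno
import socket

# How common connect() failures usually read on Linux.  The first two are
# the errno values (110 and 113) seen in the zeo.log tracebacks in this thread.
EXPLANATIONS = {
    errno.ETIMEDOUT: "timed out: packets may be silently dropped",
    errno.EHOSTUNREACH: "no route to host: possibly an ICMP host-prohibited reject",
    errno.ECONNREFUSED: "refused: nothing listening, or a plain REJECT rule",
}


def probe(host, port, timeout=5.0):
    """Open one TCP connection to host:port and describe the outcome."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # Our own deadline expired before the TCP handshake completed.
        return "no answer within %.1f s" % timeout
    except OSError as exc:
        reason = EXPLANATIONS.get(exc.errno, exc.strerror)
        return "error %s (%s)" % (exc.errno, reason)
    conn.close()
    return "ok"
```

Run from each client box, `probe("gamma-gw.audioholics.com", 8100)` (the server name from the client configuration later in this thread) would be expected to return `"ok"` once the filtering is out of the way. Note that it only tests connection setup; the logged errors occurred on already-established connections.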

Thanks again guys!

--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-12 Thread Dieter Maurer
Alan Runyan wrote at 2007-4-11 11:31 -0500:
 ... ZEO lockups ...

PeterZ <[EMAIL PROTECTED]> reported today very similar problems
in "[EMAIL PROTECTED]". He, too, gets:

>   File "/opt/zope/Python-2.4.3/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

Maybe you have something in common (the same software version, the same
hardware part) that causes these problems?


Apart from that, I have seen two reasons for ZEO lockups:

  *  a firewall between the ZEO clients and the ZEO server
     that dropped connections without informing the connection
     endpoints

  *  ZEO clients that access the same storage (on the same ZEO server)
     via two different connections (this leads to a commit deadlock).
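For the first case, a common socket-level mitigation is TCP keepalive: the kernel probes an idle connection and, if the probes go unanswered, aborts it so the next socket call fails (on Linux with ETIMEDOUT, i.e. 'Connection timed out') instead of hanging. The sketch below shows how one would enable it on a socket in Python. It illustrates the general technique only; as far as we know ZEO 3.6 does not expose such a setting, and the timing values are arbitrary assumptions. The `TCP_KEEP*` options are Linux-specific, hence the `hasattr` checks.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle TCP connection.

    After `idle` seconds without traffic, send a probe every `interval`
    seconds; after `count` unanswered probes the kernel drops the
    connection and the next recv()/send() reports an error.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three options exist on Linux; skip them where unavailable.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With the defaults above, a silently dropped connection would be detected after roughly 60 + 10 * 5 = 110 seconds of silence rather than never.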


-- 
Dieter


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-12 Thread Jim Fulton


On Apr 12, 2007, at 1:17 PM, Alan Runyan wrote:


> I find the 'No route to host' errors disturbing, although
> these have not happened over the past 24 hours.  This has:
> 2007-04-12T00:17:45 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:54881) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')
> --
>
> which is frustrating.  As I understand it, the ZEO server is taking too long
> to communicate with the ZEO client, and the client times out.


I don't know what the source of the "Connection timed out" error is.
The error at this point in the code is very puzzling.  There has been
a select call, and select returned indicating that the socket was
ready to be read.  There should be no delay at all.  That's the whole
point of using an asynchronous network library.
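One possible way to reconcile this (a hypothesis, not something the thread confirms): select() also reports a socket as readable when an error is pending on it, and recv() then raises that error immediately rather than blocking. A connection the kernel has already given up on would surface exactly this way. The sketch below reproduces the pattern on loopback. A genuine ETIMEDOUT needs a network that silently drops packets, so it uses a TCP reset (ECONNRESET) as a stand-in; the helper name is hypothetical.

```python
import errno
import select
import socket
import struct


def readable_then_error(timeout=5.0):
    """Return (select_said_readable, recv_outcome) for a reset connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    # SO_LINGER switched on with a zero timeout makes close() send a TCP RST,
    # which leaves an error pending on the client's socket.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    peer.close()
    try:
        readable, _, _ = select.select([client], [], [], timeout)
        try:
            data = client.recv(8192)  # same buffer size as smac.py's handle_read
            outcome = "data" if data else "eof"
        except OSError as exc:
            outcome = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()
        listener.close()
    return bool(readable), outcome
```

On Linux this should return `(True, 'ECONNRESET')`: select() says "readable", and the very first recv() raises without any delay.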



> Shouldn't it retry /
> reconnect?


The server is closing the connection with the client, which should  
cause the client to reconnect.


Do you see any log messages on the clients at times corresponding to  
these server log messages?


Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]     Python Powered!
CTO                  (540) 361-1714               http://www.python.org
Zope Corporation     http://www.zope.com          http://www.zope.org





Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-12 Thread Alan Runyan

> Storage: 1
> Server started: Wed Apr 11 10:56:50 2007
> Clients: 10
> Clients verifying: 0
> Active transactions: -1

> Huh? You owe the system a transaction. However, from a brief look at
> the code, this might happen if tpc_abort() and _abort() somehow
> overlap. And you did have two aborts at that point in time.


Sounds like a bug/race that needs to be looked into in ZEO.
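Christian's guess can be illustrated with a toy model. The class below is hypothetical and is not ZEO's StorageServer code: it only shows how a statistics counter ends up at -1 when two abort paths (standing in for tpc_abort() and _abort()) both decrement it for the same transaction, and how a per-transaction guard prevents that.

```python
class TransactionStats:
    """Toy 'Active transactions' counter; not ZEO's real code."""

    def __init__(self, guarded):
        self.guarded = guarded
        self.active = 0
        self._open = set()  # transactions currently counted as active

    def begin(self, tid):
        self._open.add(tid)
        self.active += 1

    def abort(self, tid):
        if self.guarded:
            if tid not in self._open:
                return  # already aborted once; don't count it again
            self._open.remove(tid)
        self.active -= 1


def double_abort(guarded):
    """Begin one transaction, then abort it from two code paths."""
    stats = TransactionStats(guarded)
    stats.begin("t1")
    stats.abort("t1")  # e.g. the tpc_abort() path
    stats.abort("t1")  # e.g. an overlapping _abort() path
    return stats.active
```

`double_abort(False)` returns -1, the same value the monitor reports; `double_abort(True)` returns 0.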


> Ah. The different client counts might be because you have two storages
> and your ZEO clients are configured in a way that doesn't connect them
> to exactly the same storages? Or they are, but they weren't able to.
> (See hardware/network problems.)


They are both defined in zope.conf.

All 12 clients were restarted last night:

Just now I'm seeing:
Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 12
Clients verifying: 0
Active transactions: -1
Commits: 92
Aborts: 2
Loads: 498120
Stores: 2279
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 11
Clients verifying: 0
Active transactions: 0
Commits: 51
Aborts: 0
Loads: 225080
Stores: 6408
Conflicts: 0
Conflicts resolved: 167


> Something else that came to mind that might block the ZEO server for a
> long time: hard disk failures. Check your dmesg log. However, the
> network errors you see in various places really need to be tracked down.


Nothing in dmesg.  I find the 'No route to host' errors disturbing, although
these have not happened over the past 24 hours.  This has:
2007-04-12T00:17:45 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:54881) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')
--

which is frustrating.  As I understand it, the ZEO server is taking too long
to communicate with the ZEO client, and the client times out.  Shouldn't it
retry / reconnect?

alan


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-11 Thread Christian Theune
Morning,

On Wednesday, 2007-04-11, at 11:31 -0500, Alan Runyan wrote:
> ETHERNET CARD:
> 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708
> Gigabit Ethernet (rev 11)
> 
>   - I have turned the transaction timeout down to 15 seconds.
> 
>   - I currently have 11 ZEO clients up, but the monitor output below shows
> fewer (which seems strange); sometimes the clients are 6 or 7.
> 
>   - Have not confirmed
> ZEO MONITOR OUTPUT:
> 
> Trying 127.0.0.1...
> Connected to localhost.localdomain (127.0.0.1).
> Escape character is '^]'.
> ZEO monitor server version 3.6.2
> Wed Apr 11 12:30:31 2007
> 
> Storage: 1
> Server started: Wed Apr 11 10:56:50 2007
> Clients: 10
> Clients verifying: 0
> Active transactions: -1

Huh? You owe the system a transaction. However, from a brief look at
the code, this might happen if tpc_abort() and _abort() somehow
overlap. And you did have two aborts at that point in time.

> Commits: 24
> Aborts: 2
> Loads: 95941
> Stores: 523
> Conflicts: 0
> Conflicts resolved: 20
> 
> Storage: 2
> Server started: Wed Apr 11 10:56:50 2007
> Clients: 8
> Clients verifying: 0
> Active transactions: 0
> Commits: 9
> Aborts: 0
> Loads: 40836
> Stores: 714
> Conflicts: 0
> Conflicts resolved: 167

Ah. The different client counts might be because you have two storages
and your ZEO clients are configured in a way that doesn't connect them
to exactly the same storages? Or they are, but they weren't able to.
(See hardware/network problems.)

Something else that came to mind that might block the ZEO server for a
long time: hard disk failures. Check your dmesg log. However, the
network errors you see in various places really need to be tracked down.

Christian

-- 
gocept gmbh & co. kg - forsterstraße 29 - 06112 halle/saale - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development




Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-11 Thread Alan Runyan

ETHERNET CARD:
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708
Gigabit Ethernet (rev 11)

 - I have turned the transaction timeout down to 15 seconds.

 - I currently have 11 ZEO clients up, but the monitor output below shows
fewer (which seems strange); sometimes the clients are 6 or 7.

 - Have not confirmed
ZEO MONITOR OUTPUT:

Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
ZEO monitor server version 3.6.2
Wed Apr 11 12:30:31 2007

Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 10
Clients verifying: 0
Active transactions: -1
Commits: 24
Aborts: 2
Loads: 95941
Stores: 523
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 8
Clients verifying: 0
Active transactions: 0
Commits: 9
Aborts: 0
Loads: 40836
Stores: 714
Conflicts: 0
Conflicts resolved: 167
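The numbers that raised eyebrows in this output (a negative "Active transactions" and client counts that differ between the two storages) are easy to check mechanically. The sketch below is a hypothetical helper, not part of ZEO; it assumes only the `Key: value` layout shown above, with a `Storage:` line starting each block, and ignores the telnet banner lines before the first storage.

```python
def parse_monitor(text):
    """Parse ZEO monitor output into {storage_name: {field: value}}."""
    storages = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # banner lines without a colon
        key, value = key.strip(), value.strip()
        if key == "Storage":
            current = storages.setdefault(value, {})
        elif current is not None:
            # Counters are integers; keep anything else (dates) as text.
            try:
                current[key] = int(value)
            except ValueError:
                current[key] = value
    return storages


def sanity_warnings(storages):
    """Return human-readable warnings for suspicious monitor values."""
    warnings = []
    for name, fields in sorted(storages.items()):
        for key, value in fields.items():
            if isinstance(value, int) and value < 0:
                warnings.append("storage %s: %s is %d" % (name, key, value))
    counts = {name: fields.get("Clients") for name, fields in storages.items()}
    if len(set(counts.values())) > 1:
        warnings.append("client counts differ: %s" % sorted(counts.items()))
    return warnings
```

Fed the output above, `sanity_warnings` would flag `storage 1: Active transactions is -1` and the 10-versus-8 client counts.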





Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-11 Thread Stefan H. Holek
Are you, by chance, using Intel E1000 NICs? See "Fighting the
hardware" in Jodok's write-up:
http://www.lovelysystems.com/batlogg/2007/03/30/the-decathlon-of-computer-science/


Stefan


On 10. Apr 2007, at 20:19, Alan Runyan wrote:


> I am seeing something *very* strange in zeo.log:
> 2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')
>
> but several more of these:
> 2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')


--
Anything that, in happening, causes itself to happen again,
happens again.  --Douglas Adams




Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Jens Vagelpohl



On 10 Apr 2007, at 20:19, Alan Runyan wrote:


>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')


With errors like this I would concentrate on the network first  
(hardware as well as network settings/routes etc) and make 100% sure  
you don't have anything strange happening on that end.


jens




Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Russ Ferriday


On 10 Apr 2007, at 21:39, Chris Withers wrote:


> I'd look at the switches and maybe even the nics and cables :-S

And if you are doing that, also check your network to make sure all
IP addresses are unique.

> (..but I'm not a sysadmin.)

And neither am I!

--r

Russ Ferriday - Topia Systems - Open Source content management with  
Plone and Zope
[EMAIL PROTECTED] - office: +44 2076 1777588 - mobile: +44 7789 338868  
- skype: ferriday

a member of
Zea Partners



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Chris Withers

Alan Runyan wrote:

>>>     data = self.socket.recv(buffer_size)
>>> error: (113, 'No route to host')
>>
>> That *is* very odd; is anything other than Pound being used for load
>> balancing or traffic shaping?
>
> This has to be a major problem maker in the system.  Pound is simply
> round-robining connections to a pool of Zope instances.

I'd look at the switches and maybe even the nics and cables :-S
(..but I'm not a sysadmin.)

>>>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>>     data = self.socket.recv(buffer_size)
>>> error: (110, 'Connection timed out')
>>
>> Storage server too busy :-(
>
> The storage server can be "too busy" when it's only reading?

'Course - too many pickles getting sucked down...

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Alan Runyan

> I've had nothing but bad experiences with Pound myself; LVS seems a
> much preferable alternative...

We have not had such negative experiences with Pound.

>>     data = self.socket.recv(buffer_size)
>> error: (113, 'No route to host')
>
> That *is* very odd; is anything other than Pound being used for load
> balancing or traffic shaping?

This has to be a major problem maker in the system.  Pound is simply
round-robining connections to a pool of Zope instances.

>>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>     data = self.socket.recv(buffer_size)
>> error: (110, 'Connection timed out')
>
> Storage server too busy :-(

The storage server can be "too busy" when it's only reading?

cheers
alan


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Chris Withers

Alan Runyan wrote:

> We have 10 ZEO clients for public consumption that are "READ ONLY".
> A separate ZEO client, on a separate box, does the writing.

I'd put money on the client doing the writing causing problems.
That, or client-side cache thrashing caused by ZCatalog or similar ;-)

> The ZEO server was consistently at 2% CPU.

What was the network and disk I/O like?

> getting through the cache and back to Pound, which was then load

I've had nothing but bad experiences with Pound myself; LVS seems a
much preferable alternative...

> The customer was posting content throughout the slashdot.  The problem
> was that when the clients would update they would end up through 500s

Not sure what that last bit means...

>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

That *is* very odd; is anything other than Pound being used for load
balancing or traffic shaping?

>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

Storage server too busy :-(

cheers,

Chris



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Jim Fulton


On Apr 10, 2007, at 2:19 PM, Alan Runyan wrote:
...

> For Jim: We did not adjust the transaction timeout.  Would that have
> helped in the case of READs?

Possibly; I'm not sure, and I don't have time now to dig.  It might be
worth trying, however.

> The customer was posting content throughout the slashdot.  The problem
> was that when the clients would update they would end up through 500s
>
> I am seeing something *very* strange in zeo.log:
> 2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')
>
> but several more of these:
> 2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')
> --


Obviously, both of these are very suspicious.

Jim



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Benji York

Alan Runyan wrote:

> The website got slashdotted this morning [...]

> Just FYI: Varnish didn't go over 3% CPU during the traffic surge (over
> 200 req/second).


Off topic: 200 requests a second seems a bit light for a slashdotting;
any more details you can divulge there?

--
Benji York
Senior Software Engineer
Zope Corporation


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Alan Runyan

> How does data get into the ZEO storage then?


We have 10 ZEO clients for public consumption that are "READ ONLY".
A separate ZEO client, on a separate box, does the writing.

The website got slashdotted this morning and we had 4 ZEO clients go
out, waiting on the ZEO server for many minutes (basically hung and
reporting 500s back to browsers).

4/10/07 - delta ah2 lock up - http://paste.plone.org/13919
4/10/07 - epsilon ah2 lock up - http://paste.plone.org/13920
4/10/07 - delta ah1 lock up - http://paste.plone.org/13936
4/10/07 - epsilon ah3 lock up - http://paste.plone.org/13935

Just FYI: Varnish didn't go over 3% CPU during the traffic surge (over
200 req/second).

The ZEO server was consistently at 2% CPU.  Lots of traffic was
getting through the cache and back to Pound, which was then load
balancing to the ZEO clients.

For Jim: We did not adjust the transaction timeout.  Would that have
helped in the case of READs?

The customer was posting content throughout the slashdot.  The problem
was that when the clients would update they would end up through 500s

I am seeing something *very* strange in zeo.log:
2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (113, 'No route to host')


but several more of these:
2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-10 Thread Chris Withers

Alan Runyan wrote:

>> Do you have anything that is committing very large transactions?
>
> No. In fact, as far as I'm concerned, these clients could be running in
> read-only mode.

How does data get into the ZEO storage then?

cheers,

Chris



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-09 Thread Alan Runyan

>> Try setting the transaction timeout.  Say, maybe, 10 seconds.
>
> Do you have anything that is committing very large transactions?

No. In fact, as far as I'm concerned, these clients could be running in
read-only mode.


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-09 Thread Jim Fulton


On Apr 9, 2007, at 5:05 PM, Jim Fulton wrote:

> On Apr 9, 2007, at 4:50 PM, Alan Runyan wrote:
>
>> Hi Jim, yes, we are still having lockups.  We have a list of DeadlockDebugger
>> info associated with each lockup.
>>
>> The client configuration is as shown at the beginning of this thread.  The
>> ZEO server configuration is as follows:
>>
>> # ZEO configuration file
>>
>> %define INSTANCE /usr/local/zeo-ah
>>
>>  address 8100
>>  read-only false
>>  invalidation-queue-size 100
>>  # pid-filename $INSTANCE/var/ZEO.pid
>>  # monitor-address PORT
>>  # transaction-timeout SECONDS
>
> Try setting the transaction timeout.  Say, maybe, 10 seconds.

Do you have anything that is committing very large transactions?

Jim



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-09 Thread Jim Fulton


On Apr 9, 2007, at 4:50 PM, Alan Runyan wrote:


> Hi Jim, yes, we are still having lockups.  We have a list of DeadlockDebugger
> info associated with each lockup.
>
> The client configuration is as shown at the beginning of this thread.  The
> ZEO server configuration is as follows:
>
> # ZEO configuration file
>
> %define INSTANCE /usr/local/zeo-ah
>
>  address 8100
>  read-only false
>  invalidation-queue-size 100
>  # pid-filename $INSTANCE/var/ZEO.pid
>  # monitor-address PORT
>  # transaction-timeout SECONDS



Try setting the transaction timeout.  Say, maybe, 10 seconds.
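As we understand the ZEO configuration documentation, `transaction-timeout` bounds how long a transaction may take to commit after acquiring the storage's commit lock; if it takes longer, the server aborts the transaction and closes that client's connection, so one stuck committer cannot block every other client indefinitely. The sketch below models that idea with a hypothetical `CommitLock` class built on `threading`. It illustrates the mechanism only and is not ZEO's implementation; the tiny timeout in the example is just so it runs quickly.

```python
import threading


class CommitLock:
    """Toy storage commit lock with a per-holder watchdog timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._cond = threading.Condition()
        self._generation = 0   # identifies the current lock holding
        self._timer = None
        self.holder = None
        self.timed_out = []    # clients whose transactions were aborted

    def acquire(self, client):
        with self._cond:
            while self.holder is not None:
                self._cond.wait()
            self.holder = client
            self._generation += 1
            self._timer = threading.Timer(self.timeout, self._expire,
                                          args=(self._generation,))
            self._timer.daemon = True
            self._timer.start()

    def finish(self, client):
        """Normal end of a commit; False if the watchdog got there first."""
        with self._cond:
            if self.holder != client:
                return False
            self._timer.cancel()
            self._release()
            return True

    def _expire(self, generation):
        with self._cond:
            # Ignore a stale timer from an earlier, already finished holding.
            if self.holder is not None and generation == self._generation:
                # A real server would also close this client's connection.
                self.timed_out.append(self.holder)
                self._release()

    def _release(self):
        self.holder = None
        self._cond.notify_all()
```

For example, after `lock = CommitLock(timeout=0.2)` and `lock.acquire("stuck-client")`, a second `lock.acquire("other-client")` blocks for about 0.2 s until the watchdog aborts the stuck holder; without a timeout it would block forever. Since only commits take this lock, a pure read (Alan's question above) would not be affected by it.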

Jim



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-09 Thread Alan Runyan

Hi Jim, yes, we are still having lockups.  We have a list of DeadlockDebugger
info associated with each lockup.

The client configuration is as shown at the beginning of this thread.  The
ZEO server configuration is as follows:


# ZEO configuration file

%define INSTANCE /usr/local/zeo-ah


 address 8100
 read-only false
 invalidation-queue-size 100
 # pid-filename $INSTANCE/var/ZEO.pid
 # monitor-address PORT
 # transaction-timeout SECONDS



 path $INSTANCE/var/Data.fs


 path $INSTANCE/var/Catalog.fs



 level info
 
   path $INSTANCE/log/zeo.log
 



 program $INSTANCE/bin/runzeo
 socket-name $INSTANCE/etc/zeo.zdsock
 daemon true
 forever false
 backoff-limit 10
 exit-codes 0, 2
 directory $INSTANCE
 default-to-interactive true
 # user zope
 python /usr/local/python/bin/python
 zdrun /usr/local/zope-2.9.6/lib/python/zdaemon/zdrun.py

 # This logfile should match the one in the zeo.conf file.
 # It is used by zdctl's logtail command, zdrun/zdctl doesn't write it.
 logfile $INSTANCE/log/zeo.log


Any insight would be appreciated.

cheers
alan


Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-09 Thread Jim Fulton


On Apr 3, 2007, at 4:13 PM, Alan Runyan wrote:


> Hi guys.
>
> Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4
>
> Having lots of lockups.

Still?

...

> Typical client ZEO configuration:

May we see the ZEO config?

Jim



Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-03 Thread Christian Theune
Hi,

On Tuesday, 2007-04-03, at 15:13 -0500, Alan Runyan wrote:
> Hi guys.
> 
> Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4
> 
> Having lots of lockups.  We have approximately 12 ZEO clients on 2
> machines connecting to a single ZEO server, all on the local network.
> Disks and network are monitored by a competent hosting company.  All
> looks healthy except ZEO communication.
> 
> Linux  2.6.9-42.0.8.ELsmp
> 
> some feedback from manage_debug_threads:
> 
> http://paste.plone.org/13821
> http://paste.plone.org/13822

Those two are currently loading an object in thread 2. Thread 1 tries to
load an object as well; however, ClientStorage only allows one load at a
time per client, so the second thread is sitting there waiting for the
first thread's load to finish.
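The effect Christian describes (one load at a time per client storage, so a second thread queues behind the first) can be modelled with a single lock. The sketch below is hypothetical: `SerializedLoader` and `slow_fetch` are stand-ins, not ClientStorage's real code. It shows why one load stalled against an unresponsive server also stalls every other thread in that Zope client.

```python
import threading
import time


class SerializedLoader:
    """Allows only one load at a time, like a per-client load lock."""

    def __init__(self, fetch):
        self._fetch = fetch                # function that talks to the "server"
        self._load_lock = threading.Lock()
        self.log = []

    def load(self, oid):
        with self._load_lock:              # a second thread waits here
            self.log.append(("start", oid))
            data = self._fetch(oid)
            self.log.append(("end", oid))
            return data


def slow_fetch(oid):
    time.sleep(0.1)                        # pretend the server is slow to answer
    return "pickle-for-%s" % oid
```

Started from two threads at once, the loads never overlap: each `start` entry in `log` is immediately followed by its own `end`. If `slow_fetch` never returned (say, on a connection silently dropped by a firewall), the second thread would wait on `_load_lock` forever.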

> http://paste.plone.org/13823

This one looks healthy. One thread is loading from the ZEO server, the
other doesn't have anything to do.

Can you figure out whether those loads ever finish? Do you see any
network traffic on the ZEO port from the server (i.e. is data
transferred)?
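To answer the traffic question without extra tools, one option on Linux is to read /proc/net/tcp, which lists every IPv4 TCP socket with its state, queue sizes and retransmit count. The sketch below is a hypothetical helper; it assumes the column layout described in proc(5) and a little-endian host (e.g. x86), where the kernel prints addresses in host byte order. An outgoing queue that stays non-empty, or a growing retransmit count, on connections to port 8100 would suggest packets are not getting through.

```python
import socket
import struct

# A few TCP state codes as printed (in hex) by the Linux kernel.
TCP_STATES = {"01": "ESTABLISHED", "02": "SYN_SENT", "06": "TIME_WAIT",
              "08": "CLOSE_WAIT", "0A": "LISTEN"}


def decode_addr(field):
    """'0100007F:1FA4' -> ('127.0.0.1', 8100), assuming a little-endian host."""
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def connections_on_port(text, port):
    """Return the sockets in /proc/net/tcp text that use the given port."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 7:
            continue
        (lip, lport), (rip, rport) = decode_addr(fields[1]), decode_addr(fields[2])
        if port not in (lport, rport):
            continue
        tx_queue, rx_queue = (int(x, 16) for x in fields[4].split(":"))
        rows.append({
            "local": "%s:%d" % (lip, lport),
            "remote": "%s:%d" % (rip, rport),
            "state": TCP_STATES.get(fields[3], fields[3]),
            "tx_queue": tx_queue,          # outgoing bytes not yet acknowledged
            "rx_queue": rx_queue,          # received bytes not yet read
            "retransmits": int(fields[6], 16),
        })
    return rows
```

Typical use on the server: `with open("/proc/net/tcp") as f: rows = connections_on_port(f.read(), 8100)`, repeated a few times to see whether the numbers move.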

Christian



[ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

2007-04-03 Thread Alan Runyan

Hi guys.

Running Zope 2.9.6 with ZODB 3.6.2 on Python 2.4

Having lots of lockups.  We have approximately 12 ZEO clients on 2
machines connecting to a single ZEO server, all on the local network.
Disks and network are monitored by a competent hosting company.  All
looks healthy except ZEO communication.

Linux  2.6.9-42.0.8.ELsmp

Some feedback from manage_debug_threads:

http://paste.plone.org/13821
http://paste.plone.org/13822
http://paste.plone.org/13823

Typical client ZEO configuration:
/var/zope/ah0/zope.conf (included zope.conf)
%define HTTP_SERVER 7580
%define ICP_SERVER 7680
%define INSTANCE /var/zope/ah0
%define MAIN_NAME epsilon0
%define CATALOG_NAME epsilonA
%include /var/zope/etc/zope.conf

/var/zope/etc/zope.conf (main zope.conf)

 mount-point /
 # ZODB cache, in number of objects
 cache-size 2
 
   server gamma-gw.audioholics.com:8100
   storage 1
   name $MAIN_NAME
   var $INSTANCE/var
   # ZEO client cache, in bytes
   cache-size 1024MB
   # Uncomment to have a persistent disk cache
   #client zeo1
 



  mount-point /audioholics/portal_catalog
  container-class Products.CMFPlone.CatalogTool.CatalogTool
  cache-size 5
  
   cache-size 1024MB
   server gamma-gw.audioholics.com:8100
   storage 2
   name $CATALOG_NAME
   var $INSTANCE/var
 


Any ideas?

alan