FuBuJo wrote at 2008-3-14 13:31 +0000:
You need to be a bit more careful in your description.
For example the diagram "Apache -> Zeo -> Zope(ZODB)" is
very confusing. It is very rare that Apache speaks to Zeo.
The confusion between Zope and Zeo may run through your whole
description, so that it is often unclear whether you really
mean Zope when you write Zope and Zeo when you write Zeo.
>The traffic is heavy write traffic (I read some of Dieters posts and am testing
>that out as well). Once overall load hits about 100 people or so the Zeo's
Here again, you use a wrong word: "dying" would mean that your ZEO
process terminates, but below you seem to say that it gets slower.
>- heavy load, slow response, python takes all CPU/Memory.
Which "python"? The "python" executing Zeo? Or the one executing Zope?
> Then when
>traffic is removed from the ZEO instance ... the system remains CPU bound by
>python process ... and you have to bounce Zope(Zeo instance) and Apache to free
Which system? The one running ZEO (the ZEO server) or the one running
Zope (the ZEO client)?
>The ZODB reports heavy Clients waiting ... but doesn't budge on load.
You see this in the ZEO logfile?
Then, it is ZEO which reports the waiting -- not the ZODB.
>So ... anyone have any suggestions.
We are having similar problems -- I call them commit congestions.
As far as we understand it by now, it is a multiple cause problem.
Commit congestions can be caused on the client (=Zope) side and on the
server (=ZEO) side.
A client drastically increases the probability of commit congestion
when it does expensive things while it holds the commit lock, i.e.
during the second phase of the two-phase commit protocol.
We have identified three causes:
* garbage collections
  During a garbage collection, the garbage collector holds
  the GIL and blocks all Python activity.
  We found that a single generation 2 (i.e. full) garbage
  collection can take between 10 and 20 s.
  We had a bad text index implementation
  that caused excessive object creation and thereby lots
  of garbage collections.
  Our measure has been to drop the bad index implementation
  and reconfigure the garbage collector to reduce the
  garbage collection frequency by a factor of 1000.
* "stat"s in the second commit phase
  In our system, "stat"s for NFS served files could take up to
  27 s. It is a complete mystery why. Local IO, too, occasionally
  seemed to need excessive time. This, too, is still mysterious.
  We may have some hints: some ranking bugs in a search engine
  could cause millions of IO operations within a short timeframe
  and may have significantly affected the Linux IO behaviour.
* invalidation message reception and corresponding client cache updates
  during the second commit phase
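
The garbage collector measure can be sketched as follows. How exactly
the frequency was reduced is not stated above; raising the generation 0
threshold with "gc.set_threshold" is an assumption, and the helper names
are made up for illustration. The timing hook uses "gc.callbacks", which
needs Python 3.3 or later, so it is newer than the Zope releases
discussed here.

```python
import gc
import time


def scale_gc_threshold(factor=1000):
    """Make generation 0 collections (and thereby all others) rarer.

    Assumption: multiplying threshold0 by `factor` is one way to reduce
    the collection frequency by roughly that factor.
    """
    gen0, gen1, gen2 = gc.get_threshold()
    gc.set_threshold(gen0 * factor, gen1, gen2)
    return gc.get_threshold()


_started = {}  # generation -> start time of the running collection
pauses = []    # (generation, seconds) for each finished collection


def _time_collections(phase, info):
    """gc callback: record how long each collection took."""
    generation = info["generation"]
    if phase == "start":
        _started[generation] = time.perf_counter()
    elif phase == "stop":
        begun = _started.pop(generation, None)
        if begun is not None:
            pauses.append((generation, time.perf_counter() - begun))


gc.callbacks.append(_time_collections)
```

Entries in "pauses" with generation 2 are the full collections; long
ones during a commit are candidates for the delays described above.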
Other causes for commit contention come from the (Zeo) server:
* "FileStorage.pack" unnecessarily holds the commit lock
  for long periods of the copying phase, drastically
  increasing the probability of commit contention
* during some pack phase (reachability analysis),
  access to the storage file is high volume and erratic.
  This drastically reduces the performance of the storage
  and makes commit contention likely.
* other heavy use of the file system can affect the IO performance
  available for storage access and can increase the
  likelihood of commit contention.
>I can throw 10 more Apache/Zeo instances as it - but not sure if that's the
It is not. Commit contention is a synchronization problem.
It does not go away but is likely to increase when you scale
your frontends up.
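
A deliberately crude model shows why more frontends do not help. It
assumes a single commit lock granted in arrival order and held for the
same time by every client; neither the function names nor the numbers
come from a real system.

```python
def commit_wait_times(clients, hold_seconds):
    """Wait time of each client when all try to commit at once.

    Assumption: one global commit lock, granted in arrival order and
    held for `hold_seconds` by every client.
    """
    return [position * hold_seconds for position in range(clients)]


def worst_wait(clients, hold_seconds):
    """Wait time of the last client in the queue."""
    waits = commit_wait_times(clients, hold_seconds)
    return waits[-1] if waits else 0.0
```

With a hold time of 0.5 s, going from 10 to 20 simultaneously committing
clients raises the worst wait from 4.5 s to 9.5 s. Shortening the time
the lock is held is what helps.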
>So I guess here's my questions.
>1. Is there a Zeo Client limit you can have when connecting to a Zope(Zeo
There is no limit in principle -- but as you can see,
lots of clients can affect performance.
Invalidation message processing poses a load on the server
which grows linearly with the number of clients (each client
must get all invalidations).
Most other Zeo load contributions depend more on the actual
number of requested operations (reads, writes, commits)
and less on the number of clients that request these operations
(of course, more clients can generate more requests).
>2. Are there any special setting to allow for 'many' Zeo
>clients connecting to Zeo server?
Reconfigure the Python garbage collector such that it runs far
less often.
Get rid of components that (unnecessarily) create lots of Python objects.
Check whether you do unnecessary operations during the second commit
phase.
Place your ZODB storage files intelligently in the file system
such that other high volume IO operations do not badly affect
IO on the storage.
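
To judge whether a file system is a good place for the storage, or
whether "stat"s on it are occasionally slow, something like the
following may help. It is only a sketch: "time_stat" is a made-up
helper, and the path you pass (for instance the directory holding your
storage file) is up to you.

```python
import os
import time


def time_stat(path, repeats=5):
    """Return (slowest, mean) wall-clock duration in seconds of os.stat(path)."""
    durations = []
    for _ in range(repeats):
        started = time.perf_counter()
        os.stat(path)
        durations.append(time.perf_counter() - started)
    return max(durations), sum(durations) / len(durations)
```

Run it repeatedly while the system is under load; a slowest value far
above the mean points to the kind of IO stalls described above.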
Zope maillist - Zope@zope.org