Re: [Zope] Scaling problems, or something else?

2007-05-22 Thread Chris Withers

Dieter Maurer wrote:
> I had to enhance ZEO to report which transactions are blocking
> in order to quickly resolve problems.

It would be nice if you could feed those enhancements back into the
source tree...


cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk
___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope-dev )


Re: [Zope] Scaling problems, or something else?

2007-05-21 Thread Dieter Maurer
Gerhard Schmidt wrote at 2007-5-21 08:42 +0200:
> ...
>We have experienced some problems with ZEO when requests take too long.
>
>Sometimes we have problems like this:
>2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.29:52569) 
>Transaction blocked waiting for storage. Clients waiting: 1.
>--
>2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.36:56041) 
>Transaction blocked waiting for storage. Clients waiting: 2.
> ...

What you see here is information about commit lock contention:

  Between the "vote" (start of) and the "finish" (end of)
  (second and third part of the commit protocol) 
  a "FileStorage" must hold a lock to prevent modifications
  of the voted state.
  When another connection tries to commit during this time,
  ZEO will report a "transaction blocked ...".


I had to enhance ZEO to report which transactions are blocking
in order to quickly resolve problems.
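
The contention described above can be pictured with a small toy model.
This is only a sketch, not ZEO code: the class `ToyCommitLock` and its
methods are hypothetical names, and the model assumes, as described above,
that the storage lock is taken at vote time and released at finish. The
log strings only loosely imitate the ZEO messages quoted earlier.

```python
from collections import deque


class ToyCommitLock:
    """Toy model of a storage-wide commit lock (hypothetical, not ZEO's API)."""

    def __init__(self):
        self.holder = None      # transaction currently between vote and finish
        self.waiting = deque()  # transactions blocked behind the holder
        self.log = []

    def vote(self, txn):
        """Try to take the lock for txn; return True if it was granted at once."""
        if self.holder is None:
            self.holder = txn
            return True
        self.waiting.append(txn)
        self.log.append(
            "Transaction blocked waiting for storage. "
            "Clients waiting: %d." % len(self.waiting)
        )
        return False

    def finish(self, txn):
        """Release the lock held by txn and hand it to the next waiter, if any."""
        if self.holder != txn:
            raise ValueError("%r does not hold the commit lock" % (txn,))
        self.holder = None
        if self.waiting:
            self.holder = self.waiting.popleft()
            self.log.append(
                "Blocked transaction restarted. "
                "Clients waiting: %d" % len(self.waiting)
            )


lock = ToyCommitLock()
lock.vote("t1")    # granted immediately
lock.vote("t2")    # blocked behind t1
lock.vote("t3")    # blocked behind t1 and t2
lock.finish("t1")  # t2 now holds the lock
print("\n".join(lock.log))
```

In this model a long-running transaction in the holder slot keeps every
later vote queued, which matches the growing "Clients waiting" counter
in the log.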

> ...
>This incident wasn't a problem because it was resolved within one second.
>But sometimes situations like this take up to 30 seconds to resolve.
>The site is completely unresponsive during this time and takes up to 10 minutes
>to resume normal operation (response times < 1 sec per dynamic page).

That's strange. I have never seen this (though commit lock contention
is not unusual on a write-heavy site such as ours).

It should not have any lasting effects.

> 
>It seems there is a possibility of a deadlock when requests take too much
>time to process.

I have never observed anything like this, but a colleague once
committed a monster transaction which took ages to commit.
That prompted me to add the more detailed information to the
"Transaction blocked" log entry.

>But the main problem we have is the memory growth of the Zope server
>processes. They grow to 500 MB of memory before serving the first request.

That's strange, too.

You may take a look at my "analyseObjects"

  

It was developed to help in the determination of memory leaks but
it can be useful to analyse unreasonably high memory loads after startup
as well.


It has several drawbacks, however:

  *  It knows only about the garbage collector registered objects.

     A Python debug build is necessary to learn about all
     Python objects.

  *  It does not know how much size the objects use.

     An integration with "PySizer" may improve on this.
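
The first drawback can be seen with the standard `gc` module. The
following is a minimal sketch, not the "analyseObjects" code itself: it
counts the objects the collector tracks, grouped by type name, using
`gc.get_objects()`. The helper name `count_tracked_objects` is made up
for this illustration. Objects the collector does not track (plain
integers and strings, for example) do not show up in the result.

```python
import gc
from collections import Counter


def count_tracked_objects(top=10):
    """Return the `top` most common type names among gc-tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(top)


for type_name, count in count_tracked_objects(5):
    print("%-20s %d" % (type_name, count))
```

Comparing two such snapshots taken some time apart is one simple way to
see which types keep accumulating.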

>While running, they keep growing until they hit the limit of
>physical memory. When they do, they slow down dramatically
>(response times 800% higher than usual). We have done some debugging and it
>seems that the Python garbage collection kicks in and kills the whole
>performance.

The garbage collector by itself does not know about the physical
memory limit -- but when it kicks in and large parts of your objects
have been swapped out, its traversal (to determine unreachable objects)
will load them and this can drastically slow down the process.



-- 
Dieter


Re: [Zope] Scaling problems, or something else?

2007-05-21 Thread Gerhard Schmidt
On Wed, May 16, 2007 at 05:52:16PM -0400, Paul Winkler wrote:
> On Wed, May 16, 2007 at 04:24:56PM -0500, Jens Vagelpohl wrote:
> > There's a difference between scaling and making something faster. ZEO  
> > makes a single instance slower, right. But you can deal with more  
> > requests concurrently using ZEO. That's what I consider "scaling".
> 
> Agreed.  But at the same time, I don't think it makes sense to keep
> deploying more and more badly tuned instances. That's what I consider
> "blind shotgun scaling" :) You need to scale, but you also need to
> tune - and you need to be pragmatic about which is the appropriate
> approach at any given point in time.

I would love to tune our system, but not using ZEO isn't an option. I don't
see how a single server will ever be able to do what 13 servers do right now.

Our system is a single instance with up to 70k requests per hour (about 20
requests per second). I don't see a way to get rid of the ZEO server.

We have experienced some problems with ZEO when requests take too long.

Sometimes we have problems like this:
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.29:52569) 
Transaction blocked waiting for storage. Clients waiting: 1.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.36:56041) 
Transaction blocked waiting for storage. Clients waiting: 2.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.33:63884) 
Transaction blocked waiting for storage. Clients waiting: 3.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.34:64355) 
Transaction blocked waiting for storage. Clients waiting: 4.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.25:63215) 
Transaction blocked waiting for storage. Clients waiting: 5.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.28:58213) 
Transaction blocked waiting for storage. Clients waiting: 6.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.30:59149) 
Transaction blocked waiting for storage. Clients waiting: 7.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.31:58930) 
Transaction blocked waiting for storage. Clients waiting: 8.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.35:64097) 
Transaction blocked waiting for storage. Clients waiting: 9.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.27:63627) 
Transaction blocked waiting for storage. Clients waiting: 10.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.26:63146) Blocked 
transaction restarted.  Clients waiting: 9
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.29:52569) Blocked 
transaction restarted.  Clients waiting: 8
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.36:56041) Blocked 
transaction restarted.  Clients waiting: 7
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.33:63884) Blocked 
transaction restarted.  Clients waiting: 6
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.34:64355) Blocked 
transaction restarted.  Clients waiting: 5
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.25:63215) Blocked 
transaction restarted.  Clients waiting: 4
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.28:58213) Blocked 
transaction restarted.  Clients waiting: 3
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.30:59149) Blocked 
transaction restarted.  Clients waiting: 2
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.31:58930) Blocked 
transaction restarted.  Clients waiting: 1
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.35:64097) Blocked 
transaction restarted.

This incident wasn't a problem because it was resolved within one second.
But sometimes situations like this take up to 30 seconds to resolve.
The site is completely unresponsive during this time and takes up to 10 minutes
to resume normal operation (response times < 1 sec per dynamic page).
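
To see how bad such an episode was, the log excerpt above can be scanned
for the highest "Clients waiting" count. This is a rough sketch that
assumes the log lines look exactly like the excerpt; the regular
expression and the function name `max_clients_waiting` are made up for
this example and are not part of ZEO.

```python
import re

# Matches entries like:
# (56016/10.152.64.29:52569) Transaction blocked waiting for storage. Clients waiting: 1.
BLOCKED = re.compile(
    r"\((?P<pid>\d+)/(?P<client>[\d.]+:\d+)\)\s+"
    r"Transaction blocked waiting for storage\.\s+"
    r"Clients waiting: (?P<waiting>\d+)"
)


def max_clients_waiting(log_text):
    """Return the highest 'Clients waiting' count found, or 0 if none."""
    # An entry may be wrapped over two lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return max((int(m.group("waiting")) for m in BLOCKED.finditer(flat)), default=0)


sample = """2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.29:52569)
Transaction blocked waiting for storage. Clients waiting: 1.
--
2007-05-21T03:36:22 INFO ZEO.StorageServer (56016/10.152.64.36:56041)
Transaction blocked waiting for storage. Clients waiting: 2.
"""
print(max_clients_waiting(sample))
```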

I haven't been able to track down the problem that causes this, but the
frequency has dropped quite dramatically since we updated our frontend servers
to more recent CPUs.

It seems there is a possibility of a deadlock when requests take too much
time to process.

But the main problem we have is the memory growth of the Zope server
processes. They grow to 500 MB of memory before serving the first request.
While running, they keep growing until they hit the limit of
physical memory. When they do, they slow down dramatically
(response times 800% higher than usual). We have done some debugging and it
seems that the Python garbage collection kicks in and kills the whole
performance. The only solution we have come up with is to restart the Zope
server before it hits the physical memory limit.

Bye
Estartu
  
-- 

Gerhard Schmidt| Nick : estartu  IRC : Estartu  |
Fischb

Re: [Zope] Scaling problems, or something else?

2007-05-17 Thread Dieter Maurer
Paul Winkler wrote at 2007-5-16 12:04 -0400:
> ...
>- multiple ZEO clients do allow you better responsiveness with high
>concurrency, but you can also try increasing the number of threads to
>get the same effect without the added network overhead.

If you have several CPUs (and a CPU-bound activity),
then an increased number of threads is inferior
to multiple processes, as the GIL prevents more than a single thread
from running Python code at any one time.



-- 
Dieter


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Jens Vagelpohl


On 16 May 2007, at 16:52, Paul Winkler wrote:
> On Wed, May 16, 2007 at 04:24:56PM -0500, Jens Vagelpohl wrote:
> > There's a difference between scaling and making something faster. ZEO
> > makes a single instance slower, right. But you can deal with more
> > requests concurrently using ZEO. That's what I consider "scaling".
>
> Agreed.  But at the same time, I don't think it makes sense to keep
> deploying more and more badly tuned instances. That's what I consider
> "blind shotgun scaling" :) You need to scale, but you also need to
> tune - and you need to be pragmatic about which is the appropriate
> approach at any given point in time.


My perspective is always high-traffic systems. I never even consider  
small sites with few hits, for which ZEO won't do much. I have a  
blind spot for those ;)


jens




Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Paul Winkler
On Wed, May 16, 2007 at 05:13:27PM +0200, Gaute Amundsen wrote:
> On Wednesday 16 May 2007 07:52, Gerhard Schmidt wrote:
> > All our frontendservers have 8gig ram. Zope gets major performance Problems
> > when it reaches the limit of physical memory. Check your system if the.
> > So having 2 zope Processes on the same system increases the Problem.
> >
> We do actually have two, from a time when we just "had to do something"!
> Consolidation time...

Well, maybe.  Is it an SMP box?  You can happily run 1 Zope process
per CPU. I'm told that recent versions of Linux should be pretty good
at keeping CPU affinity without any special configuration.

> At some point Varnish url rewriting will be good enough, and then we can cut
> apache out of it too..

I highly doubt Apache will become your bottleneck anytime soon :) But
of course it's one more thing to admin.

-- 

Paul Winkler
http://www.slinkp.com


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Paul Winkler
On Wed, May 16, 2007 at 04:24:56PM -0500, Jens Vagelpohl wrote:
> There's a difference between scaling and making something faster. ZEO  
> makes a single instance slower, right. But you can deal with more  
> requests concurrently using ZEO. That's what I consider "scaling".

Agreed.  But at the same time, I don't think it makes sense to keep
deploying more and more badly tuned instances. That's what I consider
"blind shotgun scaling" :) You need to scale, but you also need to
tune - and you need to be pragmatic about which is the appropriate
approach at any given point in time.

-- 

Paul Winkler
http://www.slinkp.com


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Jens Vagelpohl


On 16 May 2007, at 11:04, Paul Winkler wrote:
> On Wed, May 16, 2007 at 10:47:11AM -0500, Jens Vagelpohl wrote:
> > On 16 May 2007, at 10:13, Gaute Amundsen wrote:
> > > Hm.. yes, that's the way we have been inclined.. just skip the whole ZEO
> > > thing, and go straight for Varnish integration. Lots of old sins that have
> > > to be paid off for that to work for everything however..
> >
> > I don't think he advocates removing ZEO. That would be silly.
>
> Why? I don't think ZEO is a panacea that magically makes apps faster
> in all cases.  If you're not CPU-bound, the added network overhead and
> increased chance of write conflicts might be a net loss.  Based on
> everything he's said so far, it sounds like Gaute still has plenty of
> headroom on his CPU.


There's a difference between scaling and making something faster. ZEO  
makes a single instance slower, right. But you can deal with more  
requests concurrently using ZEO. That's what I consider "scaling".


jens





Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Paul Winkler
On Wed, May 16, 2007 at 10:47:11AM -0500, Jens Vagelpohl wrote:
> On 16 May 2007, at 10:13, Gaute Amundsen wrote:
> >Hm.. yes, that's the way we have been inclined.. just skip the whole ZEO
> >thing, and go straight for Varnish integration. Lots of old sins that have
> >to be paid off for that to work for everything however..
>
> I don't think he advocates removing ZEO. That would be silly.

Why? I don't think ZEO is a panacea that magically makes apps faster
in all cases.  If you're not CPU-bound, the added network overhead and
increased chance of write conflicts might be a net loss.  Based on
everything he's said so far, it sounds like Gaute still has plenty of
headroom on his CPU.

As for the other benefits of ZEO:

- multiple ZEO clients do allow you better responsiveness with high
concurrency, but you can also try increasing the number of threads to
get the same effect without the added network overhead.

- not using ZEO means that Zope is a single point of failure. Can't
argue with that :)

--

Paul Winkler
http://www.slinkp.com


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Jens Vagelpohl


On 16 May 2007, at 10:13, Gaute Amundsen wrote:
> > Zope 2.7 doesn't scale very well with ZEO. The more frontend servers
> > you have, the more read conflicts you get. Migration to Zope 2.8 reduced this
> > problem.
> >
> > I have built a squid proxy in reverse mode that takes the requests and spreads them
> > round robin across the Zope frontends. This takes quite a lot of load off the
> > systems, as squid caches most of the static content (images, PDF files
> > etc).
>
> Hm.. yes, that's the way we have been inclined.. just skip the whole ZEO
> thing, and go straight for Varnish integration. Lots of old sins that have
> to be paid off for that to work for everything however..


I don't think he advocates removing ZEO. That would be silly.

jens




Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Gaute Amundsen
On Wednesday 16 May 2007 07:52, Gerhard Schmidt wrote:
> On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
> > So there is no other possible limit in a zope instance than IO or CPU?
> > If cpu was the limiting factor I would see the 2 python processes running
> > 90% and dozens of httpd's taking up the rest?
>
> I'm running a zope site with up to 70k requests per hour (40k on average).
> I've noticed that the main reason for performance problems is not enough
> memory.

By a rough estimate we run about 36K/hour into apache, and perhaps half of
that from zope.
The munin process_memory plugin I installed yesterday indicates that we only
consume about 600M of that, so I guess it looks more and more like this might
be our bottleneck.

> We are running 13 frontend Zope servers and a ZEO backend.
>
Whoa! Either I am wrong about our load, or you have a completely different
type of site..

> All our frontend servers have 8 GB of RAM. Zope gets major performance problems
> when it reaches the limit of physical memory. Check your system if the.
> So having 2 zope processes on the same system increases the problem.
>
We do actually have two, from a time when we just "had to do something"!
Consolidation time...


> Zope 2.7 doesn't scale very well with ZEO. The more frontend servers
> you have, the more read conflicts you get. Migration to Zope 2.8 reduced this
> problem.
>
> I have built a squid proxy in reverse mode that takes the requests and spreads them
> round robin across the Zope frontends. This takes quite a lot of load off the
> systems, as squid caches most of the static content (images, PDF files
> etc).
>
Hm.. yes, that's the way we have been inclined.. just skip the whole ZEO
thing, and go straight for Varnish integration. Lots of old sins that have
to be paid off for that to work for everything however..

At some point Varnish url rewriting will be good enough, and then we can cut
apache out of it too..

thanks :)

Gaute


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Gaute Amundsen
On Wednesday 16 May 2007 00:13, Paul Winkler wrote:
> On Tue, May 15, 2007 at 12:00:52PM +0200, Gaute Amundsen wrote:
> > I set cache-size to 1 last night, up from the default.
> > I felt I had to try that before I had a graph in place, so I don't have
> > good numbers to estimate the change, but with about 20 hits a sec on
> > apache, there was close to 50 loads the last hour, and just 3000
> > writes.
> > Does that look reasonable?
>
> Not really. Too many loads; you're getting a lot of cache misses and
> blowing out the cache a lot. Keep on doubling the cache size until
> your whole working set is in RAM all the time.
>
Will do! 

> > The built in help seems to indicate I should increase the cache until
> > reads approaches zero..?
>
> Yeah, for some definition of "approaches".  I think a more realistic
> minimum would be number of threads * number of ZODB writes (since each
> write potentially invalidates one cached object per thread).
>
That would give me 2500*6 = 15000. Looks like a reasonable goal.
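
The same arithmetic as a tiny sketch. The function name `load_floor` is
hypothetical; the rule of thumb (writes times threads as a floor for
loads) is the one quoted above, and the inputs are the figures from this
message.

```python
def load_floor(zodb_writes, threads):
    """Rough lower bound on loads: each write may invalidate one cached
    object in every thread's cache, which then has to be loaded again."""
    return zodb_writes * threads


print(load_floor(2500, 6))  # 15000
```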

> It's hard to quantify the point at which you're caching enough, but
> you're not even close.
>
I can see from the config that we tried cache-size 2 once, but probably 
set it back since something else was probably the problem that time :-(

> > > Also, I don't think you've mentioned what sort of app this is.
> > > Is it mostly reads or are there lots of writes?
> > > "Mostly reads" is a lot easier to optimize :)
> >
> > Big CMS system with about 70 virtual domains.
>
I keep mixing up the numbers here I notice, but make that ~180 hosts in apache 
and ~85 in zope.

> Based on your activity graph, you do indeed have a lot more reads than
> writes.
>
> What's the CMS based on? Plone? Silva?

Nothing of the sort. It's an "in-house" thing...
Could be lots of gremlins buried there I know :-/

> Chris W. had a good point about the catalog.
>
> Do you have blobby data in the ZODB? (large images or files)?  Those
> tend to play havoc with zodb cache activity, since one OFS.Image is
> stored as an arbitrarily long chain of small persistent objects.  So
> whereas a Plone Document or a Page Template needs only one entry in
> the cache, an Image might need hundreds.
>
Not _that_ much. There is some from olden times, but we moved all our image 
stuff over into something based on phpgallery 2 years ago.

thanks :)

Gaute


Re: [Zope] Scaling problems, or something else?

2007-05-16 Thread Gaute Amundsen
On Tuesday 15 May 2007 15:20, Chris Withers wrote:
> Gaute Amundsen wrote:
> > On Thursday 10 May 2007 17:57, Paul Winkler wrote:
> >> On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
> >>> On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
> >>>> On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> >>>
> >>> So there is no other possible limit in a zope instance than IO or CPU?
> >>
> >> Well, there's RAM of course.  If you run out and start swapping a lot,
> >> that's a big problem :)
> >
> > About 12G in that box, so  that's unlikely.
>
> How big's your Data.fs? If it's bigger than 12GB, then careless code can
> easily chew through all your ram...
>
As it happens it's very close to 12 G!


> >> Very helpful if the trouble is ZODB I/O.  Have a look at the activity
> >> graph and see if you're getting lots of loads... might mean your ZODB
> >> cache is too small and is getting thrashed.
> >
> > I set cache-size to 1 last night, up from the default.
>
> If you're using a ZCatalog, you might consider splitting it into its
> own zodb in order to improve the performance of this cache...
>
Using it in small bits all over the place. That would make it rather hard to 
mount separately, would it not?

> > Big CMS system with about 70 virtual domains.
>
Make that ~180 hosts in apache and ~85 in zope.

> How often does the content change?
>
Not often at all. 
I've got a munin plugin for it now, and that indicates the average is less
than 2500 stores over 5 min. Rather hard to see actually, against the reads,
which fluctuate between 30 K and 100 K.

(Will publish that plugin once I add some comments and such.)

> cheers,
>
> Chris



Re: [Zope] Scaling problems, or something else?

2007-05-15 Thread Gerhard Schmidt
On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
> So there is no other possible limit in a zope instance than IO or CPU? 
> If cpu was the limiting factor I would see the 2 python processes running 90% 
> and dozens of httpd's taking up the rest?

I'm running a Zope site with up to 70k requests per hour (40k on average). I've
noticed that the main reason for performance problems is not enough memory.

We are running 13 frontend Zope servers and a ZEO backend.

All our frontend servers have 8 GB of RAM. Zope gets major performance problems
when it reaches the limit of physical memory. Check your system if the.
So having 2 Zope processes on the same system increases the problem.

Zope 2.7 doesn't scale very well with ZEO. The more frontend servers
you have, the more read conflicts you get. Migration to Zope 2.8 reduced this
problem.

I have built a Squid proxy in reverse mode that takes the requests and spreads
them round robin across the Zope frontends. This takes quite a lot of load off
the systems, as Squid caches most of the static content (images, PDF files
etc).

Regards
Estartu

-- 

Gerhard Schmidt| Nick : estartu  IRC : Estartu  |
Fischbachweg 3 ||  PGP Public Key
86856 Hiltenfingen | EMail: [EMAIL PROTECTED]  |  on request 
Germany||  





Re: [Zope] Scaling problems, or something else?

2007-05-15 Thread Paul Winkler
On Tue, May 15, 2007 at 12:00:52PM +0200, Gaute Amundsen wrote:
> I set cache-size to 1 last night, up from the default.
> I felt I had to try that before I had a graph in place, so I don't have good
> numbers to estimate the change, but with about 20 hits a sec on apache, there
> was close to 50 loads the last hour, and just 3000 writes.
> Does that look reasonable?

Not really. Too many loads; you're getting a lot of cache misses and
blowing out the cache a lot. Keep on doubling the cache size until
your whole working set is in RAM all the time.

> The built in help seems to indicate I should increase the cache until reads 
> approaches zero..?

Yeah, for some definition of "approaches".  I think a more realistic
minimum would be number of threads * number of ZODB writes (since each
write potentially invalidates one cached object per thread).

It's hard to quantify the point at which you're caching enough, but
you're not even close.

> > Also, I don't think you've mentioned what sort of app this is.
> > Is it mostly reads or are there lots of writes?
> > "Mostly reads" is a lot easier to optimize :)
> >
> Big CMS system with about 70 virtual domains.

Based on your activity graph, you do indeed have a lot more reads than
writes.

What's the CMS based on? Plone? Silva?
Chris W. had a good point about the catalog.

Do you have blobby data in the ZODB? (large images or files)?  Those
tend to play havoc with zodb cache activity, since one OFS.Image is
stored as an arbitrarily long chain of small persistent objects.  So
whereas a Plone Document or a Page Template needs only one entry in
the cache, an Image might need hundreds.
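
As a rough illustration of why one image can need many cache entries, the
sketch below estimates how many chunk objects a file of a given size would
be split into. The 64 KiB chunk size is an assumption made for this
example (check the OFS.Image source of your Zope version for the actual
value), and `estimated_chunks` is a hypothetical helper.

```python
import math

ASSUMED_CHUNK_BYTES = 64 * 1024  # assumption for illustration only


def estimated_chunks(file_bytes, chunk_bytes=ASSUMED_CHUNK_BYTES):
    """Number of fixed-size chunks needed to hold file_bytes (at least 1)."""
    return max(1, math.ceil(file_bytes / chunk_bytes))


print(estimated_chunks(5 * 1024 * 1024))  # a 5 MiB image -> 80 chunks
```

Under that assumption, a handful of multi-megabyte images can push
hundreds of small objects through a ZODB cache that is sized in objects.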

--

Paul Winkler
http://www.slinkp.com


Re: [Zope] Scaling problems, or something else?

2007-05-15 Thread Chris Withers

Gaute Amundsen wrote:
> On Thursday 10 May 2007 17:57, Paul Winkler wrote:
>> On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
>>> On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
>>>> On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
>>>
>>> So there is no other possible limit in a zope instance than IO or CPU?
>>
>> Well, there's RAM of course.  If you run out and start swapping a lot,
>> that's a big problem :)
>
> About 12G in that box, so that's unlikely.

How big's your Data.fs? If it's bigger than 12GB, then careless code can
easily chew through all your RAM...

>> Very helpful if the trouble is ZODB I/O.  Have a look at the activity
>> graph and see if you're getting lots of loads... might mean your ZODB
>> cache is too small and is getting thrashed.
>
> I set cache-size to 1 last night, up from the default.

If you're using a ZCatalog, you might consider splitting it into its
own ZODB in order to improve the performance of this cache...

> Big CMS system with about 70 virtual domains.

How often does the content change?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk


Re: [Zope] Scaling problems, or something else?

2007-05-15 Thread Gaute Amundsen
On Thursday 10 May 2007 17:57, Paul Winkler wrote:
> On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
> > On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
> > > On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> >
> > So there is no other possible limit in a zope instance than IO or CPU?
>
> Well, there's RAM of course.  If you run out and start swapping a lot,
> that's a big problem :)
>
About 12G in that box, so  that's unlikely.

> btw, if you haven't come across this yet, I recommend reading
> Chris McDonough's "Scaling Zope" presentation. It's a couple years old
> but still a very good place to start:
> http://www.plope.org/misc/szweb/img0.html
>
Looks good. Thanks :-)

> > Something out of Control_Panel/Database/main/manage_activity
> >  perhaps?
>
> Very helpful if the trouble is ZODB I/O.  Have a look at the activity
> graph and see if you're getting lots of loads... might mean your ZODB
> cache is too small and is getting thrashed.

I set cache-size to 1 last night, up from the default.
I felt I had to try that before I had a graph in place, so I don't have good
numbers to estimate the change, but with about 20 hits a sec on apache, there
was close to 50 loads the last hour, and just 3000 writes.
Does that look reasonable?

The built-in help seems to indicate I should increase the cache until reads
approach zero..?

> Also, I don't think you've mentioned what sort of app this is.
> Is it mostly reads or are there lots of writes?
> "Mostly reads" is a lot easier to optimize :)
>
Big CMS system with about 70 virtual domains.

> > Is there a way to get that data out without going through port 8080?
>
> Don't think so. But I've never looked.
>
Hm. I guess if I at least use a url on the management port, it ought to stay 
available when everything else bogs down.

> > Maybe I will start logging the response time directly like that!
> > Hm.. good idea :)
>
> If you need that, the trace log can also tell you exactly what the
> response time per request is.
>
Good for debugging, but for permanent use, I will use this:
http://muninexchange.projects.linpro.no/?search&cid=10&pid=61&phid=81

> > > - DeadlockDebugger may be informative.
> > >   http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger
> >
> > Sounds a little drastic on a production server, but it may still come
> > to that..  Ought to test it out on another server I guess.
>
> Definitely test-drive it on a scratch server, but production is where
> you really need it :-)
>
> re. trace log:
> > That looks interesting, except that it can take 15 minutes or more
> > to restart zope when load is at the worst. I could try it outside of
> > peak hours I guess.
>
> Yeah, restart during off hours and leave it enabled until you have one
> episode of peak load slowdown.

Gaute.


Re: [Zope] Scaling problems, or something else?

2007-05-10 Thread Paul Winkler
On Thu, May 10, 2007 at 02:10:40AM +0200, Gaute Amundsen wrote:
> On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
> > On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> So there is no other possible limit in a zope instance than IO or CPU?

Well, there's RAM of course.  If you run out and start swapping a lot,
that's a big problem :)

btw, if you haven't come across this yet, I recommend reading
Chris McDonough's "Scaling Zope" presentation. It's a couple years old
but still a very good place to start:
http://www.plope.org/misc/szweb/img0.html

> Something out of Control_Panel/Database/main/manage_activity
>  perhaps?

Very helpful if the trouble is ZODB I/O.  Have a look at the activity
graph and see if you're getting lots of loads... might mean your ZODB
cache is too small and is getting thrashed.

Also, I don't think you've mentioned what sort of app this is.
Is it mostly reads or are there lots of writes?
"Mostly reads" is a lot easier to optimize :)

> Is there a way to get that data out without going through port 8080?

Don't think so. But I've never looked.

> Maybe I will start logging the response time directly like that! 
> Hm.. good idea :)

If you need that, the trace log can also tell you exactly what the
response time per request is.

> > - DeadlockDebugger may be informative.
> >   http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger
> >
> Sounds a little drastic on a production server, but it may still come
> to that..  Ought to test it out on another server I guess.

Definitely test-drive it on a scratch server, but production is where
you really need it :-)

re. trace log:
> That looks interesting, except that it can take 15 minutes or more
> to restart zope when load is at the worst. I could try it outside of
> peak hours I guess.

Yeah, restart during off hours and leave it enabled until you have one
episode of peak load slowdown.


-- 

Paul Winkler
http://www.slinkp.com


Re: [Zope] Scaling problems, or something else?

2007-05-09 Thread Gaute Amundsen
On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
> On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> > Just a quick call for ideas on this problem we have...
> >
> > Setup:
> > Zope 2.7.5 (~180 sites) -> apache (-> varnish for 1 high profile site)
> >
> > Most noticeable symptoms:
> > Takes 30 sec. or more to serve a page, or times out.
> > Sporadic problem, but always during general high load.
> > Lasts less than 1 hour.
> > Restarting zope does not help.
> > Lots of apache processes in '..reading..' state
> > Apache accesses and volume are down.
> > Server load is unchanged, and < 2.0
> > Apache processes are way up (~250 against <40)
> > Netstat "established" connections are WAY up (~650 against < 50)
>
> The increase in netstat connections and apache processes indicates
> lots of simultaneous traffic, but it's interesting that Apache
> accesses are down.  Since hits are logged only on completion, it may be
> that many of the requests are hung.
>
That was my reasoning too.

> > Is this zope hitting some sort of limit and just letting Apache hang?
> > Would setting up ZEO on the same box make a difference,
>
> ZEO doesn't buy you any performance unless you have multiple Zope
> clients reading from it, and a load balancer in front.  This will help
> IF your application is CPU-bound, which yours is not (I assume by
> server load you mean CPU).

So there is no other possible limit in a zope instance than IO or CPU? 
If cpu was the limiting factor I would see the 2 python processes running 90% 
and dozens of httpd's taking up the rest?

Can you think of any good parameters I can get at with a small script that 
would be good to graph with all the rest to shed some light on this? 
(we are using munin)

Something out of Control_Panel/Database/main/manage_activity perhaps?
Is there a way to get that data out without going through port 8080?

How about something out of /proc/`cat /home/zope/sites/site1/var/Z2.pid`/XXX?
Need to read up on procfs I guess.
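
For what it's worth, a minimal sketch of the procfs idea in Python, assuming a Linux box where /proc/<pid>/status carries the usual VmSize, VmRSS and Threads fields. The function names and the sample text (including its numbers) are made up for illustration; only the pid file path comes from the message above.

```python
def parse_proc_status(text, wanted=("VmSize", "VmRSS", "Threads")):
    """Pick selected numeric fields out of the text of /proc/<pid>/status."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            fields = rest.split()
            if fields:
                # Memory fields look like "305112 kB"; Threads is a bare number.
                values[key] = int(fields[0])
    return values


def read_status_for_pidfile(pidfile):
    """Read a pid from a pid file and return that process's parsed status."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    with open("/proc/%d/status" % pid) as f:
        return parse_proc_status(f.read())


# Invented sample, shaped like a real status file, so the parser can be tried
# without a running Zope.
SAMPLE = """Name:\tpython
State:\tS (sleeping)
VmSize:\t  412340 kB
VmRSS:\t  305112 kB
Threads:\t7
"""

status = parse_proc_status(SAMPLE)
print(status)
```

On the real server the call would be something like read_status_for_pidfile("/home/zope/sites/site1/var/Z2.pid"), and the three numbers could be fed to munin. Resident memory creeping towards physical RAM would point at swapping rather than CPU or disk.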

> ZEO can actually *hurt* if you're IO-bound, because it adds network
> overhead to ZODB reads and writes. It's very bad if you have large
> Image or File objects (which you probably shouldn't have in the ZODB
> anyway).
>
Good to hear. I was not particularly relishing the thought of the necessary 
load balancing on that single box either :-/

> > or would it be better
> > to extend varnish coverage?
>
> Probably a good idea anyway... but you want to find out what the
> problem really is.
>
> > What would you do to pinpoint the problem?
>
> I'd first try hitting Zope directly during the problem to see if the
> slowdown is there.  If so, I'd then try either:
>
Should be possible with lynx on localhost.
I have done that before for other purposes. Should have thought of that.

Maybe I will start logging the response time directly like that! 
Hm.. good idea :)
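
A rough sketch of what such a direct timing probe could look like in Python. It bypasses any configured proxy so the request really goes straight to Zope, and formats the result the way a munin plugin prints values. The function names and the field name "zope_direct" are hypothetical, and the URL you pass in is whatever your instance listens on.

```python
import http.client
import time
import urllib.request

# An empty ProxyHandler disables proxy autodetection, so the probe talks to
# the given host directly instead of going through a front-end proxy.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def time_request(url, timeout=60):
    """Fetch url once; return (seconds, status). status is None on failure."""
    start = time.monotonic()
    status = None
    try:
        with _DIRECT.open(url, timeout=timeout) as response:
            response.read()          # include the body transfer in the timing
            status = response.status
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts land here.
        pass
    return time.monotonic() - start, status


def munin_line(field, seconds):
    """Format one value line in the 'field.value number' style munin reads."""
    return "%s.value %.3f" % (field, seconds)


print(munin_line("zope_direct", 0.25))
```

Calling time_request("http://localhost:8080/") from cron or a munin plugin and graphing the result next to the Apache numbers should show whether the slowdown is already present when Apache is taken out of the path. A failed or timed-out fetch still returns the elapsed time, which is usually what you want to see on the graph.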

> - DeadlockDebugger may be informative.
>   http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger
>
Sounds a little drastic on a production server, but it may still come to that..
Ought to test it out on another server I guess.

> - Enable Zope's trace log and use requestprofiler.py to see if there
>   is a pattern to the requests that trigger the problem.  Eg. maybe
>   all your zope worker threads are waiting on some slow IO task. See
>   the logger section of zope.conf.

That looks interesting, except that it can take 15 minutes or more to restart 
zope when load is at the worst. I could try it outside of peak hours I guess.

Thanks for the input.
Really helped get me unstuck, as you can see :)

Regards

Gaute 


Re: [Zope] Scaling problems, or something else?

2007-05-09 Thread Paul Winkler
On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> Just a quick call for ideas on this problem we have...
> 
> Setup: 
> Zope 2.7.5 (~180 sites) -> apache (-> varnish for 1 high profile site)
> 
> Most noticeable symptoms: 
> Takes 30 sec. or more to serve a page, or times out.
> Sporadic problem, but always during general high load.
> Lasts less than 1 hour. 
> Restarting zope does not help.
> Lots of apache processes in '..reading..' state
> Apache accesses and volume are down.
> Server load is unchanged, and < 2.0
> Apache processes are way up (~250 against <40)
> Netstat "established" connections are WAY up (~650 against < 50)

The increase in netstat connections and apache processes indicates
lots of simultaneous traffic, but it's interesting that Apache
accesses are down.  Since hits are logged only on completion, it may be
that many of the requests are hung.
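
To put a number on that, one option is to graph established connections per local port alongside completed hits. Below is a small Python sketch that does the counting from the text of /proc/net/tcp. It assumes Linux and IPv4 only (IPv6 sockets live in /proc/net/tcp6), and the sample table is invented and truncated to the columns the code actually uses.

```python
def count_established(proc_net_tcp_text):
    """Count ESTABLISHED sockets per local port from /proc/net/tcp text."""
    counts = {}
    for line in proc_net_tcp_text.splitlines()[1:]:    # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        if state != "01":      # 01 is ESTABLISHED; 0A is LISTEN, 06 TIME_WAIT
            continue
        port = int(local.rsplit(":", 1)[1], 16)    # port is hex after the colon
        counts[port] = counts.get(port, 0) + 1
    return counts


# Invented sample: two established on port 80, one on 8080, plus a listener
# and a TIME_WAIT socket that must not be counted.
SAMPLE = """  sl  local_address rem_address   st
   0: 00000000:0050 00000000:0000 0A
   1: 0100007F:0050 0100007F:C001 01
   2: 0100007F:0050 0100007F:C002 01
   3: 0100007F:1F90 0100007F:C003 01
   4: 0100007F:1F90 0100007F:C004 06
"""

print(count_established(SAMPLE))
```

If the count on the Apache port climbs while the count on the Zope port stays pinned at a small number, that would fit the picture of requests queuing up behind a few busy Zope worker threads.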

> Is this zope hitting some sort of limit and just letting Apache hang? 
> Would setting up ZEO on the same box make a difference,

ZEO doesn't buy you any performance unless you have multiple Zope
clients reading from it, and a load balancer in front.  This will help
IF your application is CPU-bound, which yours is not (I assume by
server load you mean CPU).

ZEO can actually *hurt* if you're IO-bound, because it adds network
overhead to ZODB reads and writes. It's very bad if you have large
Image or File objects (which you probably shouldn't have in the ZODB
anyway).

> or would it be better 
> to extend varnish coverage?

Probably a good idea anyway... but you want to find out what the
problem really is.

> What would you do to pinpoint the problem?

I'd first try hitting Zope directly during the problem to see if the
slowdown is there.  If so, I'd then try either:

- DeadlockDebugger may be informative.
  http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger

- Enable Zope's trace log and use requestprofiler.py to see if there
  is a pattern to the requests that trigger the problem.  Eg. maybe
  all your zope worker threads are waiting on some slow IO task. See
  the logger section of zope.conf.
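
As a rough illustration of the trace-log idea, here is a small Python sketch that pairs up begin and end records and lists the slowest requests, plus any that never finished. It assumes each trace log line starts with a one-letter code (B for begin, E for end), a request id and an ISO timestamp, with method and path following on B lines; please check that against your own log before trusting it. The function name and the sample lines are invented, and requestprofiler.py does a far more thorough job.

```python
from datetime import datetime


def request_times(lines):
    """Return (finished, pending) from trace-log style lines.

    finished: list of (seconds, method, path), slowest first.
    pending:  dict of request id -> (start, method, path) with no E record.
    Assumed layout: "B <id> <timestamp> <method> <path>" and
    "E <id> <timestamp>"; other record types are skipped.
    """
    pending = {}
    finished = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] not in ("B", "E"):
            continue
        code, req_id, stamp = parts[0], parts[1], parts[2]
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue                      # unexpected timestamp format
        if code == "B" and len(parts) >= 5:
            pending[req_id] = (when, parts[3], parts[4])
        elif code == "E" and req_id in pending:
            begin, method, path = pending.pop(req_id)
            finished.append(((when - begin).total_seconds(), method, path))
    finished.sort(reverse=True)
    return finished, pending


SAMPLE = [
    "B 1 2007-05-09T12:00:00 GET /site1/index_html",
    "B 2 2007-05-09T12:00:01 GET /site2/slow_page",
    "I 1 2007-05-09T12:00:00 0",
    "E 1 2007-05-09T12:00:02",
    "B 3 2007-05-09T12:00:05 GET /site3/hung",
    "E 2 2007-05-09T12:00:31",
]

finished, pending = request_times(SAMPLE)
for seconds, method, path in finished:
    print("%6.1fs %s %s" % (seconds, method, path))
print("still pending:", sorted(pending))
```

During a slowdown, the interesting part is usually the pending set: if the same handful of paths keep showing up there, those are the requests tying up the worker threads.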


-- 

Paul Winkler
http://www.slinkp.com


[Zope] Scaling problems, or something else?

2007-05-09 Thread Gaute Amundsen
Just a quick call for ideas on this problem we have...

Setup: 
Zope 2.7.5 (~180 sites) -> apache (-> varnish for 1 high profile site)

Most noticeable symptoms: 
Takes 30 sec. or more to serve a page, or times out.
Sporadic problem, but always during general high load.
Lasts less than 1 hour. 
Restarting zope does not help.
Lots of apache processes in '..reading..' state
Apache accesses and volume are down.
Server load is unchanged, and < 2.0
Apache processes are way up (~250 against <40)
Netstat "established" connections are WAY up (~650 against < 50)

Is this zope hitting some sort of limit and just letting Apache hang? 
Would setting up ZEO on the same box make a difference, or would it be better 
to extend varnish coverage?

What would you do to pinpoint the problem?

Any ideas are welcome!

Regards

Gaute Amundsen