On Wednesday 09 May 2007 16:41, Paul Winkler wrote:
> On Wed, May 09, 2007 at 12:07:54PM +0200, Gaute Amundsen wrote:
> > Just a quick call for ideas on this problem we have...
> >
> > Setup:
> > Zope 2.7.5 (~180 sites) -> apache (-> varnish for 1 high profile site)
> >
> > Most noticeable symptoms:
> > Takes 30 sec. or more to serve a page, or times out.
> > Sporadic problem, but always during general high load.
> > Lasts less than 1 hour.
> > Restarting zope does not help.
> > Lots of apache processes in '..reading..' state
> > Apache accesses and volume are down.
> > Server load is unchanged, and < 2.0
> > Apache processes are way up (~250 against < 40)
> > Netstat "established" connections are WAY up (~650 against < 50)
>
> The increase in netstat connections and apache processes indicates
> lots of simultaneous traffic, but it's interesting that Apache
> accesses is down. Since hits are logged only on completion, it may be
> that many of the requests are hung.
>
That was my reasoning too.
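One way to check whether the 30-second pages are slow inside Zope itself, rather than somewhere in Apache, is to time a request sent straight to the Zope HTTP port and graph it next to the Apache numbers. Below is a minimal munin-style plugin sketch, assuming Zope listens on localhost:8080 and that its front page is cheap to render; the URL, the 60-second cut-off and the graph and field names are hypothetical choices, and it is written for a current Python 3, not the Python that runs Zope 2.7.

```python
#!/usr/bin/env python3
"""Hypothetical munin-style probe: time one request sent straight to Zope.

Assumptions (not from the thread): Zope's HTTP server is on localhost:8080
and "/" is a cheap page. Adjust both for your own setup.
"""
import sys
import time
import urllib.error
import urllib.request

ZOPE_URL = "http://localhost:8080/"  # assumption: ZServer HTTP port
TIMEOUT = 60.0                       # seconds before we treat the request as hung


def time_request(url, timeout=TIMEOUT):
    """Return (seconds, status); status is None if the request failed or timed out."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # Zope answered, just not with a 2xx status
    except OSError:
        status = None      # refused, reset or timed out (URLError is an OSError)
    return time.monotonic() - start, status


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        # munin runs plugins with "config" to learn the graph metadata
        print("graph_title Zope direct response time")
        print("graph_vlabel seconds")
        print("response.label response")
        return
    elapsed, status = time_request(ZOPE_URL)
    # a failed or hung request is reported as the full timeout
    print("response.value %.3f" % (elapsed if status is not None else TIMEOUT))


if __name__ == "__main__":
    main(sys.argv)
```

Run with `config` it prints the graph metadata; run with no argument it prints a single `response.value` line. Because a failed request is reported as the whole timeout, a hang shows up as a flat ceiling on the graph instead of a gap.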
> > Is this zope hitting some sort of limit and just letting Apache hang?
> > Would setting up ZEO on the same box make a difference,
>
> ZEO doesn't buy you any performance unless you have multiple Zope
> clients reading from it, and a load balancer in front. This will help
> IF your application is CPU-bound, which yours is not (I assume by
> server load you mean CPU).

So is there no other possible limit in a Zope instance than IO or CPU?
If CPU were the limiting factor, would I see the 2 python processes
running at 90% and dozens of httpd's taking up the rest?

Can you think of any good parameters I could get at with a small script
that would be worth graphing alongside everything else, to shed some
light on this? (We are using munin.)
Something out of Control_Panel/Database/main/manage_activity perhaps?
Is there a way to get that data out without going through port 8080?
How about something out of /proc/`cat /home/zope/sites/site1/var/Z2.pid`/XXX?
I need to read up on procfs, I guess.

> ZEO can actually *hurt* if you're IO-bound, because it adds network
> overhead to ZODB reads and writes. It's very bad if you have large
> Image or File objects (which you probably shouldn't have in the ZODB
> anyway).
>
Good to hear. I was not particularly relishing the thought of the
necessary load balancing on that single box either :-/

> > or would it be better
> > to extend varnish coverage?
>
> Probably a good idea anyway... but you want to find out what the
> problem really is.
>
> > What would you do to pinpoint the problem?
>
> I'd first try hitting Zope directly during the problem to see if the
> slowdown is there. If so, I'd then try either:
>
That should be possible with lynx on localhost. I have done that before
for other purposes; I should have thought of that. Maybe I will start
logging the response time directly like that! Hm.. good idea :)

> - DeadlockDebugger may be informative.
>   http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger
>
Sounds a little drastic on a production server, but it may still come
to that. I ought to test it out on another server first, I guess.

> - Enable Zope's trace log and use requestprofiler.py to see if there
>   is a pattern to the requests that trigger the problem. Eg. maybe
>   all your zope worker threads are waiting on some slow IO task. See
>   the logger section of zope.conf.

That looks interesting, except that it can take 15 minutes or more to
restart Zope when the load is at its worst. I could try it outside of
peak hours, I guess.

Thanks for the input. It really helped get me unstuck, as you can see :)

Regards
Gaute
_______________________________________________
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
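For the trace-log route, a small script can list the requests that were started but never finished, which is what worker threads stuck on a slow IO task would leave behind. This is a sketch under an assumption: that each trace-log line has the ZServer debug-log layout `<code> <request id> <time> <data>`, with `B` marking the start of a request (data = method and path) and `E` marking its end. Check a few lines of your own log before trusting it. The helper names are hypothetical, it is written for Python 3, and it is not a replacement for requestprofiler.py.

```python
"""Hypothetical helper: list requests in a Zope trace log that never finished.

Assumes lines shaped like "<code> <request id> <time> <data>", where
B = request begun (data = method and path) and E = request ended.
"""
import sys
from collections import Counter


def unfinished_requests(lines):
    """Return {request_id: (time, method_and_path)} for B lines with no later E line."""
    open_requests = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank or malformed line
        code, req_id, stamp = parts[0], parts[1], parts[2]
        data = parts[3].strip() if len(parts) > 3 else ""
        if code == "B":
            open_requests[req_id] = (stamp, data)
        elif code == "E":
            open_requests.pop(req_id, None)
    return open_requests


def main(path):
    with open(path) as log:
        pending = unfinished_requests(log)
    # group the still-open requests by method and path to spot a common culprit
    counts = Counter(data for _, data in pending.values())
    for data, n in counts.most_common(20):
        print("%5d  %s" % (n, data))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: %s TRACE_LOG" % sys.argv[0], file=sys.stderr)
```

Run it on a copy of the log taken while the site is slow; if one path dominates the list, that is a good candidate for the slow IO task Paul mentions.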