This post is generously snipped to avoid more agitation.

> As I asked before, is there some, more recent document, which
> highlights how to structure Plone for a large scale deployment? If
> there is, I'd sure like to read it ...

If you have thousands of users or tens of gigabytes of data, then it normally pays to bring in some expert help to review your architecture. That goes for any platform, Plone included. The basics are fairly well known: you have a ZEO server, you have multiple ZEO clients (at least one per processor core), you have a load balancer in front of those clients (e.g. pound), and you have Varnish or Squid for caching, with CacheFu correctly configured in Plone.
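As a rough illustration of the "several ZEO clients behind a balancer" idea, here is a minimal Python sketch. The host, the port numbering starting at 8081 and the plain round-robin policy are assumptions made for illustration only. pound and other balancers have their own scheduling and configuration, and none of this is Plone, Zope or pound API.

```python
import itertools
import os


def zeo_client_backends(host="127.0.0.1", first_port=8081, cores=None):
    """Return one (hypothetical) ZEO client address per processor core."""
    # Default to the number of cores on this machine, and never fewer than one.
    cores = cores or os.cpu_count() or 1
    return [f"{host}:{first_port + i}" for i in range(cores)]


def round_robin(backends):
    """Hand out backends in turn, one simple way a balancer spreads requests."""
    return itertools.cycle(backends)


if __name__ == "__main__":
    backends = zeo_client_backends(cores=4)
    balancer = round_robin(backends)
    for request_no in range(6):
        # Requests 0-3 go to ports 8081-8084, then the cycle starts again.
        print(request_no, next(balancer))
```

The point is only that each ZEO client is a separate process with its own address, and the balancer is what makes them look like one site to the outside world.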

> Our main site keels over frequently when it was on our Zope/Zeo
> cluster. On its own box, it only restarts itself 1-5 times per day.

This is still unacceptable - I'd never run a site that behaved like that. I don't think many people would. Most Plone sites don't do this, so there must be something in your setup that could be improved or fixed. I can't be more specific than that, though.

> What do you mean by "instance"? Install Zope (n front ends plus ZEO
> backend). Add Plone site. Repeat x 20.

> That's the setup I'm talking about. As far as I can tell, it's the
> most natural way to add different websites (e.g. Plone sites with
> different URLs). It also turns out not to be very performant. (Note:
> there's only one ZEO box, with one data.fs, which for us is <2GB)

I don't think that having 20+ small Plone sites in one Zope instance is a priori a drag on performance, except that you're duplicating some things (like the catalog), which may add a bit of overhead compared to having one site that's 20 times as big. I suspect the performance and stability issues you're seeing have more to do with what's going on inside one or more of those sites.

This is where you need to learn how to debug things. Your logs will tell you when something crashes. From your previous traceback, it looks like something in RedirectionTool (which, by the way, is not a core part of Plone). Have you tried to uninstall this tool temporarily (take a backup!) to see whether the problem goes away? Have you tried to ask on the mailing lists what the problem is? Have you tried to get someone with Python skills to do some debugging on the line that's causing the error? Have you tried to understand what triggers the error - is it happening on any 404 page, for example? Or on all pages? Or on pages when a redirect alias is being invoked?
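To get a first answer to the "what triggers it" question, a small script that tallies which URLs show up in your error entries can help. This is a minimal sketch: it assumes each logged error has a line containing "ERROR Zope.SiteErrorLog", followed by an entry id and the request URL. That format is an assumption; check it against your own event.log and adjust the regular expression to match.

```python
import re
from collections import Counter

# ASSUMPTION: error entries look roughly like
#   2008-10-08T10:15:02 ERROR Zope.SiteErrorLog 1223457302.1 http://example.com/foo
# Adjust this pattern to whatever your event.log actually contains.
ENTRY = re.compile(r"ERROR Zope\.SiteErrorLog \S+ (?P<url>\S+)")


def tally_error_urls(lines):
    """Count how often each URL appears in error-log entry lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:
            counts[match.group("url")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2008-10-08T10:15:02 ERROR Zope.SiteErrorLog 1223457302.1 http://example.com/old-page",
        "Traceback (innermost last):",
        "2008-10-08T10:15:03 ERROR Zope.SiteErrorLog 1223457303.4 http://example.com/old-page",
        "2008-10-08T10:15:05 ERROR Zope.SiteErrorLog 1223457305.9 http://example.com/news",
    ]
    # Most frequently failing URLs first; traceback lines are skipped.
    for url, n in tally_error_urls(sample).most_common():
        print(n, url)
```

Run against your real log (for example by passing an open file object to tally_error_urls), this should quickly show whether the errors cluster on missing pages, on redirect aliases, or on every page.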

> Install IIS/Apache. Add sites x 20, with host header redirection. No
> problems. Add dynamic elements (.NET, modPHP). Still no problems.

That's not a valid comparison. Add 20 advanced content management systems written in .NET or modPHP, fill them with the same content, and then come talk to me.

This kind of talk is fairly pointless, though. You're having a whinge. If you want to have a whinge, go ahead, but then I'll stop wasting my time trying to figure out what your problem is and giving you advice. If you want advice, you're much more likely to get it if you adjust your tone to seem less combative.

> (Again, thanks to Raphael Ritz for the ZEO Raid suggestion, which I'm
> trying out on a separate box, though the Subversion tags make my
> SysAdmins hesitant of its use on our production servers.)

From what I understand, ZEO Raid is not yet completely finished. I'd speak to Christian Theune about it. I know he's very close to having it finished, but is looking for sponsorship to get it over the final hurdle.

RelStorage will let you store things in Oracle or Postgres and thus use their scalability features. It may be a more mature option. I know Jarn are using it currently.

However, as you've been told repeatedly, it's very unlikely, based on what you've told us, that your problems lie with the ZEO server, and thus that ZEORaid or RelStorage would help. It's extremely likely that the tracebacks you are seeing every second *are* the symptom of the problem, and those are *not* caused by ZEO server issues. If you had ZEO server issues, you'd be seeing different messages (related to the ability of the ZEO client to talk to the ZEO server).

> And, there doesn't seem to be the experience with large deployments.
> Even if we were to add consulting services to our sites, we're not
> certain we'd get a viable deployment that could handle hundreds of
> campus department web sites and hundreds of thousands of pages.

Lots of people run sites that are much bigger than what you've described.

> If you, or Martin, or someone else, based on a stack trace we were
> getting every 200 milliseconds could say -- "Oh, that looks like
> this, you could probably do that" -- I'd have a case to consider if
> hiring to address that problem would be a worthwhile investment.

I did that when you first posted it. The problem is in RedirectionTool. It tells you which line. Any capable developer will be able to at least do some debugging starting there. See also my suggestions above.

Martin

--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book


_______________________________________________
Setup mailing list
[email protected]
http://lists.plone.org/mailman/listinfo/setup
