On 14 Nov 2003, at 14:55, Andy Redhead wrote:
On top of that, if you are indexing the url string field (which I think
would be reasonable if you had a lot of entries in the repository) then the db will be rebuilding that index for each redundant url several times over... though this probably isn't as frequent an occurrence as a "read", it will affect "insert new" and "move" operations.
This is a good point. We are going to use Slide as a CMS that will serve and compose web pages in real time, so read performance is much more important for us than write/copy/move performance. If you are using Slide as a versioning system (to replace CVS or something like that) this might be different.
So this is a point that should be discussed.
Ah-ha, that does make sense. However... I am hoping to use Slide in a more
"editorial" environment, where documents may be composed from many "fragments"
and then published to the runtime environment (which is where I worry about
runtime performance). These fragments may well be imported in quite large
numbers (say between 10k and 50k at a time) and moved around a bit as people
decide how they want to use them. This "bulk loading" and the editorial
manipulation are where my concerns come from.
I tend to think that a CMS should *NOT* serve content straight from the content repository, exactly for the reasons above: it's hard to optimize it for that!
The best (IMHO) architectural solution for this is to have something like
frontend <--- repository <--- backend
where "frontend" does the presentation stage and can do inverted caching.
Inverted caching means that you cache aggressively on the frontend stage (meaning you don't call the repository synchronously if the resource is cached), then attach "listeners" to the repository that trigger invalidation messages to the frontend.
If you do aggregation or heavy manipulation on the frontend, the inverted caching system must be able to keep a tree of dependencies for each produced resource (so that invalidating a repository resource can potentially invalidate all the cached resources that depend on it) and things get complex. But if you don't do aggregation, the cache is quite easy to do. [I'm trying to do this with simple mod_proxy and mod_cache... since mod_cache allows request parameters to invalidate the cache entry... potential DoS attacks aside, it's a nice and easy setup and you get it for free if you have a mod_jk connector or you do servlets through ProxyPass]
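To make the idea concrete, here is a minimal sketch of the dependency-tracking side of inverted caching. All names (`InvertedCache`, `on_resource_changed`, etc.) are illustrative assumptions, not Slide or mod_cache APIs: the point is only that a "get" never calls the repository, while a repository listener callback walks the dependency map and evicts every page built from the changed resource.

```python
class InvertedCache:
    """Sketch of inverted caching with per-page dependency tracking.

    The frontend serves pages straight from the cache; the repository
    pushes invalidation events instead of being polled on each request.
    """

    def __init__(self):
        self._pages = {}  # page name -> rendered content
        self._deps = {}   # repository resource -> set of pages built from it

    def put(self, page, content, depends_on):
        """Cache a rendered page, recording which resources it aggregates."""
        self._pages[page] = content
        for resource in depends_on:
            self._deps.setdefault(resource, set()).add(page)

    def get(self, page):
        """Cache lookup: note there is no synchronous repository call here."""
        return self._pages.get(page)

    def on_resource_changed(self, resource):
        """Repository listener callback: evict every dependent page."""
        for page in self._deps.pop(resource, set()):
            self._pages.pop(page, None)


# Example: "home" aggregates two fragments, "about" only one.
cache = InvertedCache()
cache.put("home", "<html>home v1</html>", ["fragA", "fragB"])
cache.put("about", "<html>about v1</html>", ["fragB"])

cache.on_resource_changed("fragA")   # editing fragA evicts only "home"
```

Without aggregation each page depends on exactly one resource and the map degenerates to a one-to-one key eviction, which is why the non-aggregating case is "quite easy to do".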
What should be *fast* and scalable is DASL.
Everything else can be slow and would not be such a big issue if you use the above architecture [unless you have thousands of users editing concurrently on the repo, but that's unlikely even in the most complex CMS scenarios]
-- Stefano.