I recently read Florent's object/relational blog entry at http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_08_11_object_relational . It's getting a bit old now, but I didn't see much discussion (or a way to make a comment) so I thought I'd bring it up here to invite shared thoughts on his provocative ideas. Florent spoke of both Zope 2 and Zope 3. Because of my interests, my current job description, and my choice of mailing list for this discussion, I'll be speaking exclusively about the Zope 3 side of things. My O/R experience is on a smaller scale than Florent's (or Ape's) goals, so my responses are offered with the knowledge that I may need to be corrected.

Florent suggests that a "proper enterprise-grade application server" using Zope should use an object-relational mapper such as Ape, and rely on it at its core. He made a number of interesting observations about how this would allow us to discard the Zope catalog "hack", store blobs on the filesystem, and take advantage of RDBMS maturity for managing and analyzing content data and metadata.

While I agree with some of his observations, I believe that Florent's position--a blanket embrace of O/R underneath ZODB for all "enterprise" use cases--is overzealous. Large business content management applications can have many different usage patterns and many different design characteristics and tradeoffs. An O/R mapping is one choice, with its own advantages and disadvantages.

The most serious disadvantage to O/R mapping is that the cost of creating and maintaining the mapping is not trivial. Requiring an O/R mapping is a significant barrier to entry, unless you dump all of the data in something like Ape's 'extra stuff' store--in which case you've lost many of the compelling advantages of an RDBMS back end in the first place. This cost could be somewhat alleviated with tools; however, to my knowledge, the tools do not yet exist. Even with the tools, it would still be an extra layer of work demanded just to get things to work.

Also, while I won't confidently assert speed losses as a disadvantage, it's worth mentioning that mapping code may (will usually?) introduce more CPU churn (and slower app speeds) than FileStorage.

In any case, I know there are some cases in which O/R mappings would be very useful. I do not agree that it is generically the right approach. It has a cost. Moreover, the advantages Florent listed are not as clear cut as he described.

Florent identified three advantages to O/R mapping: according to his blog, RDBMS indexing is clearly superior to the Zope catalog; blobs are best handled with mapping code; and content data and metadata are clearly tabular and so fit within a relational database cleanly and obviously, providing advantages such as built in aggregating tools. He makes some good points, but I have caveats or disagreements with all three.

First, he identified the Zope catalog as a "hack" for which RDBMS indexes would be a cure. I don't see how the Zope 3 catalog is a hack, nor do I necessarily see RDBMS indexes as inherently advantageous in all cases.

I agree that it is a problem that, given enough indexed objects and/or enough indexes and/or a small object cache, loading the buckets when you traverse indexes can flush other objects from the ZODB cache. If the flushed objects are expensive to load and frequently used, that can be a noticeable problem. I believe this is a problem that can be addressed, or at least tuned for given applications. When it bites us enough that one of us in the community implements a smarter ZODB cache (or other solution) we'll all win.
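To make the cache concern concrete, here is a toy sketch of the effect. It assumes a plain least-recently-used cache, which is only a rough stand-in for the real ZODB object cache; the class, the function, and the object ids are all hypothetical names made up for the illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache (an assumption, not the real ZODB cache)."""

    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()
        self.loads = 0  # number of simulated loads from storage

    def get(self, oid):
        if oid in self.data:
            # Cache hit: mark the object as most recently used.
            self.data.move_to_end(oid)
        else:
            # Cache miss: "load" the object, then evict the oldest if full.
            self.loads += 1
            self.data[oid] = object()
            if len(self.data) > self.size:
                self.data.popitem(last=False)
        return self.data[oid]


def simulate(cache_size, bucket_count, content_count=5):
    """Return how many content objects must be reloaded after an index scan."""
    cache = LRUCache(cache_size)
    content = ["doc-%d" % i for i in range(content_count)]
    for oid in content:
        cache.get(oid)  # warm the cache with frequently used content
    for i in range(bucket_count):
        cache.get("bucket-%d" % i)  # traverse index buckets
    before = cache.loads
    for oid in content:
        cache.get(oid)  # touch the content objects again
    return cache.loads - before
```

With a cache large enough to hold both the content and the buckets, `simulate` reports no reloads; with a small cache, the bucket traversal pushes the content objects out and every one of them has to be loaded again. That is the tuning knob I have in mind: cache size relative to index size.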

It is also true, though you did not mention it, that the Zope 3 catalog has no standardized query language or query optimizer. The first job has some contenders, but the second one has no champions to my knowledge.

These are not reasons to discard BTrees, or indexes based on them. They provide some significant advantages. Both common indexing requirements and new data structures, such as the fascinating RDFLib that Michel Pelletier has worked on, are handled well by the BTree code. The BTree code is time-tested, relatively easy to use, and well maintained. When combined with the transactional virtues of ZODB, the conflict resolution story reads very well, and very similarly to that of PostgreSQL (default behavior).

In terms of the actual indexes and catalog design, the Zope 3 text index is not as featureful as others, but the core algorithms are equivalent or even superior to many of them. In addition, the interface system and the catalog design allow integration with other back ends, such as the Lucene text index (as Stephan has illustrated, I believe). It could even support an index with an RDBMS table back end, if desired. This might get you some of the advantages you listed for the O/R back end at a lower cost of entry.

The catalog and index code is not a hack, and is in fact simple, effective and flexible. Python is the query language, and the lack of an optimizer is not a reason to go running to an RDBMS index. The catalog and index code could use polish and even alternate implementations, but the BTrees, the core code, are fantastic tools.
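As a rough illustration of what I mean by "Python is the query language", here is a minimal sketch of a field index. It is not the Zope 3 catalog API: the class and method names are hypothetical, and plain dicts and sets stand in for the BTrees and tree sets a real index would use.

```python
class FieldIndex:
    """Hypothetical field index: maps a value to the ids of matching documents."""

    def __init__(self):
        # value -> set of integer doc ids (a dict here; a BTree in a real index)
        self._forward = {}

    def index_doc(self, docid, value):
        self._forward.setdefault(value, set()).add(docid)

    def query_equal(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._forward.get(value, ()))

    def query_range(self, low, high):
        # Linear scan for simplicity; an ordered BTree would allow a range walk.
        result = set()
        for value, docids in self._forward.items():
            if low <= value <= high:
                result |= docids
        return result


author = FieldIndex()
year = FieldIndex()
docs = {
    1: ("alice", 2004),
    2: ("bob", 2005),
    3: ("bob", 2003),
    4: ("bob", 2006),
}
for docid, (name, published) in docs.items():
    author.index_doc(docid, name)
    year.index_doc(docid, published)

# The "query" is ordinary Python: intersect the result sets of two indexes.
hits = author.query_equal("bob") & year.query_range(2005, 2006)
```

There is no optimizer deciding which index to consult first, so the programmer chooses the order of operations by hand. For many applications that is an acceptable price for having no separate query language to learn.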

That said, certainly if your data and requirements suggested an RDBMS back end for other reasons, the advantages of robust and common RDBMS indexing are compelling. My argument is simply that it is not a clear-cut win for an O/R mapping.

Another case Florent made for O/R mapping is blob support. I can see this answering a number of the common use cases for blob support. However, solutions like Chris McDonough's Zope 2 blob product, to which you linked, seem like they could provide many of the same advantages, without requiring a full O/R decision for your app. I don't have enough information to weigh your opinion that the O/R solution would be simpler than Chris' kind of solution. In any case, it does not necessarily seem like a clean win for the O/R argument.

The last advantage Florent mentioned for O/R mappings was that the tabular structure of RDBMS fit his data--presumably data that he felt was representative of the data needed by an "enterprise" CMS--better. Having moved from RDBMS systems to the ZODB, this surprised me. In my experience, large businesses are very likely to have interconnected CMS data, one object pointing to another, in a way that is very well suited to object databases rather than relational databases. Even in Florent's blog, the two examples of document hierarchy and (branched) versioning arguably match "classical" object database advantages better.

Of course, yes, RDBMS systems have many more years of maturity, and several have many more thousands of dollars spent on them than the ZODB. It's reasonable to find them compelling, whether for their new XML features or for their old reliability, stability, and mathematical efficiencies. But RDBMS designers continue to move to less table-oriented designs, trying to get many features we already have, whether they work through object integration or XML integration. Some Zope applications will significantly benefit from an O/R mapping, but if you are a Python programmer, the ZODB alone, with the transparent FileStorage or DirectoryStorage back ends, is often a compelling, simpler, and reasonable alternative.

In conclusion, the nebulous concept of "enterprise" applications on Zope does not lead to a clear-cut decision for or against an O/R mapper such as Ape. The cost of O/R mappings is not inconsequential, and the advantages are not conclusive. I hope that large projects that the Zope community works on together can support both, and neither depend on nor exclude their use. Florent makes some excellent observations, and the problems he identifies could be solved at a number of layers in the code base. Meanwhile, switching entirely to an O/R back end over FileStorage or DirectoryStorage feels like a significant case of "throwing the baby out with the bath water".

Zope3-dev mailing list