On Aug 24, 2005, at 6:27 AM, Martijn Faassen wrote:
A few contributions to this interesting discussion...
[snip the Zope 3 catalog is not a hack, and clean and simple]
The catalog and index code is not a hack, and is in fact simple,
effective and flexible. Python is the query language, and the
lack of an optimizer is not a reason to go running to an RDBMS
index. The catalog and index code could use polish and even
alternate implementations, but the BTrees, the core code, are
I have had some opportunity to work with the Zope 3 catalog
recently, and I have a few comments. First of all, I agree with the
main idea that the Zope 3 catalog is not a hack, and is clean and
flexible. I believe the catalog should be invested in, as I think
Now as to where I see areas where features are lacking in the Zope
Underfeatured query API
I do think that currently the API to query it is woefully
I've tried to work on this problem and am sitting on some code that
just needs a bit of time to polish and release that allows a simple
query language on top of the catalog. It's just building up a tree
of python objects for queries, nothing special, but it is a lot
higher level than what's already there.
As I think I've said before, sounds like a good start. :-)
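To make the "tree of Python objects" idea concrete, here is a minimal sketch. It is not the code mentioned above, which has not been released. The names Query, Eq, And and Or are hypothetical, and plain dicts of sets stand in for the real catalog indexes and IFBTree results.

```python
class Query:
    """Base class: lets query nodes be combined with & and |."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)


class Eq(Query):
    """Leaf node: doc ids whose indexed field equals a value."""

    def __init__(self, field, value):
        self.field = field
        self.value = value

    def apply(self, indexes):
        # indexes maps field name -> {value: set of doc ids} (a stand-in only)
        return set(indexes[self.field].get(self.value, ()))


class And(Query):
    """Inner node: intersection of both sub-results."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def apply(self, indexes):
        return self.left.apply(indexes) & self.right.apply(indexes)


class Or(Query):
    """Inner node: union of both sub-results."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def apply(self, indexes):
        return self.left.apply(indexes) | self.right.apply(indexes)


# Made-up index data, purely for illustration.
indexes = {
    "workflow_state": {"PUBLISHED": {1, 2, 4}, "NEW": {3}},
    "author": {"martijn": {1, 3}, "gary": {2, 4}},
}

query = (Eq("workflow_state", "PUBLISHED") & Eq("author", "martijn")) | Eq(
    "workflow_state", "NEW"
)
result = query.apply(indexes)
```

Because the query is an ordinary object tree, an optimizer could in principle rewrite it before apply() is called; this sketch does not attempt that.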
Another way of looking at this--or simply an additional feature on
top of a query language--might be to make the IFBTree results easier
to manipulate. The code in the zc sandbox for the
extent (http://svn.zope.org/Sandbox/zc/catalog/extentcatalog.py) is a
sketch of what I mean--following some of the set API, for instance.
The reason for my interest in this is that we have very little code
that uses the catalog to return objects--just IFBTree data
structures. Just working with the IFBTree data structures gives you
a lot more flexibility for integration of catalog results with other
Casey Duncan had explored some very interesting ideas in his pypes
project (http://cvs.zope.org/Packages/pypes/) for a query language,
by the way, but his ambition is still largely unrealized, even though
much of his query language work could be ported to Zope 3 without a
huge amount of trouble.
Arguably, query optimization would be the feature that would make a
given syntax win.
Fast, easy batching/sorting
I don't know how to do easy, efficient batching/sorting with the
catalog. I'd like to be able to query *just* a batch of objects,
sorted, for user interface purposes. There doesn't seem to be a
straightforward way to do this yet, and this is a very common use
case. The batching implementation sitting out there in
zope.bugtracker.batching is nice, but doesn't deal with the catalog.
I think this should be fixable with a bit more infrastructure
though. Getting the right batch is just a query on an index, and
the result can be sorted afterwards, though there are tricky issues
getting the right batch *size*.
Since we usually work with IFBTree data structures until the very
end, we get most of the benefits of batching. Once you are done with
your processing, you can simply wrap the result in a
zope.app.catalog.catalog.ResultSet (or similar) and be good to go.
This is why I think making the story for working directly with the
BTree data structures easier might be a good way to go.
Sorting is hard to do efficiently, and easy to *think* you are making
an optimization. We are currently doing it "naively" (to the degree
that using the very efficient Python sort is naive), and Jim refers
to research that indicates that a good non-naive approach is not
clear. I can certainly imagine various approaches. Carefully
arranging your merges can actually result in a pre-sorted set. We're
not being that careful.
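For illustration only, one of the approaches one might imagine is a partial sort that orders just enough of the result set to fill the requested batch. This is not what the catalog does today. The function name sorted_batch and the dict of sort values are assumptions, and whether this beats a plain sort depends on the sizes involved.

```python
import heapq


def sorted_batch(doc_ids, sort_values, start, size):
    """Return doc ids for positions [start, start + size) in sort order.

    doc_ids     -- iterable of integer doc ids (a stand-in for an IFBTree result)
    sort_values -- dict mapping doc id -> sortable value (hypothetical sort index)
    """
    # nsmallest keeps only start + size candidates rather than sorting
    # the whole result set; every doc id must have a sort value.
    head = heapq.nsmallest(start + size, doc_ids, key=sort_values.__getitem__)
    return head[start:]


# Made-up data: five documents sorted by title, second batch of two.
doc_ids = {10, 11, 12, 13, 14}
sort_values = {10: "pear", 11: "apple", 12: "fig", 13: "kiwi", 14: "banana"}
batch = sorted_batch(doc_ids, sort_values, start=2, size=2)
```

The cost grows with start + size, so late batches of a large result gain little over sorting everything.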
Missing powerful query concepts
Certain powerful query concepts like joins, available in a
relational setting, are missing. I've already run into a scenario
where I wanted to do something like this: given a bunch of version
objects with field 'id', where multiple objects can have the same
'id' to indicate they're versions of the same object, I want all
objects where field 'workflow_state' is 'PUBLISHED', unless there is
another object with the same id that has workflow_state 'NEW', in
which case I want that one.
I think joins would be a way to solve it, though I haven't figured
out the details, nor how to implement them efficiently on top of
the catalog. This kind of thing is where a relational database
makes life a lot simpler.
I guess that's taste. I'd be happier with Python.
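As a rough sketch of what the Python route could look like for the versions scenario, assuming the objects are already in hand as plain dicts (no catalog involved). The function name published_or_new and the 'name' field are made up. It also assumes one reading of the requirement: a NEW version only counts when it replaces a PUBLISHED one with the same id.

```python
def published_or_new(versions):
    """All PUBLISHED versions, each replaced by the NEW version sharing its id."""
    # If several NEW versions share an id, the last one wins (an assumption).
    new_by_id = {v["id"]: v for v in versions if v["workflow_state"] == "NEW"}
    result = []
    replaced_ids = set()
    for v in versions:
        if v["workflow_state"] != "PUBLISHED":
            continue
        replacement = new_by_id.get(v["id"])
        if replacement is None:
            result.append(v)
        elif v["id"] not in replaced_ids:
            # Emit the NEW version once, however many PUBLISHED siblings exist.
            replaced_ids.add(v["id"])
            result.append(replacement)
    return result


# Made-up version objects for illustration.
versions = [
    {"id": "a", "name": "a-v1", "workflow_state": "PUBLISHED"},
    {"id": "a", "name": "a-v2", "workflow_state": "NEW"},
    {"id": "b", "name": "b-v1", "workflow_state": "PUBLISHED"},
    {"id": "c", "name": "c-v1", "workflow_state": "NEW"},
    {"id": "d", "name": "d-v1", "workflow_state": "RETRACTED"},
]
names = [v["name"] for v in published_or_new(versions)]
```

This scans every object, so it says nothing about doing the same thing efficiently on top of the indexes.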
Zope catalog benefits
Now as to benefits of using the ZODB instead of a relational
database. I've seen some mentioned already, and I think there are
more that haven't been mentioned yet. I realize that some of these
issues don't exist so much with 'transparent' maps like Ape, which
acts like the ZODB, though *if* a relational database is used by an
application I also think that those features will be used
(otherwise, why do it?), which will still reduce the portability to
Right: my concern.
Common development platform
I've seen it mentioned elsewhere in this thread that the ZODB can
unify the development community, whereas O/R mapping technologies
(in particular those not transparent to the ZODB) run the risk of
scattering it. I think this is an interesting argument so I'd just
like to underline it.
Ease of deployment
Right now a Zope application is relatively difficult to deploy
compared to some other solutions like PHP, but it's probably
easier to deploy than other solutions which require a relational
database backend. Now it might seem that 'enterprise' deployments
are big anyway, so we don't have to worry about making this harder,
* enterprises will ask questions like "which relational database
does it support? we standardized on relational database system foo,
does it work with that?" We run the risk of having to say "no", or,
if "yes", we may run the risk of "oops, we cannot test this easily
with database system foo as we don't have it here."
* requiring a RDB for deployment makes it harder to market our
software, as it's harder to just download and install software into
your Zope to try it out. You need to set up a relational database
as well. I may be mistaken, but I think Plone would be less popular
if a relational database were *necessary* in order to play with
While I agree with your general point, Ruby on Rails might call that
assertion into question a bit.
* closely related, requiring a RDB for deployment makes it harder
to market our open source software to other developers. This ties
in closely to the argument above involving the risk of the
community fracturing. Even inside a company, having more software
requirements like an RDB may hinder team development (where each
team member runs a separate instance of the software), as there's
simply that much more to set up, which makes it harder for someone
to get up to speed with a project.
One point I also haven't seen mentioned yet is that I don't want to
have to have a relational database installed in order to run my
tests. The great thing about the ZODB and persistence is that it's
very transparent. Persistent instances are very very similar to non-
I'll add another.
Because of the Zope 3 component system, if we can use the current
catalog interface, or invent another, to develop both a ZODB/BTree-
based implementation and an RDBMS-based implementation, it's possible
that users who wanted to choose the RDBMS strengths would be able to
do so without dividing the user community.
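A toy illustration of that idea, using nothing from Zope itself: one small interface with two interchangeable implementations. ICatalogLike, MappingCatalog and SqlCatalog are hypothetical names and not the real catalog interface. A dict of sets stands in for the BTree-based index, and an in-memory SQLite table stands in for the RDBMS.

```python
import sqlite3
from abc import ABC, abstractmethod


class ICatalogLike(ABC):
    """Hypothetical minimal interface; not the actual Zope 3 catalog API."""

    @abstractmethod
    def index_doc(self, docid, state):
        """Record (or update) the workflow state of a document."""

    @abstractmethod
    def search(self, state):
        """Return the set of doc ids currently in the given state."""


class MappingCatalog(ICatalogLike):
    """In-memory stand-in for a ZODB/BTree-based implementation."""

    def __init__(self):
        self._by_state = {}
        self._state_of = {}

    def index_doc(self, docid, state):
        old = self._state_of.get(docid)
        if old is not None:
            self._by_state[old].discard(docid)  # unindex the previous state
        self._state_of[docid] = state
        self._by_state.setdefault(state, set()).add(docid)

    def search(self, state):
        return set(self._by_state.get(state, ()))


class SqlCatalog(ICatalogLike):
    """Stand-in for an RDBMS-based implementation, using in-memory SQLite."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE docs (docid INTEGER PRIMARY KEY, state TEXT)"
        )

    def index_doc(self, docid, state):
        self._db.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?)", (docid, state)
        )

    def search(self, state):
        rows = self._db.execute(
            "SELECT docid FROM docs WHERE state = ?", (state,)
        )
        return {row[0] for row in rows}


# The calling code is identical whichever implementation is plugged in.
results = {}
for catalog in (MappingCatalog(), SqlCatalog()):
    catalog.index_doc(1, "PUBLISHED")
    catalog.index_doc(2, "NEW")
    catalog.index_doc(3, "PUBLISHED")
    catalog.index_doc(2, "PUBLISHED")  # reindex doc 2 with a new state
    results[type(catalog).__name__] = catalog.search("PUBLISHED")
```

In a real Zope 3 setup the choice between implementations would presumably be made through component registration rather than in the calling code, which is the part this sketch leaves out.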
[snip blob support argument]
I agree that the blob argument for RDB mapping is not convincing.
There are other solutions around and this is being improved rapidly.
Anyway, all of the arguments against object/relational mapping
aside, I do think this is an interesting area to explore. You *do*
get a whole lot of power using a relational database, after all. I
myself am actually in two minds concerning very transparent ZODB-
style solutions like Ape or less transparent but more explicit uses
of O/R mappings like SQLObject. While the transparency has many
benefits mentioned before, the more straightforward mapping has the
benefits of simplicity, may map to relational databases more
easily, and may expose powerful relational features more
It's true. I hope that an entire platform doesn't force the decision
on its potential users, though.
Zope3-dev mailing list