Gary Poster wrote:
On Aug 24, 2005, at 6:27 AM, Martijn Faassen wrote:
Now as to where I see areas where features are lacking in the Zope 3
Underfeatured query API
[snip query API discussion]
Another way of looking at this--or simply an additional feature on top
of a query language--might be to make the IFBTree results easier to
manipulate. The code in the zc sandbox for the extent
(http://svn.zope.org/Sandbox/zc/catalog/extentcatalog.py) is a sketch
of what I mean--following some of the set API, for instance. The
reason for my interest in this is that we have very little code that
uses the catalog to return objects--just IFBTree data structures. Just
working with the IFBTree data structures gives you a lot more
flexibility for integration of catalog results with other data structures.
My code works mostly with the IFBTree objects as well, though I'll have
to check out your code to see what you mean exactly.
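As a rough illustration of what staying at the id level buys you, here is a minimal sketch. It uses plain Python dicts as stand-ins for IFBTree results (integer document id mapped to a float score), and the names `title_hits`, `state_hits`, `intersect_scores` and `difference` are made up for this example. With real BTrees one would presumably use the module-level set functions in `BTrees.IFBTree` rather than hand-written helpers.

```python
# Plain dicts stand in for IFBTree results: {docid: score}.
title_hits = {1: 0.5, 2: 0.25, 5: 0.75}
state_hits = {2: 1.0, 5: 1.0, 8: 1.0}


def intersect_scores(a, b):
    """Keep docids present in both results, adding their scores."""
    return {docid: a[docid] + b[docid] for docid in a.keys() & b.keys()}


def difference(a, b):
    """Keep the entries of a whose docid does not appear in b."""
    return {docid: score for docid, score in a.items() if docid not in b}


both = intersect_scores(title_hits, state_hits)
only_title = difference(title_hits, state_hits)
```

No object is loaded at any point; the results stay cheap to combine with other id-keyed data until something actually needs the objects.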
Casey Duncan had explored some very interesting ideas in his pypes
project (http://cvs.zope.org/Packages/pypes/) for a query language, by
the way, but his ambition is still largely unrealized, even though much
of his query language work could be ported to Zope 3 without a huge
amount of trouble.
I need to take a look at this; I've seen the checkins but never quite
got what this was about. :)
Arguably, query optimization would be the feature that would make a
given syntax win.
Or at least a given AST; a syntax is not strictly necessary.
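To make the AST point concrete, here is a small sketch of query terms as plain objects that evaluate themselves against index data. Everything in it is an assumption for illustration: the `Eq` and `And` classes are toy stand-ins rather than any existing library's classes, and `indexes` is a nested dict (index name to value to set of docids) rather than a real catalog. The only "optimization" shown is intersecting the smallest result first and stopping once the result is empty.

```python
class Eq:
    """Leaf term: docids whose indexed value equals `value`."""

    def __init__(self, index, value):
        self.index = index
        self.value = value

    def apply(self, indexes):
        return set(indexes[self.index].get(self.value, ()))


class And:
    """Intersection of one or more sub-terms."""

    def __init__(self, *terms):
        self.terms = terms

    def apply(self, indexes):
        # Evaluate every sub-term, then intersect starting with the
        # smallest result so intermediate sets stay small.
        results = sorted((t.apply(indexes) for t in self.terms), key=len)
        out = results[0]
        for other in results[1:]:
            if not out:
                break  # nothing left to intersect
            out = out & other
        return out


indexes = {
    "state": {"NEW": {3, 6}, "PUBLISHED": {1, 2, 4}},
    "author": {"gary": {1, 3}, "martijn": {2, 4, 6}},
}
query = And(Eq("state", "PUBLISHED"), Eq("author", "martijn"))
matches = query.apply(indexes)
```

Because the query is just a tree of objects, an optimizer could rewrite or reorder it before evaluation without any textual syntax being involved.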
Fast, easy batching/sorting
I don't know how to do easy, efficient batching/sorting with the
catalog. I'd like to be able to query *just* a batch of objects,
sorted, for user interface purposes. There doesn't seem to be a
straightforward way to do this yet, and this is a very common use
case. The batching implementation sitting out there in
zope.bugtracker.batching is nice, but doesn't deal with the catalog.
I think this should be fixable with a bit more infrastructure though.
Getting the right batch is just a query on an index, and the result
can be sorted afterwards, though there are tricky issues getting the
right batch *size*.
Since we usually work with IFBTree data structures until the very end,
we get most of the benefits of batching. Once you are done with your
processing, you can simply wrap the result in a
zope.app.catalog.catalog.ResultSet (or similar) and be good to go.
This is why I think making the story for working directly with the
BTree data structures easier might be a good way to go.
But my batching depends on sorting. I.e. I want to batch through a
sorted list of results.
Sorting is hard to do efficiently, and easy to *think* you are making
an optimization. We are currently doing it "naively" (to the degree
that using the very efficient Python sort is naive), and Jim refers to
research that indicates that a good non-naive approach is not clear. I
can certainly imagine various approaches. Carefully arranging your
merges can actually result in a pre-sorted set. We're not being that
I use a very naive approach now too, but that means I have to do the
realization of all the objects into a ResultSet *before* the sort can
happen. Waking up all those objects just to sort them for each batch
feels wrong. I'm not being careful enough with my query operations
either to have a pre-sorted set.
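One way to avoid waking up objects just to sort them is to sort the document ids by a value the index already holds. The sketch below assumes such a docid-to-value mapping can be obtained without loading objects; that is an assumption, not a documented catalog feature, and `sorted_batch`, `result_ids` and `titles` are hypothetical names. It still orders the head of the result for every batch, so it is a naive approach, just a cheaper one.

```python
import heapq


def sorted_batch(docids, sort_values, start, size):
    """Return one batch of docids ordered by their stored sort value.

    docids      -- iterable of integer document ids (the query result)
    sort_values -- mapping docid -> indexed value, assumed to be
                   available without loading the object
    """
    # Only the first start + size entries need to be put in order.
    head = heapq.nsmallest(start + size, docids,
                           key=sort_values.__getitem__)
    return head[start:start + size]


result_ids = [7, 3, 9, 1, 4]
titles = {1: "delta", 3: "alpha", 4: "echo", 7: "charlie", 9: "bravo"}

first = sorted_batch(result_ids, titles, 0, 2)
second = sorted_batch(result_ids, titles, 2, 2)
```

Only the ids in the returned batch would then need to be resolved to objects for display.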
Missing powerful query concepts
Certain powerful query concepts like joins, available in a relational
setting, are missing. I've already run into a scenario where I wanted
to do something like this: given a bunch of version objects with field
'id', where multiple objects can have the same 'id' to indicate
they're versions of the same object, I want all objects where field
'workflow_state' is 'PUBLISHED' unless there is another object with
the same id that has workflow_state 'NEW', in which case I want that
I think joins would be a way to solve it, though I haven't figured
out the details, nor how to implement them efficiently on top of the
catalog. This kind of thing is where a relational database makes life
a lot simpler.
I guess that's taste. I'd be happier with Python.
It's performance too, not just taste. I can solve this in Python, but it is:
* more, harder to read code.
* much much slower than it potentially could be.
I.e. now I have code that looks approximately like this:
    # Function name, the Eq(...) term and the result list are assumed;
    # the rest follows the original.
    def getNewestVersions():
        """Return newest versions of a particular object.

        If there is a version that's PUBLISHED and a version that's NEW,
        return NEW version only in results.

        Multiple versions of the same object are identified with an id.
        """
        state_index = ('document_catalog', 'workflow_state')
        id_index = ('document_catalog', 'workflow_id')
        query = InSet(state_index, [NEW, PUBLISHED])
        q = zapi.getUtility(IExtendedQuery)
        result = []
        for version in q.searchResults(query):
            s = version.getState()
            if s == PUBLISHED:
                id = version.getId()
                # inner query: is there a NEW version with the same id?
                query2 = And(InSet(state_index, [NEW]),
                             Eq(id_index, id))
                if len(q.searchResults(query2)):
                    continue  # skip this result, as there's a new version
            result.append(version)
        return result
The second, inner query is not very pleasant to do, and I'd prefer to
avoid it by having a way to do a join on the workflow_id index.
Readability-wise, I'd prefer to be able to write a single query instead
of a complicated loop.
I realize that one can always say: "You should've designed your
application differently to avoid this issue", but I think the general
pattern where you want to ask:
give me all objects with field A having state 1 unless there's an
object somehow related to it through field B that has field A state 2
is something that will appear in applications and that should have a
reasonably succinct query representation with a fast answer. Relational
databases offer this power, but the Zope Catalog right now doesn't seem to.
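The "state 1 unless a related object has state 2" pattern can be sketched as a couple of set operations over index data, without a query per result. This is only an illustration under stated assumptions: `state_index` (value to docids) and `workflow_ids` (docid to shared id) are plain-Python stand-ins for what the two catalog indexes hold, and `newest_versions` is a hypothetical helper, not an existing catalog API.

```python
NEW, PUBLISHED = "NEW", "PUBLISHED"

# Stand-ins for two catalog indexes (all names are illustrative).
state_index = {NEW: {3, 6}, PUBLISHED: {1, 2, 4}}        # value -> docids
workflow_ids = {1: "a", 2: "b", 3: "b", 4: "c", 6: "d"}  # docid -> id


def newest_versions(state_index, workflow_ids):
    """Return PUBLISHED docids, except those that share a workflow id
    with a NEW docid, plus all NEW docids."""
    new = state_index.get(NEW, set())
    published = state_index.get(PUBLISHED, set())
    # One pass over the NEW docids replaces the per-result inner query.
    ids_with_new = {workflow_ids[docid] for docid in new}
    kept = {d for d in published if workflow_ids[d] not in ids_with_new}
    return kept | new


result = newest_versions(state_index, workflow_ids)
```

Here docid 2 is dropped because docid 3 is a NEW version with the same id; this is effectively an anti-join on the shared id, done in a single pass.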
Ease of deployment
While I agree with your general point, Ruby on Rails might call that
assertion into question a bit.
True, and so do Django and others. I wonder whether they have as much
of a commons of shared components as Zope 2 does; I'm not familiar
enough with the projects to judge this.
I'll add another.
Because of the Zope 3 component system, if we can use the current
catalog interface, or invent another, to develop both a ZODB/BTree-
based implementation and an RDBMS-based implementation, it's possible
that users who wanted to choose the RDBMS strengths would be able to do
so without dividing the user community.
Yes, this is bringing the transparency a step further. Basically what
you'd be doing is building an object relational abstraction based on the
Zope catalog. :)
While the transparency has many benefits
mentioned before, the more straightforward mapping has the benefits
of simplicity, may map to relational databases more easily, and may
expose powerful relational features more straightforwardly.
It's true. I hope that an entire platform doesn't force the decision
on its potential users, though.
Zope3-dev mailing list