Gary Poster wrote:

On Aug 24, 2005, at 6:27 AM, Martijn Faassen wrote:
Now as to where I see areas where features are lacking in the Zope 3 catalog:

Underfeatured query API
[snip query API discussion]

Another way of looking at this--or simply an additional feature on top of a query language--might be to make the IFBTree results easier to manipulate. The code in the zc sandbox for the extent ( is a sketch of what I mean--following some of the set API, for instance. The reason for my interest in this is that we have very little code that uses the catalog to return objects--just IFBTree data structures. Working directly with the IFBTree data structures gives you a lot more flexibility for integrating catalog results with other data structures.

My code works mostly with the IFBTree objects as well, though I'll have to check out your code to see what you mean exactly.
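To illustrate the kind of manipulation being discussed, here is a minimal sketch that uses plain Python sets as a stand-in for IFBTree sets of integer document ids. The variable names and the data are hypothetical; with real catalog results, the module-level `intersection`, `union` and `difference` functions in `BTrees.IFBTree` would play the role of the set operators used here.

```python
# Plain Python sets stand in for IFBTree sets of integer document ids.
# All names and values below are made up for illustration.
published = {1, 2, 3, 5, 8}   # ids returned by one index query
by_author = {2, 3, 4, 8, 9}   # ids returned by another index query
hidden = {3}                  # ids coming from a non-catalog data structure


def combine(published, by_author, hidden):
    """Intersect two query results, then subtract an external id set."""
    both = published & by_author   # documents matching both queries
    return both - hidden           # drop ids excluded by the other structure


visible = combine(published, by_author, hidden)
print(sorted(visible))
```

No object is loaded at any point: everything happens on integer ids, which is what makes it cheap to mix catalog results with other id-based structures.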

Casey Duncan had explored some very interesting ideas in his pypes project ( for a query language, by the way, but his ambition is still largely unrealized, even though much of his query language work could be ported to Zope 3 without a huge amount of trouble.

I need to take a look at this; I've seen the checkins but never quite got what this was about. :)

Arguably, query optimization would be the feature that would make a given syntax win.

Or at least a given AST; a syntax is not strictly necessary.

Fast, easy batching/sorting

I don't know how to do easy, efficient batching/sorting with the catalog. I'd like to be able to query *just* a batch of objects, sorted, for user interface purposes. There doesn't seem to be a straightforward way to do this yet, and this is a very common use case. The batching implementation sitting out there in zope.bugtracker.batching is nice, but doesn't deal with the catalog.

I think this should be fixable with a bit more infrastructure though. Getting the right batch is just a query on an index, and the result can be sorted afterwards, though there are tricky issues getting the right batch *size*.

Since we usually work with IFBTree data structures until the very end, we get most of the benefits of batching. Once you are done with your processing, you can simply wrap the result in a (or similar) and be good to go. This is why I think making the story for working directly with the BTree data structures easier might be a good way to go.

But my batching depends on sorting. I.e. I want to batch through a sorted list of results.

Sorting is hard to do efficiently, and easy to *think* you are making an optimization. We are currently doing it "naively" (to the degree that using the very efficient Python sort is naive), and Jim refers to research that indicates that a good non-naive approach is not clear. I can certainly imagine various approaches. Carefully arranging your merges can actually result in a pre-sorted set. We're not being that careful.

I use a very naive approach now too, but that means I have to do the realization of all the objects into a ResultSet *before* the sort can happen. Waking up all those objects just to sort them for each batch feels wrong. I'm not being careful enough with my query operations either to have a pre-sorted set.
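One possible way around waking up objects, sketched under assumptions: if the sort key for each document id can be read from an index as a plain `docid -> value` mapping (the `sort_values` dictionary below is a hypothetical stand-in for that), a batch can be selected on ids alone and only the objects in that batch need to be loaded. This is still a naive approach, not the optimized merge strategy mentioned above.

```python
import heapq


def sorted_batch(docids, sort_values, start, size):
    """Return one batch of docids, ordered by sort_values[docid].

    Only ids and index values are touched; no objects are loaded.
    heapq.nsmallest avoids fully sorting the result set when the
    requested batch is near the front.
    """
    top = heapq.nsmallest(start + size, docids, key=sort_values.__getitem__)
    return top[start:start + size]


# Hypothetical data: docid -> value of the field we sort on.
sort_values = {10: "pear", 11: "apple", 12: "fig", 13: "banana", 14: "cherry"}
result_ids = {10, 11, 12, 13, 14}   # stand-in for an IFBTree query result

first = sorted_batch(result_ids, sort_values, 0, 2)
second = sorted_batch(result_ids, sort_values, 2, 2)
print(first, second)
```

The cost still grows with the size of the whole result set, so this only removes the object loading, not the sorting work itself.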

Missing powerful query concepts

Certain powerful query concepts like joins, available in a relational setting, are missing. I've already run into a scenario where I wanted to do something like this: given a bunch of version objects with field 'id', where multiple objects can have the same 'id' to indicate they're versions of the same object, I want all objects where field 'workflow_state' is 'PUBLISHED' unless there is another object with the same id that has workflow_state 'NEW', in which case I want that one.

I think joins would be a way to solve it, though I haven't figured out the details, nor how to implement them efficiently on top of the catalog. This kind of thing is where a relational database makes life a lot simpler.

I guess that's taste.  I'd be happier with Python.

It's performance too, not just taste. I can solve this in Python, but it is:

* more code, and harder to read.

* much much slower than it potentially could be.

I.e. now I have code that looks approximately like this:

def newestVersions():
    """Return newest versions of a particular object.

    If there is a version that's PUBLISHED and a version that's NEW,
    return NEW version only in results.

    Multiple versions of the same object are identified with an id
    they share.
    """
    # Each index is addressed as (catalog name, index name).
    state_index = 'document_catalog', 'workflow_state'
    id_index = 'document_catalog', 'workflow_id'
    # Outer query: every version that is either NEW or PUBLISHED.
    query = InSet(state_index, [NEW, PUBLISHED])
    q = zapi.getUtility(IExtendedQuery)
    for version in q.searchResults(query):
        s = version.getState()
        if s == PUBLISHED:
            version_id = version.getId()
            # Inner query: is there a NEW version sharing this id?
            query2 = And(InSet(state_index, [NEW]),
                         Equals(id_index, version_id))
            if q.searchResults(query2):
                continue  # skip this result, as there's a new version
        yield version

The second, inner query is not very pleasant to do, and I'd prefer to avoid it by having a way to do a join on the workflow_id index. Readability-wise, I'd prefer to be able to write a single query instead of a complicated loop.

I realize that one can always say: "You should've designed your application differently to avoid this issue", but I think the general pattern where you want to ask:

give me all objects with field A having state 1 unless there's another
  object somehow related to it through field B, that has field A state 2

is something that will appear in applications and that should have a reasonably succinct query representation with a fast answer. Relational databases offer this power, but the Zope Catalog right now doesn't seem to.
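As a rough illustration of what a join-like answer to this pattern could look like, the sketch below works purely on document ids. It assumes two hypothetical structures that are not part of any catalog API: `state_index`, mapping a workflow state to the set of docids in that state, and `workflow_id_of`, mapping a docid to its shared version id. Plain Python sets and dicts stand in for the BTree structures.

```python
def newest_version_ids(state_index, workflow_id_of):
    """Return docids of NEW versions plus PUBLISHED versions that have
    no NEW sibling sharing the same workflow id.

    One pass over each id set replaces the per-result inner query.
    """
    new_ids = state_index.get("NEW", set())
    published_ids = state_index.get("PUBLISHED", set())
    # Workflow ids that already have a NEW version.
    ids_with_new = {workflow_id_of[d] for d in new_ids}
    # Keep a PUBLISHED version only if no NEW sibling exists.
    kept = {d for d in published_ids
            if workflow_id_of[d] not in ids_with_new}
    return new_ids | kept


# Hypothetical data: five versions of three documents "a", "b" and "c".
state_index = {"PUBLISHED": {1, 3}, "NEW": {2, 5}, "ARCHIVED": {4}}
workflow_id_of = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}

result = newest_version_ids(state_index, workflow_id_of)
print(sorted(result))
```

Whether the catalog can expose a docid-to-value mapping this cheaply is exactly the open question; the sketch only shows that the logic itself reduces to a couple of set operations.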


Ease of deployment

While I agree with your general point, Ruby on Rails might call that assertion into question a bit.

True, and so do Django and others. I wonder whether they have as much of a commons of shared components as Zope 2 does; I'm not familiar enough with the projects to judge this.

Good points.

I'll add another.

Component system

Because of the Zope 3 component system, if we can use the current catalog interface, or invent another, to develop both a ZODB/BTree-based implementation and an RDBMS-based implementation, it's possible that users who wanted to choose the RDBMS strengths would be able to do so without dividing the user community.

Yes, this is bringing the transparency a step further. Basically what you'd be doing is building an object relational abstraction based on the Zope catalog. :)
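The idea of one interface with two storage backends might look roughly like the following sketch. Everything here is hypothetical: `FieldCatalog` is an invented interface rather than the actual Zope 3 catalog interface, `typing.Protocol` stands in for `zope.interface`, a dictionary stands in for the BTree-based implementation, and an in-memory SQLite table stands in for the RDBMS.

```python
import sqlite3
from typing import Protocol, Set


class FieldCatalog(Protocol):
    """Hypothetical minimal catalog interface shared by both backends."""

    def index_doc(self, docid: int, field: str, value: str) -> None: ...

    def search(self, field: str, value: str) -> Set[int]: ...


class MemoryCatalog:
    """Dictionary-backed stand-in for a ZODB/BTree-based catalog."""

    def __init__(self):
        self._index = {}

    def index_doc(self, docid, field, value):
        self._index.setdefault((field, value), set()).add(docid)

    def search(self, field, value):
        return set(self._index.get((field, value), ()))


class SqliteCatalog:
    """SQLite-backed stand-in for an RDBMS-based catalog."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE idx (docid INTEGER, field TEXT, value TEXT)")

    def index_doc(self, docid, field, value):
        self._db.execute(
            "INSERT INTO idx VALUES (?, ?, ?)", (docid, field, value))

    def search(self, field, value):
        rows = self._db.execute(
            "SELECT docid FROM idx WHERE field = ? AND value = ?",
            (field, value))
        return {row[0] for row in rows}


def published_ids(catalog: FieldCatalog) -> Set[int]:
    """Application code that depends only on the shared interface."""
    return catalog.search("workflow_state", "PUBLISHED")


results = []
for catalog in (MemoryCatalog(), SqliteCatalog()):
    catalog.index_doc(1, "workflow_state", "PUBLISHED")
    catalog.index_doc(2, "workflow_state", "NEW")
    catalog.index_doc(3, "workflow_state", "PUBLISHED")
    results.append(published_ids(catalog))
print(results)
```

Application code such as `published_ids` never learns which backend it is talking to, which is the property that would let users pick the RDBMS strengths without splitting the community.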

While the transparency has many benefits mentioned before, the more straightforward mapping has the benefits of simplicity, may map to relational databases more easily, and may expose powerful relational features more straightforwardly.

It's true. I hope that an entire platform doesn't force the decision on its potential users, though.



Zope3-dev mailing list
