On Aug 30, 2005, at 1:57 PM, Martijn Faassen wrote:

Jim Fulton wrote:

Martijn Faassen wrote:


 > ). I also think, however, that it's the wrong place to ask for this information, as this doesn't work with the extent catalog.

Well, it depends on what you meant by "indexed" above. Different indexes index different objects. The extent catalog tried to define an extent for
which it makes sense to apply different (not) operations.

And is the idea that multiple extent catalogs can share an extent?

They could.  We haven't needed that yet.

(By the way you say 'the extent catalog tried', does this mean something else is being considered now?)

Not to my knowledge, and I just asked Jim, and he said there was no special significance to the past tense. :-)

The catalog itself seems like the wrong place to ask as well however, as things would get hairy in the case of a query over multiple catalogs -- which catalog would be asked for all ids that are indexed?

In general, I'd like the catalog to remain fairly small and free of logic. I wanted to say this in the other thread you started on cataloging, but
didn't get to it.  Ideally, a catalog wouldn't have any query logic
at all. People should be able to innovate on query algorithms without
affecting catalogs.

This is already clear; I've been trying to do so in the project I'm working on, though I'm focusing more on a sensible query language (well, of Python objects) than on performance algorithms.

At the same time I believe Zope 3 *does* need query systems built in eventually. While it's fine to allow people to design their own query languages and algorithms, not everybody is able to do this, and those who are able to don't always want to. Even if they did, I don't want us to end up with 5 different query systems either.

So, while I agree that a query language in the core should not exclude someone from building something better, I do believe a catalog query language package is needed in the core.

(To be absolutely clear: I also think the RDF avenues being explored are very interesting, and I don't want to imply that this is not an interesting direction, but I do think we need something for the plain catalog too)

This all makes sense to me, btw (as is probably clear from my RDF messages). Query language arguments have been persuasive to me. That said, I still don't find the lack of a query language to be an impediment. It's a nice-to-have for me and arguably an essential-to-have to support a larger audience.

Hm, perhaps this isn't ideal either, as this would get hairy in case of a query that spans multiple catalogs -- which catalog will be asked in that case for a list of all documents?

I think in the particular case of "not", you have to have an implicit or explicit set that you are subtracting something from. The "right" set is
application specific.  In any case, I think the query logic should be
in separate query components.

I agree that the catalog should remain nice and simple.

That said, catalogs right now already have an implicit concept of 'everything indexed', which for instance is already used for re-indexing; it's just not made available to someone who wants to build something on it.

The extent catalog makes this more explicit by defining an extent, so perhaps this is the way to go. The extent could be a query parameter to help the query engine figure out what to do in case of 'not'. For simple use cases, the extent can be constructed from the intid utility, perhaps.
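As a rough sketch of what "the extent as a query parameter" could look like: the query terms below live outside any catalog and receive the extent when they are applied. Everything here is hypothetical (the `Eq` and `Not` classes, the `apply` signature, the dict-of-sets indexes); it is not an existing Zope 3 or zc.catalog API, and plain Python sets stand in for the real id sets.

```python
from dataclasses import dataclass

# Hypothetical query terms; none of these names exist in Zope 3 or zc.catalog.
# "indexes" maps an index name to {value: set of ids}; "extent" is a set of ids.

@dataclass(frozen=True)
class Eq:
    index: str
    value: object

    def apply(self, indexes, extent):
        # Ids indexed under this value, limited to the extent we were given.
        return indexes[self.index].get(self.value, set()) & extent


@dataclass(frozen=True)
class Not:
    term: object

    def apply(self, indexes, extent):
        # 'not' needs an explicit set to subtract from: the extent.
        return extent - self.term.apply(indexes, extent)


indexes = {"color": {"red": {1, 2}, "blue": {3}}}
extent = {1, 2, 3, 4}  # for simple cases this might come from the intid utility

result = Not(Eq("color", "red")).apply(indexes, extent)
```

With this shape the catalog itself carries no query logic, and the caller decides which set 'not' is relative to.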

It would be helpful if someone could explain the motivations behind the extent catalog, by the way -- this information seems to be missing in zc.catalog. Am I at all on the right track with my thinking on it?

It should be pointed out initially that the son-of-queued-catalog code doesn't have anything to do with extents. I think Jim wants that factored out when we have time so that it can be a mix-and-match capability. I think you are asking about extents themselves, though.

We had three use cases that led us to extents.

First, we wanted several catalogs that only indexed certain different things. This could have been done by subscribers, so this wasn't terribly compelling by itself.

Second, we wanted to transparently support queries that merged results across catalog-like data structures. The catalog defined the items we wanted to search through, while some of the other data structures kept track of a larger set of objects (subsuming the set that the catalog cared about). Sometimes, users could perform a query that didn't actually use any of the catalog's data structures, but that should be filtered by the set of the catalog's objects--its extent.
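A minimal illustration of that second case, using plain Python sets as stand-ins for the real id sets. The names (`catalog_extent`, `tag_hits`, `filter_by_extent`) are made up for this sketch and are not taken from zc.catalog.

```python
# Hypothetical data: ids the catalog cares about (its extent), and the result
# of a query against some other structure that tracks a larger set of objects.
catalog_extent = {10, 11, 12}
tag_hits = {9, 10, 12, 40}


def filter_by_extent(hits, extent):
    """Keep only the hits that belong to the catalog's extent."""
    return hits & extent


visible = filter_by_extent(tag_hits, catalog_extent)
```

The query never touched a catalog index, but the result is still restricted to the objects the catalog is responsible for.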

Third, we wanted to let our indexes' data be usable for NOT queries. In order to do that, we needed an IFBTree structure that describes the complete set for a given catalog, so that a contained index can simply (and reasonably efficiently) subtract the query result from the full set. The indexes in zc.catalog also use extents for some other similar tricks.
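To make the subtraction concrete, here is a toy index that holds a reference to its catalog's extent. This is only a sketch: `ToyValueIndex` and its methods are invented for illustration, and a plain Python set stands in for the IFBTree structure mentioned above.

```python
class ToyValueIndex:
    """Hypothetical index that can answer NOT queries via a shared extent."""

    def __init__(self, extent):
        self.extent = extent      # the catalog's complete set of ids
        self._by_value = {}       # value -> set of ids

    def index_doc(self, docid, value):
        self.extent.add(docid)
        self._by_value.setdefault(value, set()).add(docid)

    def apply(self, value):
        return set(self._by_value.get(value, ()))

    def apply_not(self, value):
        # NOT is the full set minus the positive result.
        return self.extent - self.apply(value)


extent = set()
status = ToyValueIndex(extent)
status.index_doc(1, "draft")
status.index_doc(2, "published")
status.index_doc(3, "published")

not_published = status.apply_not("published")
```

Without the extent, the index would have no way to know which ids exist beyond the ones it returned for the positive query.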

An extent that accepts all objects would effectively be the data structure you want, as I understand it. It is actually (at least typically for us) different than the intid mapping because there are several classes of things that have intids that are not cataloged. If more than one catalog all index the same objects, I'd first wonder why the indexes were not all in the same catalog; I'd second say that yes, they probably could share a filter-less extent.

If we want any of zc.catalog in the Zope core, each component will certainly need a proposal, by the way: we're offering this as code that has helped us out and that we think might help others, either directly or as ideas, so we are not duplicating effort. We're not proclaiming it to necessarily be our next core step.

Zope3-dev mailing list
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com
