Re: [Zope3-dev] catalog 'all documents' abstraction

Gary Poster Tue, 30 Aug 2005 11:25:44 -0700


On Aug 30, 2005, at 1:57 PM, Martijn Faassen wrote:

Jim Fulton wrote:
Martijn Faassen wrote:
[snip]
 > ). I also think however that it's the wrong
place to ask for this information, as this doesn't work with the extent catalog.
Well, it depends on what you meant by "indexed" above. Different indexes index different objects. The extent catalog tried to define an extent for
which it makes sense to apply different (not) operations.
And is the idea that multiple extent catalogs can share an extent?


They could.  We haven't needed that yet.

(By the way, you say 'the extent catalog tried'; does this mean something else is being considered now?)

Not to my knowledge, and I just asked Jim, and he said there was no special significance to the past tense. :-)

The catalog itself seems like the wrong place to ask as well, however, as things would get hairy in the case of a query over multiple catalogs -- which catalog would be asked for all ids that are indexed?
In general, I'd like the catalog to remain fairly small and free of logic. I wanted to say this in the other thread you started on cataloging, but
didn't get to it.  Ideally, a catalog wouldn't have any query logic
at all. People should be able to innovate on query algorithms without
affecting catalogs.
This is already clear; I've been trying to do so in the project I'm working on, though I'm more focusing on a sensible query language (well, of Python objects) than performance algorithms.
At the same time I believe Zope 3 *does* need query systems built in eventually. While it's fine to allow people to design their own query languages and algorithms, not everybody is able to do this, and those who are able to don't always want to. Even if they did, I don't want us to end up with 5 different query systems either.
So, while I agree that a query language in the core should not exclude someone from building something better, I do believe a catalog query language package is needed in the core.
(To be absolutely clear: I also think the RDF avenues being explored are very interesting, and I don't want to imply that this is not an interesting direction, but I do think we need something for the plain catalog too)

This all makes sense to me, btw (as is probably clear by my RDF messages). Query language arguments have been persuasive to me. That said, I still don't find the lack of a query language to be an impediment. It's a nice-to-have for me and arguably an essential-to-have to support a larger audience.

Hm, perhaps this isn't ideal either, as this would get hairy in case of a query that spans multiple catalogs -- which catalog will be asked in that case for a list of all documents?
I think in the particular case of "not", you have to have an implicit or explicit set that you are subtracting something from. The "right" set is
application specific.  In any case, I think the query logic should be
in separate query components.
I agree that the catalog should remain nice and simple.
That said, catalogs right now already have an implicit concept of 'everything indexed', which for instance is already used for re-indexing; it's just not made available to someone who wants to build something on it.
The extent catalog makes this more explicit by defining an extent, so perhaps this is the way to go. The extent could be a query parameter to help the query engine figure out what to do in case of 'not'. For simple use cases, the extent can be constructed from the intid utility, perhaps.
It would be helpful if someone could explain the motivations behind the extent catalog, by the way -- this information seems to be missing in zc.catalog. Am I at all on the right track with my thinking on it?

It should be pointed out initially that the son-of-queued-catalog code doesn't have anything to do with extents. I think Jim wants that factored out when we have time so that can be a mix-and-match capability. I think you are asking about extents themselves, though.


We had three use cases that led us to extents.

First, we wanted several catalogs that only indexed certain different things. This could have been done by subscribers, so this wasn't terribly compelling by itself.

Second, we wanted to transparently support queries that merged results across catalog-like data structures. The catalog defined the items we wanted to search through, while some of the other data structures kept track of a larger set of objects (subsuming the set that the catalog cared about). Sometimes, users could perform a query that didn't actually use any of the catalog's data structures, but that should be filtered by the set of the catalog's objects--its extent.
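
A minimal sketch of that second use case, assuming plain Python sets of integer ids stand in for the real data structures. The function name `filter_by_extent` and the sample ids are hypothetical and are not taken from zc.catalog:

```python
def filter_by_extent(result_ids, extent_ids):
    """Keep only the ids that belong to the catalog's extent.

    Both arguments are iterables of integer ids (assumption: plain Python
    collections stand in for whatever the real structures are).
    """
    return set(result_ids) & set(extent_ids)


# A broader, catalog-like structure knows about ids 1, 2, 5 and 6,
# but this catalog only cares about ids 2, 3 and 5.
catalog_extent = {2, 3, 5}
broader_hits = [1, 2, 5, 6]

visible = filter_by_extent(broader_hits, catalog_extent)
```

The query never touches the catalog's own indexes here; the extent alone decides which of the broader results are returned.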

Third, we wanted to let our indexes' data be usable for NOT queries. In order to do that, we needed an IFBTree structure that describes the complete set for a given catalog, so that a contained index can simply (and reasonably efficiently) subtract the query result from the full set. The indexes in zc.catalog also use extents for some other similar tricks.
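
The subtraction described in the third use case could look roughly like the sketch below. It is an illustration under stated assumptions, not the zc.catalog implementation: built-in Python sets replace the IFBTree structure, and the class `ValueIndex` with its methods `index_doc`, `apply` and `apply_not` is hypothetical.

```python
class ValueIndex:
    """Toy index mapping a value to the set of ids that carry it."""

    def __init__(self, extent_ids):
        # The complete set of ids for the owning catalog (its extent).
        # Assumption: a plain set stands in for the IFBTree structure.
        self.extent_ids = extent_ids
        self.by_value = {}

    def index_doc(self, uid, value):
        self.by_value.setdefault(value, set()).add(uid)

    def apply(self, value):
        """Ids whose indexed value equals `value`."""
        return set(self.by_value.get(value, ()))

    def apply_not(self, value):
        """Ids in the extent that do NOT match `value`."""
        return self.extent_ids - self.apply(value)


extent_ids = {1, 2, 3, 4}
index = ValueIndex(extent_ids)
index.index_doc(1, "red")
index.index_doc(2, "blue")
index.index_doc(3, "red")
# id 4 is in the extent but has no value in this index

not_red = index.apply_not("red")
```

Note that id 4 appears in the NOT result even though the index never saw it. Without a complete set to subtract from, the index could only report ids it had stored values for.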

An extent that accepts all objects would effectively be the data structure you want, as I understand it. It is actually (at least typically for us) different from the intid mapping because there are several classes of things that have intids that are not cataloged. If more than one catalog indexes the same objects, I'd first wonder why the indexes were not all in the same catalog; I'd second say that yes, they probably could share a filter-less extent.
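
To make the difference from the intid mapping concrete, here is a small sketch. The `Extent` class, its `accepts` filter argument, and the sample data are all hypothetical; a dict stands in for the intid utility and a set for the extent's storage.

```python
class Extent:
    """Set of ids a catalog cares about, with an optional filter."""

    def __init__(self, accepts=None):
        self.accepts = accepts  # callable(obj) -> bool, or None for no filter
        self.ids = set()

    def add(self, uid, obj):
        # A filter-less extent accepts whatever it is handed.
        if self.accepts is None or self.accepts(obj):
            self.ids.add(uid)


# Stand-in for the intid mapping: everything with an id, cataloged or not.
intids = {1: "document", 2: "document", 3: "relationship", 4: "document"}

# Filter-less extent: it holds only what is actually sent to the catalog.
shared = Extent()
for uid, kind in intids.items():
    if kind == "document":  # relationships have intids but are never cataloged
        shared.add(uid, kind)

# Filtered extent: it is handed everything and decides for itself.
filtered = Extent(accepts=lambda kind: kind == "document")
for uid, kind in intids.items():
    filtered.add(uid, kind)

not_cataloged = set(intids) - shared.ids
```

Two catalogs that index the same objects could both hold a reference to the one `shared` instance rather than keeping separate copies.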

If we want any of zc.catalog in the Zope core, each component will certainly need a proposal, by the way: we're offering this as code that has helped us out and that we think might help others, either directly or as ideas, so we are not duplicating effort. We're not proclaiming it to necessarily be our next core step.


Gary
