On Aug 30, 2005, at 1:57 PM, Martijn Faassen wrote:
Jim Fulton wrote:
Martijn Faassen wrote:
> ). I also think however that it's the wrong
place to ask for this information, as this doesn't work with the
Well, it depends on what you meant by "indexed" above. Different
index different objects. The extent catalog tried to define an
which it makes sense to apply different (not) operations.
And is the idea that multiple extent catalogs can share an extent?
They could. We haven't needed that yet.
(By the way you say 'the extent catalog tried', does this mean
something else is being considered now?)
Not to my knowledge, and I just asked Jim, and he said there was no
special significance to the past tense. :-)
The catalog itself seems like the wrong place to ask as well
however, as things would get hairy in the case of a query over
multiple catalogs -- which catalog would be asked for all ids
that are indexed?
In general, I'd like the catalog to remain fairly small and free
I wanted to say this in the other thread you started on
didn't get to it. Ideally, a catalog wouldn't have any query logic
at all. People should be able to innovate on query algorithms
This is already clear; I've been trying to do so in the project I'm
working on, though I'm more focusing on a sensible query language
(well, of Python objects) than performance algorithms.
At the same time I believe Zope 3 *does* need query systems built
in eventually. While it's fine to allow people to design their own
query languages and algorithms, not everybody is able to do this,
and those who are able to don't always want to. Even if they did, I
don't want us to end up with 5 different query systems either.
So, while I agree that a query language in the core should not
exclude someone from building something better, I do believe a
catalog query language package is needed in the core.
(To be absolutely clear: I also think the RDF avenues being
explored are very interesting, and I don't want to imply that this
is not an interesting direction, but I do think we need something
for the plain catalog too)
This all makes sense to me, btw (as is probably clear by my RDF
messages). Query language arguments have been persuasive to me.
That said, I still don't find the lack of a query language to be an
impediment. It's a nice-to-have for me and arguably an essential-to-
have to support a larger audience.
Hm, perhaps this isn't ideal either, as this would get hairy in
case of a query that spans multiple catalogs -- which catalog
will be asked in that case for a list of all documents?
I think in the particular case of "not", you have to have an
explicit set that you are subtracting something from. The "right"
application specific. In any case, I think the query logic should be
in separate query components.
I agree that the catalog should remain nice and simple.
That said, catalogs right now already have an implicit concept of
'everything indexed', which for instance is already used for re-
indexing, it's just not made available to someone who wants to
build something on it.
The extent catalog makes this more explicit by defining an extent,
so perhaps this is the way to go. The extent could be a query
parameter to help the query engine figure out what to do in case of
'not'. For simple use cases, the extent can be constructed from the
intid utility, perhaps.
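
To make the "extent as a query parameter" idea concrete, here is a minimal sketch of a query evaluator that is handed the extent explicitly. Everything in it is an assumption for illustration: the nested-tuple query form, the `evaluate` function, and the plain Python sets standing in for an intid-derived extent are not part of Zope 3 or zc.catalog.

```python
# Hypothetical sketch: a tiny query evaluator that takes the extent as a
# parameter. None of these names come from Zope 3 or zc.catalog.

def evaluate(query, extent):
    """Return the set of ids matching `query`, limited to `extent`."""
    op = query[0]
    if op == "ids":
        # Leaf: ids reported by some index, clipped to the extent.
        return set(query[1]) & extent
    if op == "and":
        return evaluate(query[1], extent) & evaluate(query[2], extent)
    if op == "or":
        return evaluate(query[1], extent) | evaluate(query[2], extent)
    if op == "not":
        # 'not' only has a meaning relative to an explicit full set.
        return extent - evaluate(query[1], extent)
    raise ValueError("unknown operator: %r" % (op,))


# Simple use case: the extent is every id known to a stand-in intid registry.
all_intids = {1, 2, 3, 4, 5}
query = ("and", ("ids", [1, 2, 3]), ("not", ("ids", [2, 9])))
result = evaluate(query, all_intids)
```

Because the extent is passed in rather than looked up on a catalog, a query spanning several catalogs would have to say which set 'not' is relative to, instead of leaving the engine to guess.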
It would be helpful if someone could explain the motivations behind
the extent catalog, by the way -- this information seems to be
missing in zc.catalog. Am I at all on the right track with my
thinking on it?
It should be pointed out initially that the son-of-queued-catalog
code doesn't have anything to do with extents. I think Jim wants
that factored out when we have time so that can be a mix-and-match
capability. I think you are asking about extents themselves, though.
We had three use cases that led us to extents.
First, we wanted several catalogs that only indexed certain different
things. This could have been done by subscribers, so this wasn't
terribly compelling by itself.
Second, we wanted to transparently support queries that merged
results across catalog-like data structures. The catalog defined the
items we wanted to search through, while some of the other data
structures kept track of a larger set of objects (subsuming the set
that the catalog cared about). Sometimes, users could perform a
query that didn't actually use any of the catalog's data structures,
but that should be filtered by the set of the catalog's objects--its
Third, we wanted to let our indexes' data be usable for NOT queries.
In order to do that, we needed an IFBTree structure that describes
the complete set for a given catalog, so that a contained index can
simply (and reasonably efficiently) subtract the query result from
the full set. The indexes in zc.catalog also use extents for some
other similar tricks.
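
The second and third use cases both come down to set operations against the extent. The sketch below uses plain Python sets as a stand-in for the IFBTree-style structures mentioned above; the function names and the sample ids are hypothetical, and this is not how zc.catalog itself is written.

```python
# Plain sets stand in for the integer-keyed sets described above.
# All names and ids here are illustrative assumptions.

def not_query(extent, hits):
    """NOT: subtract an index's hits from the catalog's complete set."""
    return extent - hits


def restrict_to_extent(extent, external_hits):
    """Filter results from a larger structure down to the catalog's objects."""
    return extent & external_hits


catalog_extent = {10, 11, 12, 13}   # everything this catalog covers
index_hits = {11}                   # ids an index matched for some value
external_hits = {5, 10, 12, 40}     # hits from a structure tracking more objects

not_result = not_query(catalog_extent, index_hits)
merged_result = restrict_to_extent(catalog_extent, external_hits)
```

With real BTrees, the module-level `difference` and `intersection` functions of `BTrees.IFBTree` would play the roles of `-` and `&` here.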
An extent that accepts all objects would effectively be the data
structure you want, as I understand it. It is actually (at least
typically for us) different than the intid mapping because there are
several classes of things that have intids that are not cataloged.
If more than one catalog indexes the same objects, I'd first wonder
why the indexes were not all in the same catalog; I'd second say that
yes, they probably could share a filter-less extent.
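
As a rough illustration of the difference between a filtered extent and a filter-less one that two catalogs share, consider the sketch below. `SketchExtent`, `SketchCatalog`, and the `accepts` argument are hypothetical names invented for this example; they are not the zc.catalog classes or their signatures.

```python
class SketchExtent:
    """Hypothetical extent: the set of ids a catalog covers.

    `accepts` is an optional callable(obj) -> bool. With no callable,
    every object offered to the extent is accepted.
    """

    def __init__(self, accepts=None):
        self.accepts = accepts
        self.ids = set()

    def add(self, uid, obj):
        # Return True if the object belongs in this extent.
        if self.accepts is None or self.accepts(obj):
            self.ids.add(uid)
            return True
        return False


class SketchCatalog:
    """Hypothetical catalog that indexes only what its extent accepts."""

    def __init__(self, extent):
        self.extent = extent
        self.indexed = {}

    def index_doc(self, uid, obj):
        if self.extent.add(uid, obj):
            self.indexed[uid] = obj


objects = {1: {"kind": "document"}, 2: {"kind": "image"}}

# A filtered extent: only documents are cataloged, although both have ids.
documents_only = SketchExtent(accepts=lambda obj: obj["kind"] == "document")
doc_catalog = SketchCatalog(documents_only)

# A filter-less extent shared by two catalogs that index the same objects.
everything = SketchExtent()
catalog_a = SketchCatalog(everything)
catalog_b = SketchCatalog(everything)

for uid, obj in objects.items():
    doc_catalog.index_doc(uid, obj)
    catalog_a.index_doc(uid, obj)
    catalog_b.index_doc(uid, obj)
```

Here `documents_only.ids` ends up smaller than the set of all ids, which is the sense in which an extent can differ from the intid mapping.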
If we want any of zc.catalog in the Zope core, each component will
certainly need a proposal, by the way: we're offering this as code
that has helped us out and that we think might help others, either
directly or as ideas, so we are not duplicating effort. We're not
proclaiming it to necessarily be our next core step.
Zope3-dev mailing list