On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:
> I'm trying to understand if my idea of how to use the FilterExtent
> in zc.catalog (1.1.1) is
> correct (and efficient). I'm also using hurry.query (0.9.3). My
> current understanding of
> extents is: they can be used to perform a search on a subset of a
> catalog. For example,
> "give me all objects where attr1 is 'foo' but only for intids 5,6,7
> I have an extent of a large catalog. How do I make a search within
Extents have a primary use case in the zc.catalog package of defining
the extent of a catalog--a set of indexes. This is more efficient
both in terms of programmer time and computer time than filtering out
objects per-index. It also allows asking indexes questions that would
otherwise be impossible, e.g., "what objects do *not* match this
particular search?", and a couple of others.
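To make the "do *not* match" question concrete, here is a minimal sketch using plain Python sets as stand-ins for the intid sets. The names `extent`, `matches`, and `non_matches` are hypothetical and this is not the zc.catalog API; it only illustrates why a known extent makes the complement well defined.

```python
# Plain sets stand in for intid sets; all names here are illustrative only.
extent = {1, 2, 3, 4, 5, 6, 7}   # every intid the catalog is responsible for
matches = {2, 5, 9}              # raw search result; 9 lies outside the extent

# Restrict the positive result to the extent.
in_extent_matches = matches & extent

# "What does NOT match?" only has an answer relative to a known universe,
# which is what the extent supplies.
non_matches = extent - matches
```

Without the extent there is no universe to subtract from, so the negative query cannot be answered by the index alone.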
I'm not sure hurry.query leverages all aspects of extents, and indexes
that know how to deal with them. I seem to recall that it didn't, but
I could have been wrong and it was a while ago.
So, the primary use case is different than yours.
Extents can be used in the way that you describe--intersecting against
a larger search of a larger catalog. What you described is a
reasonable first cut, and a reasonable use of extents.
Depending on your use cases and the time available, you may want to
explore optimizations. I wouldn't be surprised if you eventually wanted
to roll your own catalog to do the set operations in the ways that
make the most sense for your application. A few quick thoughts:
- If your common extents are really as small as in your examples, one
thing to realize is that the time for an intersection in BTree code
is pretty much always determined by the size of the smaller set.
Therefore, given three sets that need to be intersected (say, your
extent and the result of the search of two indexes) of relative sizes
Small, Medium, and Large, you want to intersect in this way:
intersect(intersect(Small, Medium), Large). See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto
for timeit fun, if you like.
- There are two primary costs of a big catalog, IMO/IME: write time
and load time. If necessary for your app, consider ways to try to
keep smaller catalogs (e.g., does the value of some information
diminish over time? Does it make sense to have separate catalogs,
divided across some boundary or boundaries?); and consider ways to
keep the catalog in memory (in the object cache).
- If you typically only need the first X of a result set, doing
something like Dieter Maurer's incremental search Zope 2 code would be
interesting to research and might be appreciated by the community if
it worked out well.
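The intersection-order point in the first bullet can be sketched as follows. This is a minimal illustration with built-in Python sets, not BTree code; `intersect_smallest_first` is a hypothetical helper. With real BTrees sets you would presumably call the module's own intersection function (for example `BTrees.IFBTree.intersection`) in the same order, and keep in mind that `len()` on a large BTree can itself be expensive.

```python
def intersect_smallest_first(*sets):
    """Intersect sets in ascending order of size.

    Starting from the smallest set keeps every intermediate result small,
    i.e. intersect(intersect(Small, Medium), Large).
    """
    if not sets:
        return set()
    ordered = sorted(sets, key=len)
    result = set(ordered[0])
    for other in ordered[1:]:
        if not result:
            break  # nothing left to match; skip the larger sets entirely
        result &= other
    return result

# Hypothetical data: a tiny extent and two index results of growing size.
small = {5, 6, 7}                 # the extent
medium = set(range(0, 1000, 5))   # result of the first index search
large = set(range(100000))        # result of the second index search

# Argument order does not matter; the helper sorts by size itself.
result = intersect_smallest_first(large, medium, small)
```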
Finally, IMO/IME, only pursue these sometimes risky optimizations if
they are really necessary and if you have some pretty concrete
research or knowledge (your own or others') to back up your plan. If I
were you I'd just start out with the "do a search and then intersect
with the extent" approach you mentioned, and only worry about it more
when your app needs it.
Zope3-users mailing list