On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:

I'm trying to understand if my idea of how to use the FilterExtent in zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query (0.9.3). My current understanding of extents is: they can be used to perform a search on a subset of a catalog. For example, "give me all objects where attr1 is 'foo' but only for intids 5,6,7 and 10"

Short version:
I have an extent of a large catalog. How do I make a search within this extent?

Hi Jesper.

Extents have a primary use case in the zc.catalog package of defining the extent of a catalog--a set of indexes. This is more efficient both in terms of programmer time and computer time than filtering out objects per-index. It also allows asking indexes questions that would otherwise be impossible, e.g., "what objects do *not* match this particular search?", and a couple of others.

I'm not sure hurry.query leverages all aspects of extents and of the indexes that know how to deal with them. I seem to recall that it didn't, but I could be wrong, and it was a while ago.

So, the primary use case is different than yours.

Extents can be used in the way that you describe--intersecting against a larger search of a larger catalog. What you described is a reasonable first cut, and a reasonable use of extents.
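To make the "search, then intersect with the extent" idea concrete, here is a minimal sketch. It uses plain Python sets and a dict as stand-ins for the extent and for an index; the names (attr1_index, search_within_extent) and the data are made up for illustration and are not zc.catalog or hurry.query API.

```python
# Stand-in for a field index: value -> set of intids (hypothetical data).
attr1_index = {
    "foo": {1, 5, 6, 8, 10, 12},
    "bar": {2, 3, 7, 9},
}

# Stand-in for the extent: the intids the search is restricted to.
extent = {5, 6, 7, 10}


def search_within_extent(index, value, extent):
    """Search the whole index first, then keep only intids in the extent."""
    matches = index.get(value, set())
    return matches & extent


result = search_within_extent(attr1_index, "foo", extent)
```

Here result holds only the intids that both match 'foo' and belong to the extent.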

Depending on your use cases and the time available, you may want to explore optimizations. I wouldn't be surprised if you eventually wanted to roll your own catalog to do the set operations in the ways that make the most sense for your application. A few quick thoughts:

- If your common extents are really as small as in your examples, one thing to realize is that the time for an intersection in BTree code pretty much always is determined by the size of the smaller set. Therefore, given three sets that need to be intersected (say, your extent and the result of the search of two indexes) of relative sizes Small, Medium, and Large, you want to intersect in this way: intersect(intersect(Small, Medium), Large). See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto for timeit fun, if you like.

- There are two primary costs of a big catalog, IMO/IME: write time and load time. If necessary for your app, consider ways to keep catalogs smaller (e.g., does the value of some information diminish over time? Does it make sense to have separate catalogs, divided across some boundary or boundaries?), and consider ways to keep the catalog in memory (in the object cache).

- If you typically only need the first X of a result set, doing something like Dieter Maurer's incremental search Zope 2 code would be interesting to research and might be appreciated by the community if it worked out well.
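The intersection-ordering point in the first item above can be sketched roughly as follows. This is only an illustration under assumptions: it uses built-in Python sets as stand-ins for BTree sets, and intersect_smallest_first is a hypothetical helper, not part of any package. With real BTree sets you would presumably pass the matching intersection function (for example BTrees.IFBTree.intersection) as the intersect argument.

```python
from functools import reduce


def intersect_smallest_first(sets, intersect=lambda a, b: a & b):
    """Intersect all the given sets, starting with the smallest ones.

    Sorting by size means each step intersects the (small) running
    result against the next-larger set, i.e.
    intersect(intersect(Small, Medium), Large).
    """
    ordered = sorted(sets, key=len)
    if not ordered:
        raise ValueError("need at least one set")
    return reduce(intersect, ordered)


# Hypothetical sizes: an extent and the results of two index searches.
small = {5, 6, 7, 10}
medium = set(range(0, 1000, 5))
large = set(range(100000))

# The order in which the sets are passed does not matter.
result = intersect_smallest_first([large, small, medium])
```

Whether this ordering pays off for your data is something to measure, for instance with the timeit script linked above.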

Finally, IMO/IME, only pursue these sometimes risky optimizations if they are really necessary and if you have some pretty concrete research or knowledge (your own or others') to back up your plan. If I were you, I'd just start out with the "do a search and then intersect with the extent" approach you mentioned, and only worry about it more when your app needs it.


Zope3-users mailing list
