Re: [Zope3-Users] zc.catalog's FilterExtent (with hurry.query)

Gary Poster Sun, 28 Oct 2007 11:14:18 -0800


On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:

Hey!
I'm trying to understand if my idea of how to use the FilterExtent in zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query (0.9.3). My current understanding of extents is: they can be used to perform a search on a subset of a catalog. For example, "give me all objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10"
Short version:
I have an extent of a large catalog. How do I make a search within this extent?


Hi Jesper.

Extents have a primary use case in the zc.catalog package of defining the extent of a catalog--a set of indexes. This is more efficient both in terms of programmer time and computer time than filtering out objects per-index. It also allows asking indexes questions that would otherwise be impossible, e.g., "what objects do *not* match this particular search?", and a couple of others.
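To make the "do *not* match" question concrete, here is a minimal sketch that uses plain Python sets as stand-ins for the intid sets a catalog works with. The names (extent, matches, not_matching) are made up for illustration and are not zc.catalog API. The point is only that a negative question has an answer once there is a known "everything" set to subtract from.

```python
def not_matching(extent, matches):
    """Return the intids in the extent that a search did NOT return."""
    # Without an extent there is no universe to subtract from, so this
    # question cannot be answered from an index's matches alone.
    return extent - matches


# Hypothetical data: every intid the catalog covers, and the intids
# an index returned for some query.
extent = {1, 2, 3, 4, 5, 6}
matches = {2, 3, 5}

non_matches = not_matching(extent, matches)
print(sorted(non_matches))  # [1, 4, 6]
```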

I'm not sure hurry.query leverages all aspects of extents, and indexes that know how to deal with them. I seem to recall that it didn't, but I could have been wrong and it was a while ago.


So, the primary use case is different than yours.

Extents can be used in the way that you describe--intersecting against a larger search of a larger catalog. What you described is a reasonable first cut, and a reasonable use of extents.

Depending on your use cases and the time available, you may want to explore optimizations. I wouldn't be surprised if you eventually wanted to roll your own catalog to do the set operations in the ways that make the most sense for your application. A few quick thoughts:

- If your common extents are really as small as in your examples, one thing to realize is that the time for an intersection in BTree code pretty much always is determined by the size of the smaller set. Therefore, given three sets that need to be intersected (say, your extent and the result of the search of two indexes) of relative sizes Small, Medium, and Large, you want to intersect in this way: intersect(intersect(Small, Medium), Large). See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto for timeit fun, if you like.

- There are two primary costs of a big catalog, IMO/IME: write time and load time. If necessary for your app, consider ways to try to keep smaller catalogs (e.g., does the value of some information diminish over time? Does it make sense to have separate catalogs, divided across some boundary or boundaries?); and consider ways to keep the catalog in memory (in the object cache).

- If you typically only need the first X of a result set, doing something like Dieter Maurer's incremental search Zope 2 code would be interesting to research and might be appreciated by the community if it worked out well.
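The intersection ordering from the first point above can be sketched roughly like this. It is an illustration only: plain Python sets stand in for the BTrees sets a catalog would actually hand back, and the helper name and sample data are hypothetical. The idea is simply to sort the operands by size so that the running result stays small.

```python
from functools import reduce


def intersect_smallest_first(*sets):
    """Intersect any number of sets, folding from smallest to largest."""
    if not sets:
        return set()
    # Sorting by size gives intersect(intersect(Small, Medium), Large),
    # so every later intersection starts from a small left operand.
    ordered = sorted(sets, key=len)
    return reduce(lambda acc, s: acc & s, ordered)


# Hypothetical inputs: a tiny extent and two index search results.
extent = {5, 6, 7, 10}                # Small
attr1_hits = set(range(0, 1000, 5))   # Medium: multiples of 5 below 1000
attr2_hits = set(range(100000))       # Large

result = intersect_smallest_first(attr2_hits, attr1_hits, extent)
print(sorted(result))  # [5, 10]
```

The order in which the arguments are passed does not matter here, since the helper sorts them itself. Whether the sort is worth it for real BTrees sets is something to measure, e.g. with the timeit script linked above.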

Finally IMO/IME, only pursue these sometimes risky optimizations if they are really necessary and if you have some pretty concrete research or knowledge (your own or others) to back up your plan. If I were you I'd just start out with the "do a search and then intersect with the extent" approach you mentioned, and only worry about it more when your app needs it.


HTH

Gary
