Hi Gary,

Thanks for your comprehensive answer. Yes, my extents aren't really
as small as in my examples. Seems like a reasonable idea to wait with
optimizations, not sure they are even needed, at least not within a year
or so :)


On 10/28/07, Gary Poster <[EMAIL PROTECTED]> wrote:
> Hi Jesper.
> Extents have a primary use case in the zc.catalog package of defining
> the extent of a catalog--a set of indexes.  This is more efficient
> both in terms of programmer time and computer time than filtering out
> objects per-index.  It also allows asking indexes questions that would
> otherwise be impossible, e.g., "what objects do *not* match this
> particular search?", and a couple of others.
> I'm not sure hurry.query leverages all aspects of extents, and indexes
> that know how to deal with them.  I seem to recall that it didn't, but
> I could have been wrong and it was a while ago.
> So, the primary use case is different than yours.
> Extents can be used in the way that you describe--intersecting against
> a larger search of a larger catalog.  What you described is a
> reasonable first cut, and a reasonable use of extents.
> Depending on your use cases and the time available, you may want to
> explore optimizations.  I wouldn't be surprised if you eventually wanted
> to roll your own catalog to do the set operations in the ways that
> make the most sense for your application.  A few quick thoughts:
> - If your common extents are really as small as in your examples, one
> thing to realize is that the time for an intersection in BTree code
> pretty much always is determined by the size of the smaller set.
> Therefore, given three sets that need to be intersected (say, your
> extent and the result of the search of two indexes) of relative sizes
> Small, Medium, and Large, you want to intersect in this way:
> intersect(intersect(Small, Medium), Large).  See
> http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto
>   for timeit fun, if you like.
> - there are two primary costs of a big catalog, IMO/IME: write time
> and load time.  If necessary for your app, consider ways to try to
> keep smaller catalogs (e.g., does the value of some information
> diminish over time?  Does it make sense to have separate catalogs,
> divided across some boundary or boundaries?); and consider ways to
> keep the catalog in memory (in the object cache).
> - if you typically only need the first X of a result set, doing
> something like Dieter Maurer's incremental search Zope 2 code would be
> interesting to research and might be appreciated by the community if
> it worked out well.
> Finally IMO/IME, only pursue these sometimes risky optimizations if
> they are really necessary and if you have some pretty concrete
> research or knowledge (your own or others') to back up your plan.  If I
> were you I'd just start out with the "do a search and then intersect
> with the extent" approach you mentioned, and only worry about it more
> when your app needs it.
> Gary
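
For the archives, here is a minimal sketch of the "smallest set first"
ordering described above: intersect(intersect(Small, Medium), Large).
It uses plain Python sets so it runs without any Zope packages. The
helper name intersect_smallest_first and the sample data are made up
for illustration; with BTrees sets one would presumably pass that
module's own intersection function as the second argument instead of
the default.

```python
from functools import reduce


def intersect_smallest_first(sets, intersect=lambda a, b: a & b):
    """Intersect all the given sets, starting with the smallest ones.

    Hypothetical helper, not part of zc.catalog or BTrees. The running
    result can never be larger than the smallest input, so after sorting
    by size every step pairs a small set with the next larger one.
    """
    ordered = sorted(sets, key=len)
    if not ordered:
        return set()
    return reduce(intersect, ordered)


# Made-up stand-ins for an extent and two index search results.
small = {1, 2, 3}
medium = set(range(1000))
large = set(range(100000))

# Input order does not matter; the helper reorders by size.
result = intersect_smallest_first([large, small, medium])
print(sorted(result))
```

Sorting by len() is cheap for builtin sets. As far as I know, len() on
a BTrees set has to walk the structure, so in real catalog code it may
be better to rely on sizes you already know than to measure each set.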
Zope3-users mailing list
