Re: [Zope3-Users] zc.catalog's FilterExtent (with hurry.query)
Hi Gary,

Thanks for your comprehensive answer. Yes, my extents aren't really as small as in my examples. It seems like a reasonable idea to hold off on optimizations; I'm not sure they are even needed, at least not within a year or so :)

Cheers

On 10/28/07, Gary Poster wrote:

> Hi Jesper.
>
> Extents have a primary use case in the zc.catalog package of defining the extent of a catalog--a set of indexes. This is more efficient both in terms of programmer time and computer time than filtering out objects per-index. It also allows asking indexes questions that would otherwise be impossible, e.g., "what objects do *not* match this particular search?", and a couple of others.
>
> I'm not sure hurry.query leverages all aspects of extents, and indexes that know how to deal with them. I seem to recall that it didn't, but I could have been wrong and it was a while ago.
>
> So, the primary use case is different than yours. Extents can be used in the way that you describe--intersecting against a larger search of a larger catalog.
>
> What you described is a reasonable first cut, and a reasonable use of extents. Depending on your use cases and the time available, you may want to explore optimizations. I wouldn't be surprised if you eventually wanted to roll your own catalog to do the set operations in the ways that make the most sense for your application.
>
> A few quick thoughts:
>
> - If your common extents are really as small as in your examples, one thing to realize is that the time for an intersection in BTree code is pretty much always determined by the size of the smaller set. Therefore, given three sets that need to be intersected (say, your extent and the result of the search of two indexes) of relative sizes Small, Medium, and Large, you want to intersect in this way: intersect(intersect(Small, Medium), Large). See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto for timeit fun, if you like.
>
> - There are two primary costs of a big catalog, IMO/IME: write time and load time. If necessary for your app, consider ways to try to keep smaller catalogs (e.g., does the value of some information diminish over time? Does it make sense to have separate catalogs, divided across some boundary or boundaries?); and consider ways to keep the catalog in memory (in the object cache).
>
> - If you typically only need the first X of a result set, doing something like Dieter Maurer's incremental search Zope 2 code would be interesting to research and might be appreciated by the community if it worked out well.
>
> Finally, IMO/IME, only pursue these sometimes risky optimizations if they are really necessary and if you have some pretty concrete research or knowledge (your own or others') to back up your plan. If I were you I'd just start out with the "do a search and then intersect with the extent" approach you mentioned, and only worry about it more when your app needs it.
>
> HTH
>
> Gary

___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users
Re: [Zope3-Users] zc.catalog's FilterExtent (with hurry.query)
On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:

> Hey! I'm trying to understand if my idea of how to use the FilterExtent in zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query (0.9.3).
>
> My current understanding of extents is: they can be used to perform a search on a subset of a catalog. For example, give me all objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10.
>
> Short version: I have an extent of a large catalog. How do I make a search within this extent?

Hi Jesper.

Extents have a primary use case in the zc.catalog package of defining the extent of a catalog--a set of indexes. This is more efficient both in terms of programmer time and computer time than filtering out objects per-index. It also allows asking indexes questions that would otherwise be impossible, e.g., "what objects do *not* match this particular search?", and a couple of others.

I'm not sure hurry.query leverages all aspects of extents, and indexes that know how to deal with them. I seem to recall that it didn't, but I could have been wrong and it was a while ago.

So, the primary use case is different than yours. Extents can be used in the way that you describe--intersecting against a larger search of a larger catalog.

What you described is a reasonable first cut, and a reasonable use of extents. Depending on your use cases and the time available, you may want to explore optimizations. I wouldn't be surprised if you eventually wanted to roll your own catalog to do the set operations in the ways that make the most sense for your application.

A few quick thoughts:

- If your common extents are really as small as in your examples, one thing to realize is that the time for an intersection in BTree code is pretty much always determined by the size of the smaller set. Therefore, given three sets that need to be intersected (say, your extent and the result of the search of two indexes) of relative sizes Small, Medium, and Large, you want to intersect in this way: intersect(intersect(Small, Medium), Large). See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto for timeit fun, if you like.

- There are two primary costs of a big catalog, IMO/IME: write time and load time. If necessary for your app, consider ways to try to keep smaller catalogs (e.g., does the value of some information diminish over time? Does it make sense to have separate catalogs, divided across some boundary or boundaries?); and consider ways to keep the catalog in memory (in the object cache).

- If you typically only need the first X of a result set, doing something like Dieter Maurer's incremental search Zope 2 code would be interesting to research and might be appreciated by the community if it worked out well.

Finally, IMO/IME, only pursue these sometimes risky optimizations if they are really necessary and if you have some pretty concrete research or knowledge (your own or others') to back up your plan. If I were you I'd just start out with the "do a search and then intersect with the extent" approach you mentioned, and only worry about it more when your app needs it.

HTH

Gary
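The smallest-first ordering from the first bullet can be sketched in plain Python. This is only an illustration under stated assumptions: built-in sets stand in for BTree sets, and the helper name intersect_smallest_first is hypothetical, not part of zc.catalog, hurry.query, or BTrees. With BTrees the same ordering would apply to whichever intersection function the chosen set family provides.

```python
def intersect_smallest_first(sets):
    """Intersect all given sets, always starting from the smallest.

    Hypothetical helper: built-in sets stand in for BTree sets here.
    """
    ordered = sorted(sets, key=len)  # Small, Medium, Large
    if not ordered:
        return set()
    result = set(ordered[0])
    for other in ordered[1:]:
        if not result:
            # Nothing left to match; skip the larger sets entirely.
            break
        result = result & other
    return result


# Sizes chosen only to mimic the Small / Medium / Large case.
small = {1, 2, 5}                 # e.g. an extent
medium = set(range(1000))         # e.g. the result from one index
large = set(range(100000))        # e.g. the result from another index

# Input order does not matter; the helper reorders by size.
result = intersect_smallest_first([large, small, medium])
```

The running intersection can never grow beyond the smallest input, so each later step works against a small left-hand operand, which is the point of intersect(intersect(Small, Medium), Large).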
[Zope3-Users] zc.catalog's FilterExtent (with hurry.query)
Hey!

I'm trying to understand if my idea of how to use the FilterExtent in zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query (0.9.3).

My current understanding of extents is: they can be used to perform a search on a subset of a catalog. For example, give me all objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10.

Short version: I have an extent of a large catalog. How do I make a search within this extent?

Long version: Basically, I have a few content components that can belong to several networks. My idea is to have an extent in each network. Adding a component to several networks is simply a matter of subscribing to an added event, looking at the object's networks attribute and adding it to each of the specified networks via the network's extent.add method.

    list(BigCatalog.refs)
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

    list(network1.extent.set)
    [1, 2, 5]

    list(network2.extent.set)
    [2, 3, 5, 6, 11]

In turn, the members of my site can join these networks, so I'd want an extent on each principal too. This is to allow a kind of subscription to the member's networks. I'd also want some filter capabilities where the user can somehow change settings for the filter function, but I'll leave that for now.

Say we have a member John, who's a member of network1 and network2. His extent would then look like this (assuming he's not filtering anything from the networks):

    IUserExtent(john).extent.set
    [1, 2, 3, 5, 6, 11]

SO, my main question now is, uhm, how do I actually make a search within a network's extent? I mean, I could do a search on the big catalog and then intersect the result with the network's extent:

    some_result & network1.extent
    [1]

...But was it really meant to be used like this? I see you can set the parent catalog on the extent but I'm still not sure how to use it.

Regards
Jesper
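The set arithmetic in this scenario can be modelled with plain Python sets, as a rough sketch only. The intids are the ones from the message; some_result is a hypothetical search result, and built-in sets stand in for the extents, so nothing below is zc.catalog or hurry.query API. Combining a member's networks is a union, while restricting a search to an extent is an intersection.

```python
# Intids taken from the example in the message.
big_catalog = set(range(1, 13))
network1 = {1, 2, 5}
network2 = {2, 3, 5, 6, 11}

# A member of both networks sees the union of their extents.
john = network1 | network2

# Hypothetical result of a search against the big catalog.
some_result = {1, 3, 4, 7}

# Searching "within" an extent keeps only the hits the extent contains.
in_network1 = sorted(some_result & network1)
in_john = sorted(some_result & john)
```

Here in_network1 is [1] and in_john is [1, 3]: hits 4 and 7 are in the catalog but in neither network, so they drop out of both results.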