Re: [Zope3-Users] zc.catalog's FilterExtent (with hurry.query)

2007-10-30 Thread Jesper Petersen
Hi Gary,

Thanks for your comprehensive answer. Yes, my extents aren't really as
small as in my examples. Seems like a reasonable idea to wait with
optimizations, not sure they are even needed, at least not within a year
or so :)

Cheers

On 10/28/07, Gary Poster [EMAIL PROTECTED] wrote:

 Hi Jesper.

 Extents have a primary use case in the zc.catalog package of defining
 the extent of a catalog--a set of indexes.  This is more efficient
 both in terms of programmer time and computer time than filtering out
 objects per-index.  It also allows asking indexes questions that would
 otherwise be impossible, e.g., "what objects do *not* match this
 particular search?", and a couple of others.

 I'm not sure hurry.query leverages all aspects of extents, and indexes
 that know how to deal with them.  I seem to recall that it didn't, but
 I could have been wrong and it was a while ago.

 So, the primary use case is different than yours.

 Extents can be used in the way that you describe--intersecting against
 a larger search of a larger catalog.  What you described is a
 reasonable first cut, and a reasonable use of extents.

 Depending on your use cases and the time available, you may want to
 explore optimizations.  I wouldn't be surprised if you eventually wanted
 to roll your own catalog to do the set operations in the ways that
 make the most sense for your application.  A few quick thoughts:

 - If your common extents are really as small as in your examples, one
 thing to realize is that the time for an intersection in BTree code
 pretty much always is determined by the size of the smaller set.
 Therefore, given three sets that need to be intersected (say, your
 extent and the result of the search of two indexes) of relative sizes
 Small, Medium, and Large, you want to intersect in this way:
 intersect(intersect(Small, Medium), Large).  See
 http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto
   for timeit fun, if you like.

 - there are two primary costs of a big catalog, IMO/IME: write time
 and load time.  If necessary for your app, consider ways to try to
 keep smaller catalogs (e.g., does the value of some information
 diminish over time?  Does it make sense to have separate catalogs,
 divided across some boundary or boundaries?); and consider ways to
 keep the catalog in memory (in the object cache).

 - if you typically only need the first X of a result set, doing
 something like Dieter Maurer's incremental search Zope 2 code would be
 interesting to research and might be appreciated by the community if
 it worked out well.

 Finally IMO/IME, only pursue these sometimes risky optimizations if
 they are really necessary and if you have some pretty concrete
 research or knowledge (your own or others) to back up your plan.  If I
 were you I'd just start out with the "do a search and then intersect
 with the extent" approach you mentioned, and only worry about it more
 when your app needs it.

 HTH

 Gary

___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users


Re: [Zope3-Users] zc.catalog's FilterExtent (with hurry.query)

2007-10-28 Thread Gary Poster


On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:


Hey!
I'm trying to understand if my idea of how to use the FilterExtent in
zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query
(0.9.3). My current understanding of extents is: they can be used to
perform a search on a subset of a catalog. For example, give me all
objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10.



Short version:
I have an extent of a large catalog. How do I make a search within
this extent?


Hi Jesper.

Extents have a primary use case in the zc.catalog package of defining  
the extent of a catalog--a set of indexes.  This is more efficient  
both in terms of programmer time and computer time than filtering out  
objects per-index.  It also allows asking indexes questions that would  
otherwise be impossible, e.g., "what objects do *not* match this
particular search?", and a couple of others.


I'm not sure hurry.query leverages all aspects of extents, and indexes  
that know how to deal with them.  I seem to recall that it didn't, but  
I could have been wrong and it was a while ago.


So, the primary use case is different than yours.

Extents can be used in the way that you describe--intersecting against  
a larger search of a larger catalog.  What you described is a  
reasonable first cut, and a reasonable use of extents.


Depending on your use cases and the time available, you may want to  
explore optimizations.  I wouldn't be surprised if you eventually wanted
to roll your own catalog to do the set operations in the ways that  
make the most sense for your application.  A few quick thoughts:


- If your common extents are really as small as in your examples, one  
thing to realize is that the time for an intersection in BTree code  
pretty much always is determined by the size of the smaller set.   
Therefore, given three sets that need to be intersected (say, your  
extent and the result of the search of two indexes) of relative sizes  
Small, Medium, and Large, you want to intersect in this way:  
intersect(intersect(Small, Medium), Large).  See http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto 
 for timeit fun, if you like.


- there are two primary costs of a big catalog, IMO/IME: write time  
and load time.  If necessary for your app, consider ways to try to  
keep smaller catalogs (e.g., does the value of some information  
diminish over time?  Does it make sense to have separate catalogs,  
divided across some boundary or boundaries?); and consider ways to  
keep the catalog in memory (in the object cache).


- if you typically only need the first X of a result set, doing  
something like Dieter Maurer's incremental search Zope 2 code would be  
interesting to research and might be appreciated by the community if  
it worked out well.
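
To illustrate the first point above, here is a minimal sketch of smallest-first intersection. It is not code from zc.catalog or hurry.query: plain Python sets stand in for BTrees sets, and the helper name `intersect_smallest_first` and the sample data are made up for the example. With real BTrees sets one would presumably use the `intersection` function of the matching BTrees module instead of `&`.

```python
from functools import reduce


def intersect_smallest_first(*id_sets):
    """Intersect sets of intids, always starting with the smallest sets.

    Plain Python sets are used here as stand-ins for BTrees sets
    (an assumption made only to keep the sketch self-contained).
    """
    if not id_sets:
        return set()
    # Sort by size so every intermediate result is at most as large
    # as the smallest input.
    ordered = sorted(id_sets, key=len)
    return reduce(lambda acc, ids: acc & ids, ordered[1:], set(ordered[0]))


# Hypothetical data: a small extent plus the results of two index searches.
small = {1, 2, 5}                 # e.g. a network's extent
medium = set(range(0, 1000, 5))   # result of searching one index
large = set(range(100000))        # result of searching another index

# Argument order does not matter; the helper reorders by size.
result = intersect_smallest_first(large, small, medium)
print(sorted(result))  # [5]
```

The point of the sort is only to keep the intermediate result small, which is the same idea as intersect(intersect(Small, Medium), Large).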


Finally IMO/IME, only pursue these sometimes risky optimizations if  
they are really necessary and if you have some pretty concrete  
research or knowledge (your own or others) to back up your plan.  If I  
were you I'd just start out with the "do a search and then intersect
with the extent" approach you mentioned, and only worry about it more
when your app needs it.


HTH

Gary


[Zope3-Users] zc.catalog's FilterExtent (with hurry.query)

2007-10-27 Thread Jesper Petersen
Hey!

I'm trying to understand if my idea of how to use the FilterExtent in
zc.catalog (1.1.1) is correct (and efficient). I'm also using hurry.query
(0.9.3). My current understanding of extents is: they can be used to
perform a search on a subset of a catalog. For example, give me all
objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10.


Short version:
I have an extent of a large catalog. How do I make a search within this
extent?




Long version:
Basically, I have a few content components that can belong to several
networks.

My idea is to have an extent in each network. Adding a component to several
networks is simply a matter of subscribing to an added event, looking at the
object's networks attribute, and adding it to each of the specified networks
via the network's extent.add method.
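
A rough sketch of that idea, using plain Python stand-ins rather than the real zc.catalog and event machinery: the `Network` and `Component` classes and the `component_added` handler are hypothetical names, and a built-in set takes the place of the extent (a real extent's add method may take different arguments).

```python
class Network:
    """Hypothetical network object; a plain set stands in for its extent."""

    def __init__(self, name):
        self.name = name
        self.extent = set()


class Component:
    """Hypothetical content component with an intid and a networks attribute."""

    def __init__(self, intid, networks):
        self.intid = intid
        self.networks = networks


def component_added(component):
    """Subscriber-style handler: add the component to each network's extent."""
    for network in component.networks:
        network.extent.add(component.intid)


network1 = Network("network1")
network2 = Network("network2")

components = [
    Component(1, [network1]),
    Component(2, [network1, network2]),
    Component(5, [network1, network2]),
]
for component in components:
    component_added(component)

print(sorted(network1.extent))  # [1, 2, 5]
print(sorted(network2.extent))  # [2, 5]
```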


>>> list(BigCatalog.refs)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> list(network1.extent.set)
[1, 2, 5]
>>> list(network2.extent.set)
[2, 3, 5, 6, 11]


In turn, the members of my site can join these networks, so I'd want an
extent on each principal too. This is to allow a kind of subscription to his
networks. I'd also want some filter capabilities where the user can somehow
change settings for the filter function, but I'll leave that for now.
Say we have a member John, who's a member of network1 & 2; his extent would
then look like this (assuming he's not filtering anything from the networks):


>>> IUserExtent(john).extent.set
[1, 2, 3, 5, 6, 11]
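
For illustration, John's extent above is just the union of the two network extents. A minimal sketch with plain Python sets standing in for the extents; the function name `user_extent` and its optional `keep` filter are hypothetical, not part of zc.catalog.

```python
def user_extent(network_extents, keep=lambda intid: True):
    """Union the extents of all networks a user belongs to.

    `keep` is a hypothetical per-user filter function; by default
    nothing is filtered out.
    """
    combined = set()
    for extent in network_extents:
        combined |= extent
    return {intid for intid in combined if keep(intid)}


network1_extent = {1, 2, 5}
network2_extent = {2, 3, 5, 6, 11}

print(sorted(user_extent([network1_extent, network2_extent])))
# [1, 2, 3, 5, 6, 11]

# With a made-up filter that hides intid 11:
print(sorted(user_extent([network1_extent, network2_extent],
                         keep=lambda intid: intid != 11)))
# [1, 2, 3, 5, 6]
```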


So, my main question now is: how do I actually make a search within a
network's extent? I mean, I could do a search on the big catalog and then
intersect the result with the network's extent:


>>> some_result & network1.extent
[1]


...But was it really meant to be used like this? I see you can set the
parent catalog on the extent, but I'm still not sure how to use it.


Regards
Jesper