Dear gods, I hope you get an answer to this question, as I've noticed the same 
thing with very large indexes (using zc.catalog). I believe that, at its root, 
repoze.catalog is built around the same concepts and structures.

When I tried to track down the problems with a profiler, they all revolved 
around loading the relevant portions of the indexes into memory. It had nothing 
to do with the final results of the query, and nothing to do with waking up the 
'result' objects; all of the slowness was in loading the indexes themselves 
into memory. In one case, we were only using one index (a SetIndex) with about 
150,000 document ids.

This is from my own profiling. All I know is that the first query is very slow 
and subsequent queries are very fast, and that the object cache - and the 
relevant indexes' ability to keep all of their little BTrees, Buckets, Sets, 
and so on in that cache - seems to have a tremendous impact on query and 
response time, far more than the time spent waking up the content objects in 
your result set.
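For what it's worth, the only real knob I know of here is the per-connection 
object cache count (ZODB's cache_size, which counts objects, not bytes - and 
every BTree Bucket is its own object). A minimal sketch, assuming a plain 
FileStorage setup; the 400000 figure is purely illustrative, not a 
recommendation:

    # Sketch: raise the per-connection object cache count so index Buckets
    # can stay resident. cache_size counts objects, not bytes, and the
    # default is only a few hundred objects.
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    storage = FileStorage('Data.fs')
    db = DB(storage, cache_size=400000)
    conn = db.open()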

When the indexes aren't in memory, the slowness in my case showed up under 
BTrees' 'multiunion' function; but the real cost was in multiunion's calls into 
ZODB's Connection.setstate (which is what loads objects into memory). At this 
point it's just BTree (catalog index) data being loaded:


Profiling a fresh site (no object cache / memory population yet)
================================================================
winterhome-firstload.profile% callees multiunion
Function                      called...
                                  ncalls  tottime  cumtime
{BTrees._IFBTree.multiunion}  ->   65980    0.132   57.891  Connection.py:848(setstate)

winterhome-firstload.profile% callers multiunion
Function                      was called by...
                                  ncalls  tottime  cumtime
{BTrees._IFBTree.multiunion}  <-      27    0.348   58.239  index.py:203(apply)

(Yep, 58 seconds: a very slow ZEO network load in a demostorage setup where ZEO 
cannot update its client cache, which makes these setstate problems very 
exaggerated. 'multiunion' is called 27 times, but one of those calls accounts 
for the 58 seconds.)
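Tangentially: in a normal, non-demostorage setup, enlarging the ZEO client's 
persistent disk cache at least lets those loads survive restarts. A sketch, 
assuming a standard ClientStorage connection; the address and size are 
illustrative:

    # Sketch: a bigger ZEO client cache, sized in bytes. As noted above,
    # with demostorage the client cache can't be updated, so this doesn't
    # help in that particular configuration.
    from ZEO.ClientStorage import ClientStorage
    from ZODB.DB import DB

    storage = ClientStorage(('localhost', 8100),
                            cache_size=500 * 1024 * 1024)  # default ~20MB
    db = DB(storage)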


Profiling the same page again with everything all loaded
========================================================
winterhome-secondload.profile% callees multiunion
Function                      called...
                                  ncalls  tottime  cumtime
{BTrees._IFBTree.multiunion}  ->

winterhome-secondload.profile% callers multiunion
Function                      was called by...
                                  ncalls  tottime  cumtime
{BTrees._IFBTree.multiunion}  <-      27    0.193    0.193  index.py:203(apply)

(This time, multiunion doesn't require any ZODB loads, and across its 27 calls 
both the internal and cumulative times are relatively speedy.)
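For anyone who wants to do the same digging: the 'callees'/'callers' output 
above comes from the interactive pstats browser (python -m pstats 
winterhome-firstload.profile). The same thing can be scripted; a sketch, where 
render_page is a hypothetical stand-in for whatever runs the slow query:

    # Sketch: capture a profile of the slow request, then ask the same
    # callees/callers questions shown above.
    import cProfile
    import pstats

    cProfile.run('render_page()', 'winterhome-firstload.profile')

    stats = pstats.Stats('winterhome-firstload.profile')
    stats.print_callees('multiunion')  # where multiunion's time goes
    stats.print_callers('multiunion')  # who calls multiunion, how often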

If there's a good strategy for getting these things into memory and keeping 
them there, I'd love to know it; but when the catalog indexes are competing 
with all of the content objects that make up a site, it's hard to know what to 
do, or even how to configure the object cache counts well, without running into 
serious memory problems.
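The closest thing I have to a strategy is explicitly warming the indexes at 
startup, so their Buckets are pulled into the object cache before real traffic 
arrives. A rough sketch, assuming a repoze.catalog Catalog (a persistent 
mapping of index name to index); the _fwd_index/_rev_index attribute names are 
zope.index FieldIndex internals and won't apply to every index type:

    # Rough sketch: walk each index's internal BTrees once so every Bucket
    # gets loaded (setstate'd) into the object cache up front.
    def warm_catalog(catalog):
        for name, index in catalog.items():
            # _fwd_index/_rev_index are zope.index FieldIndex internals;
            # other index types keep their BTrees under other names.
            for attr in ('_fwd_index', '_rev_index'):
                tree = getattr(index, attr, None)
                if tree is not None:
                    for _ in tree.items():  # touches every Bucket
                        pass

Of course, that only helps if the cache is big enough to hold the index objects 
alongside your content objects; otherwise they just get evicted again, which is 
exactly the competition problem above.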

On Jan 3, 2013, at 2:50 PM, Claudiu Saftoiu <csaft...@gmail.com> wrote:

> Hello all,
> 
> Am I doing something wrong with my queries, or is repoze.catalog.query very 
> slow?
> 
> I have a `Catalog` with ~320,000 objects and 17 `CatalogFieldIndex`es. All 
> the objects are indexed and up to date. This is the query I ran (field names 
> renamed):
> 
>     And(InRange('float_field', 0.01, 0.04),
>         InRange('datetime_field', seven_days_ago, today),
>         Eq('str1', str1),
>         Eq('str2', str2),
>         Eq('str3', str3),
>         Eq('str4', str4))
> 
> It returned 15 results, so it's not a large result set by any means. The 
> string fields are like labels - any one of them has fewer than 20 possible 
> values.
> 
> This query took a few minutes to run the first time. Re-running it in the 
> same session took <1 second each time. When I restarted the session, the 
> first run took only 30 seconds, and again 1 second each subsequent run.
> 
> What makes it run so slow? Is it that the catalog isn't fully in memory? If 
> so, is there any way I can guarantee the catalog will be in memory given that 
> my entire database doesn't fit in memory all at once?
> 
> Thanks,
> - Claudiu

Thanks,
Jeff Shell
j...@bottlerocket.net


