Y'all might have better luck with this on zope-dev.

Jim

On Thu, Jan 3, 2013 at 5:25 PM, Jeff Shell <j...@bottlerocket.net> wrote:
> Dear gods, I hope you get an answer to this question, as I've noticed the
> same thing with very large indexes (using zc.catalog). I believe that
> repoze.catalog is built on the same underlying concepts and structures.
>
> When I tried to track down the problems with a profiler, they all revolved
> around loading the relevant portions of the indexes into memory. It had
> nothing to do with the final results of the query, and nothing to do with
> waking up the 'result' objects; all of the slowness seemed to be in loading
> the indexes themselves into memory. In one case, we were only using one index
> (a SetIndex) with about 150,000 document ids.
>
> This is from my own profiling. All I know is that queries are very slow the
> first time and then very fast, and that the object cache, and the relevant
> indexes' ability to keep all of their little BTrees, Buckets, and Sets in
> that object cache, seem to have a tremendous impact on query and response
> time, far more than the time spent waking up the content objects in your
> result set.
>
> When the indexes aren't in memory, in my case I found the slowness to be in
> BTrees' 'multiunion' function, but the real cost was in its calls to ZODB's
> setstate (which loads object state into memory). At that point it is just
> BTree (catalog index) data being loaded:
>
>
> Profiling a fresh site (no object cache / memory population yet)
> ================================================================
> winterhome-firstload.profile% callees multiunion
> Function                      called...
>                                   ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion}  ->   65980    0.132   57.891  Connection.py:848(setstate)
>
> winterhome-firstload.profile% callers multiunion
> Function                      was called by...
>                                   ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion}  <-      27    0.348   58.239  index.py:203(apply)
>
> (Yep, 58 seconds: a very slow ZEO network load in a demostorage setup where
> ZEO cannot update its client cache, which makes these setstate problems much
> more pronounced. 'multiunion' is called 27 times, but one of those calls
> accounts for the 58 seconds.)
>
>
> Profiling the same page again with everything all loaded
> ========================================================
> winterhome-secondload.profile% callees multiunion
> Function                      called...
>                                   ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion}  ->
>
> winterhome-secondload.profile% callers multiunion
> Function                      was called by...
>                                   ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion}  <-      27    0.193    0.193  index.py:203(apply)
>
> (This time multiunion doesn't require any ZODB loads, and the internal and
> cumulative times for its 27 calls are relatively speedy.)
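>
> As an aside, in case anyone wants to reproduce this kind of breakdown: the
> prompts above are Python's interactive pstats browser ("python -m pstats
> <file>"). As a rough sketch, with 'run_query' standing in as a placeholder
> for whatever code actually performs the query and renders the page,
> something like this captures and inspects a comparable profile:
>
>     import cProfile
>     import pstats
>
>     # Profile whatever performs the catalog query / page render.
>     # 'run_query' is a placeholder, not a real function.
>     cProfile.run('run_query()', 'winterhome-firstload.profile')
>
>     # The same callers/callees tables, without the interactive browser.
>     stats = pstats.Stats('winterhome-firstload.profile')
>     stats.print_callees('multiunion')
>     stats.print_callers('multiunion')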
>
> If there's a good strategy for getting and keeping these things in memory,
> I'd love to know it; but when the catalog indexes are competing with all of
> the content objects that make up a site, it's hard to know what to do, or
> even how to configure the object cache counts well, without running into
> serious memory problems.
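>
> For what it's worth, the only knobs I know to try are the per-connection
> ZODB object cache, the ZEO client cache, and simply running a representative
> query at startup so the index BTrees are loaded before real traffic arrives.
> A rough, hypothetical sketch (the numbers, the 'catalog' root key, and the
> warm-up helper are all made up, and as noted above a demostorage setup
> defeats the ZEO client cache anyway):
>
>     from ZEO.ClientStorage import ClientStorage
>     from ZODB import DB
>
>     # Bigger caches: the ZEO client cache is sized in bytes, the ZODB
>     # per-connection object cache in number of objects.
>     storage = ClientStorage(('localhost', 8100),
>                             cache_size=500 * 1024 * 1024)
>     db = DB(storage, cache_size=200000)
>
>     # Warm up: run one representative catalog query at startup so the index
>     # BTrees end up in the connection's object cache. Connections are
>     # pooled, so a later db.open() can reuse the warmed cache.
>     conn = db.open()
>     try:
>         catalog = conn.root()['catalog']     # wherever the catalog lives
>         run_warmup_query(catalog)            # placeholder for a real query
>     finally:
>         conn.close()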
>
> On Jan 3, 2013, at 2:50 PM, Claudiu Saftoiu <csaft...@gmail.com> wrote:
>
>> Hello all,
>>
>> Am I doing something wrong with my queries, or is repoze.catalog.query very 
>> slow?
>>
>> I have a `Catalog` with ~320,000 objects and 17 `CatalogFieldIndex`es. All 
>> the objects are indexed and up to date. This is the query I ran (field names 
>> renamed):
>>
>>     And(InRange('float_field', 0.01, 0.04),
>>         InRange('datetime_field', seven_days_ago, today),
>>         Eq('str1', str1),
>>         Eq('str2', str2),
>>         Eq('str3', str3),
>>         Eq('str4', str4))
>>
>> It returned 15 results, so it's not a large result set by any means. The
>> strings are like labels: there are fewer than 20 possible values for any one
>> of the string fields.
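>>
>> For reference, the catalog is set up and the query executed roughly like
>> this (a simplified sketch: the real catalog has 17 CatalogFieldIndexes, the
>> field names are renamed as above, and the label/date values here are just
>> placeholders):
>>
>>     from datetime import datetime, timedelta
>>
>>     from repoze.catalog.catalog import Catalog
>>     from repoze.catalog.indexes.field import CatalogFieldIndex
>>     from repoze.catalog.query import And, Eq, InRange
>>
>>     # Simplified: only 6 of the 17 CatalogFieldIndexes.
>>     catalog = Catalog()
>>     for name in ('float_field', 'datetime_field',
>>                  'str1', 'str2', 'str3', 'str4'):
>>         catalog[name] = CatalogFieldIndex(name)
>>
>>     # Objects get indexed under integer docids, e.g.:
>>     #   catalog.index_doc(docid, obj)
>>
>>     today = datetime.utcnow()
>>     seven_days_ago = today - timedelta(days=7)
>>     str1 = str2 = str3 = str4 = 'some-label'   # placeholder values
>>
>>     query = And(InRange('float_field', 0.01, 0.04),
>>                 InRange('datetime_field', seven_days_ago, today),
>>                 Eq('str1', str1), Eq('str2', str2),
>>                 Eq('str3', str3), Eq('str4', str4))
>>
>>     # Catalog.query returns the match count and a lazy sequence of docids.
>>     numdocs, docids = catalog.query(query)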
>>
>> This query took a few minutes to run the first time. Re-running it in the
>> same session took under a second each time. When I restarted the session the
>> first run took only 30 seconds, and again under a second on each subsequent
>> run.
>>
>> What makes it run so slow? Is it that the catalog isn't fully in memory? If 
>> so, is there any way I can guarantee the catalog will be in memory given 
>> that my entire database doesn't fit in memory all at once?
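>>
>> In case it helps, this is roughly how I've been timing it (a sketch;
>> 'catalog' and 'query' are the ones from the setup above):
>>
>>     import time
>>
>>     # Cold run: the index BTrees still have to be loaded from the database.
>>     start = time.time()
>>     numdocs, docids = catalog.query(query)
>>     print('first run: %.2fs, %d results' % (time.time() - start, numdocs))
>>
>>     # Warm run: the same query with the index data already in memory.
>>     start = time.time()
>>     numdocs, docids = catalog.query(query)
>>     print('second run: %.2fs, %d results' % (time.time() - start, numdocs))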
>>
>> Thanks,
>> - Claudiu
>
> Thanks,
> Jeff Shell
> j...@bottlerocket.net



-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm