Y'all might have better luck with this on zope-dev.

Jim
On Thu, Jan 3, 2013 at 5:25 PM, Jeff Shell <j...@bottlerocket.net> wrote:
> Dear gods, I hope you get an answer to this question, as I've noticed the
> same thing with very large indexes (using zc.catalog). I believe that at its
> root layers repoze.catalog is built around the same concepts and structures.
>
> When I tried to track down the problems with a profiler, they all revolved
> around loading the relevant portions of the indexes into memory. It had
> nothing to do with the final results of the query; it had nothing to do with
> waking up the 'result' objects; all of the slowness seemed to be in loading
> the indexes themselves into memory. In one case, we were only using one index
> (a SetIndex) with about 150000 document ids.
>
> This is from my own profiling. All I know is that this is very slow, and then
> very fast, and that the object cache - and the relevant indexes' ability to
> keep all of their little BTrees or Buckets or Sets in that object cache -
> seems to have a tremendous impact on query and response time, far more than
> is taken up by subsequently waking up the content objects in your result set.
>
> When the indexes aren't in memory, in my case I found the slowness to be in
> BTrees's 'multiunion' function; the real slowness was in calling ZODB's
> setstate (which loads objects into memory). This is just BTree (catalog
> index) data being loaded at this point:
>
> Profiling a fresh site (no object cache / memory population yet)
> ================================================================
> winterhome-firstload.profile% callees multiunion
> Function                       called...
>                                    ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion} ->     65980    0.132   57.891  Connection.py:848(setstate)
>
> winterhome-firstload.profile% callers multiunion
> Function                       was called by...
>                                    ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion} <-        27    0.348   58.239  index.py:203(apply)
>
> (Yep, 58 seconds: a very slow ZEO network load in a demostorage setup where
> ZEO cannot update its client cache, which makes these setstate problems very
> exaggerated. 'multiunion' is called 27 times, but one of those calls takes
> 58 seconds.)
>
> Profiling the same page again with everything already loaded
> ============================================================
> winterhome-secondload.profile% callees multiunion
> Function                       called...
> {BTrees._IFBTree.multiunion} ->
>
> winterhome-secondload.profile% callers multiunion
> Function                       was called by...
>                                    ncalls  tottime  cumtime
> {BTrees._IFBTree.multiunion} <-        27    0.193    0.193  index.py:203(apply)
>
> (This time, multiunion doesn't require any ZODB loads, and its 27 calls'
> internal and cumulative times are relatively speedy.)
>
> If there's a good strategy for getting and keeping these things in memory,
> I'd love to know it; but when the catalog indexes are competing with all of
> the content objects that make up a site, it's hard to know what to do, or
> even how to configure the object cache counts well, without running into
> serious memory problems.
>
> On Jan 3, 2013, at 2:50 PM, Claudiu Saftoiu <csaft...@gmail.com> wrote:
>
>> Hello all,
>>
>> Am I doing something wrong with my queries, or is repoze.catalog.query
>> very slow?
>>
>> I have a `Catalog` with ~320,000 objects and 17 `CatalogFieldIndex`es. All
>> the objects are indexed and up to date. This is the query I ran (field
>> names renamed):
>>
>>     And(InRange('float_field', 0.01, 0.04),
>>         InRange('datetime_field', seven_days_ago, today),
>>         Eq('str1', str1),
>>         Eq('str2', str2),
>>         Eq('str3', str3),
>>         Eq('str4', str4))
>>
>> It returned 15 results, so it's not a large result set by any means. The
>> strings are like labels - there are <20 values any one of the string
>> fields can take.
>>
>> This query took a few minutes to run the first time. Re-running it in the
>> same session took <1 second each time. When I restarted the session it
>> took only 30 seconds, and again ~1 second each subsequent time.
>>
>> What makes it run so slowly? Is it that the catalog isn't fully in memory?
>> If so, is there any way I can guarantee the catalog will be in memory,
>> given that my entire database doesn't fit in memory all at once?
>>
>> Thanks,
>> - Claudiu
>>
>> _______________________________________________
>> For more information about ZODB, see http://zodb.org/
>>
>> ZODB-Dev mailing list - ZODB-Dev@zope.org
>> https://mail.zope.org/mailman/listinfo/zodb-dev
>
> Thanks,
> Jeff Shell
> j...@bottlerocket.net

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
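[Editor's sketch] Conceptually, a query like Claudiu's And() of InRange and Eq filters resolves to set operations over per-index docid collections: each field index maps a value to a set of document ids, Eq looks up one set, InRange unions the sets for every key in the range, and And intersects the per-index results. The sketch below uses plain Python dicts and sets standing in for the catalog's BTrees; all names and data here are illustrative, not repoze.catalog API:

```python
def apply_eq(index, value):
    """Exact-match (Eq-style) lookup: one docid set for one key."""
    return index.get(value, set())

def apply_inrange(index, lo, hi):
    """Range (InRange-style) lookup: union the docid sets for every
    key in [lo, hi]. On a real BTree index this walk is what forces
    many buckets to be loaded (the setstate cost seen in the thread)."""
    result = set()
    for key, docids in index.items():
        if lo <= key <= hi:
            result |= docids
    return result

# Two toy indexes standing in for CatalogFieldIndex internals.
float_index = {0.02: {1, 2}, 0.05: {3}, 0.03: {4}}
str_index = {"a": {1, 4}, "b": {2, 3}}

# And(InRange('float_field', 0.01, 0.04), Eq('str1', 'a')):
# intersect the two per-index result sets.
hits = apply_inrange(float_index, 0.01, 0.04) & apply_eq(str_index, "a")
# hits == {1, 4}
```

The point of the model: even a query returning 15 documents can force a walk over many index buckets, which is why the cost is dominated by loading index structure rather than by the size of the final result set.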
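[Editor's sketch] On the "keeping indexes warm" question: one approach sometimes suggested is to pre-touch the index structures at startup so their buckets are loaded into the object cache before the first query arrives. The toy model below is not ZODB code - `LazyIndex`, its dict "storage", and the `loads` counter are invented here purely to illustrate the cold-load vs. warm-cache behavior the profiles above show:

```python
class LazyIndex:
    """Toy stand-in for a persistent index: each bucket is 'loaded'
    from storage on first access (the setstate cost), then served
    from an in-memory cache (the ZODB object cache analogue)."""

    def __init__(self, buckets):
        self._storage = buckets  # stands in for ZODB storage
        self._cache = {}         # stands in for the object cache
        self.loads = 0           # counts setstate-like loads

    def bucket(self, key):
        if key not in self._cache:
            self.loads += 1      # cold path: pay the load cost once
            self._cache[key] = self._storage[key]
        return self._cache[key]

    def warm(self):
        # Pre-touch every bucket so queries never hit storage.
        for key in self._storage:
            self.bucket(key)

index = LazyIndex({i: {i} for i in range(100)})
index.warm()
loads_after_warm = index.loads  # every bucket loaded exactly once
index.bucket(5)                 # a later query: served from cache,
assert index.loads == loads_after_warm  # no additional loads
```

The catch, as Jeff notes, is that this only helps if the cache is large enough that content objects don't evict the index buckets again; pre-warming without also sizing the per-connection object cache (and, under ZEO, the client cache) generously enough just moves the cost around.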