On 18 January 2013 10:21, Claudiu Saftoiu <csaft...@gmail.com> wrote:
>> > Er, to be clearer: my goal is for the preload to load everything into
>> > the
>> > cache that the query mechanism might use.
>> >
>> > It seems the bucket approach only takes ~10 seconds on the 350k-sized
>> > index
>> > trees vs. ~60-90 seconds. This seems to indicate that fewer things end up
>> > being pre-loaded...
>> I guess I was too subtle before.
>> Preloading is a waste of time.  Just use a persistent ZEO cache
>> of adequate size and be done with it.
> Okay. I did that, and I only tried the preloading because it didn't seem I
> was getting what I wanted.
> To wit: I ran a simple query and it took a good few minutes. It's true,
> after it took a few minutes, it ran instantly, and even after a server
> restart it only took a few seconds, but I don't understand why it took a few
> minutes in the first place. There are only 750k objects in that database,
> and I gave it a cache object size of 5 million; the packed database .fs is
> only 400 megabytes, and I gave it a cache byte size of 3000 megabytes.
> Then when I change one parameter of the query (to ask for objects with a
> month of November instead of October), it takes another few minutes...
> Speaking to your point, preloading didn't seem to help either (I had
> 'preloaded' dozens of times over the past few days and the queries still
> took forever), but the fact remains: it does not seem unreasonable to want
> these queries to run instantly from the get-go, given that is the point of
> indexing in the first place. As it stands now, for certain queries I could
> probably do better loading each object and filtering it via python because I
> wouldn't have to deal with loading the indices in order to run the 'fast'
> query, but this seems to defeat the point of indices entirely, and I'd like
> to not have to create custom search routines for every separate query.
> Again, maybe I'm doing something wrong, but I haven't been able to figure it
> out yet.
> I made a view to display the output of cacheDetailSize like Jeff suggested
> and I got something like this:
>     db = ...
>     for conn_d in db.cacheDetailSize():
>         writer.write("%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n" % conn_d)
> output:
>     <Connection at 0684fe90>, size=635683, non-ghost-size=209039
>     <Connection at 146c5ad0>, size=3490, non-ghost-size=113
> That is after having run the 'preloading'. It seems that when the query
> takes forever, the non-ghost-size is slowly increasing (~100 objects/second)
> while the 'size' stays the same. Once the query is done after having taken a
> few minutes, each subsequent run is instant and the ngsize doesn't grow. My
> naive question is: it has plenty of RAM, why does it not just load
> everything into the RAM?
> Any suggestions? There must be a way to effectively use indexing with zodb
> and what I'm doing isn't working.

Have you confirmed that the ZEO client cache file is being used?
Configure logging to display the ZEO messages to make sure.
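A minimal way to do that from Python (a sketch, assuming only the standard `logging` module, which ZEO logs through under the 'ZEO' logger hierarchy):

```python
import logging

# Send log records to stderr with timestamps and logger names,
# so you can see which ZEO component emitted each message.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)

# Turn up verbosity for ZEO specifically (e.g. ZEO.ClientStorage,
# ZEO.cache) so client-cache open/verify messages become visible.
logging.getLogger('ZEO').setLevel(logging.DEBUG)
```

On startup you should then see messages about the client cache file being opened; if none appear, the persistent cache is not being used.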

The client cache is transient by default, so you will need to enable
persistent client caching to see an effect past restarts:

  client zeo1
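That option goes inside the zeoclient section of your ZConfig file. A fuller sketch (the server address, cache directory, and cache-size value below are placeholders for your setup):

```
<zeoclient>
  server localhost:8100
  # Naming the client makes the cache persistent across restarts.
  client zeo1
  # Directory where the persistent cache file is written.
  var /path/to/cache-dir
  cache-size 3000MB
</zeoclient>
```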


For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
