> > Er, to be clearer: my goal is for the preload to load everything into the
> > cache that the query mechanism might use.
> >
> > It seems the bucket approach only takes ~10 seconds on the 350k-sized
> > index trees vs. ~60-90 seconds. This seems to indicate that fewer things
> > end up being pre-loaded...
> I guess I was too subtle before.
> Preloading is a waste of time.  Just use a persistent ZEO cache
> of adequate size and be done with it.

Okay. I did that; I only tried the preloading because I didn't seem to be
getting what I wanted.

To wit: I ran a simple query and it took a good few minutes. True, after
those few minutes it ran instantly, and even after a server restart it only
took a few seconds, but I don't understand why it took minutes in the first
place. There are only 750k objects in that database, and I gave it a cache
object size of 5 million; the packed database .fs is only 400 megabytes, and
I gave it a cache byte size of 3000
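
For reference, this is the sort of configuration I mean (a hedged sketch;
the exact section and key names depend on your ZODB/ZEO version and how the
database is wired up):

```
<zodb main>
    # per-connection object cache: maximum number of non-ghost objects
    cache-size 5000000
    <zeoclient>
        server localhost:8100
        # naming the client enables a persistent on-disk cache
        client main
        var /var/zeo/cache
        # on-disk client cache size in bytes
        cache-size 3000MB
    </zeoclient>
</zodb>
```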

Then when I change one parameter of the query (asking for objects with a
month of November instead of October), it takes another few minutes...

Speaking to your point, preloading didn't seem to help either (I had
'preloaded' dozens of times over the past few days and the queries still
took forever). But the fact remains: it does not seem unreasonable to want
these queries to run instantly from the get-go, given that is the point of
indexing in the first place. As it stands, for certain queries I could
probably do better loading each object and filtering it in Python, because
I wouldn't have to load the indices just to run the 'fast' query. That
defeats the point of indices entirely, and I'd rather not have to write a
custom search routine for every separate query. Maybe I'm doing something
wrong, but I haven't been able to figure it out yet.
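
To illustrate, the brute-force alternative I mean is just a linear scan in
plain Python (a sketch; the dict-based record layout with a 'month' field is
made up for illustration):

```python
# Brute-force filtering: touch every record and test it in Python,
# no index involved. The record layout here is hypothetical.
def filter_by_month(records, month):
    """Linear scan over all records; cost grows with the database size."""
    return [r for r in records if r.get("month") == month]

records = [
    {"id": 1, "month": "october"},
    {"id": 2, "month": "november"},
    {"id": 3, "month": "october"},
]

print(filter_by_month(records, "october"))  # records 1 and 3
```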

I made a view to display the output of cacheDetailSize(), as Jeff suggested,
and I got something like this:

    db = ...
    for conn_d in db.cacheDetailSize():
        writer.write(
            "%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n"
            % conn_d)


    <Connection at 0684fe90>, size=635683, non-ghost-size=209039
    <Connection at 146c5ad0>, size=3490, non-ghost-size=113

That is after having run the 'preloading'. It seems that while the query
takes forever, the non-ghost-size slowly increases (~100 objects/second)
while the 'size' stays the same. Once the query is done, after having taken
a few minutes, each subsequent run is instant and the ngsize doesn't grow.
My naive question is: the machine has plenty of RAM, so why doesn't it just
load everything into RAM up front?

Any suggestions? There must be a way to use indexing effectively with ZODB,
and what I'm doing isn't working.

- Claudiu
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
