On Jan 17, 2013, at 10:31 AM, Claudiu Saftoiu <csaft...@gmail.com> wrote:
> What I don't understand is that this doesn't seem to work in the long run.
> Just before writing this email I ran a view that required a simple query
> after not having restarted the server in a while, and it took a minute or two
> to complete. Running the view again, it took only a few seconds. So it seems
> something had been moved out of the cache, which makes no sense to me as the
> server has plenty of RAM and the cache size is plenty large.
You can use the 'cacheDetail' method of the ZODB to inspect your object cache
(the 'connection cache') and see how many objects are in there. You are using a
connection cache size of 500000, which means up to 500000 ZODB objects per
connection/thread. 'cacheDetail' will help you see how many objects are
counting toward that limit of 500,000.
I did some recent investigations where I was looking at what happened as the
result of a catalog query used on part of the home page on a customer site that
is exhibiting similar behaviors. The query in question is for '10 most recent
published weblog articles'. Here's a look at the cache detail. You can get the
'db' object in a number of ways, depending on your framework; from any
persistent object, you can get it via '._p_jar.db()'.
from pprint import pprint as pp
from operator import itemgetter

# 'db' is the ZODB.DB object, e.g. db = some_persistent_object._p_jar.db()
pp(sorted(db.cacheDetail(), key=itemgetter(1), reverse=True)[:20])
Between IFSet and IOBucket, there are 100,000 objects alone going into our
object/connection cache count (although another method, 'cacheSize()', says
there are 83,758 items in the cache; I believe this is the non-ghost count).
This is for just one query.
So look at methods like cacheSize(), cacheDetailSize(), cacheDetail(), and if
you're feeling adventurous: cacheExtremeDetail(). They will let you know how
the object/connection cache is actually being used.
I think it's possible that, with multiple rather large BTree-based catalog
indexes, some of the IFSets and IOBuckets that make up their internals can
still get flushed out if not exercised by a frequently used query. We've seen
the same behavior at a couple of our biggest customers.
It's also quite possible that those big old catalog indexes have individual
IFSets and Buckets that get invalidated as object data is re-indexed and their
state changes. The invalidation forces the ZEO client cache to request a fresh
copy, and I presume it invalidates the corresponding data in the
connection/object cache as well. Once that happens, IO is required to move the
data over the network and/or from disk into memory.
> Further, after having preloaded the indices once, shouldn't it preload quite
> rapidly upon further server restarts, if it's all in the cache and the cache
> is persisted?
Again, there are two caches here and they are not really related. The
"persistent cache" is for ZEO to keep local copies instead of having to
constantly hit the network. The object or 'connection' cache is what is in
memory being used by the application. It still requires IO operations to find
all of the bytes from the persistent ZEO cache and move them into memory as
objects. The connection/object cache does not get preserved between restarts.
The client/persistent cache is not a memory dump. If you run the ZODB with just
a local FileStorage file, there is no 'persistent cache' aside from the
database file itself.
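For comparison, here's roughly where each cache is configured when you are
running against ZEO. This is a hedged sketch of a ZConfig fragment (server
address, client name, and paths are illustrative, not a recommendation):

```
%import ZEO

<zodb main>
  # connection/object cache: number of objects held in memory
  # per connection
  cache-size 500000
  <zeoclient>
    server localhost:8100
    # size of the local disk cache
    cache-size 1GB
    # naming the client makes the disk cache persistent, so it
    # survives restarts
    client zeo1
    var /var/zeo/cache
  </zeoclient>
</zodb>
```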
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org