On Jan 17, 2013, at 10:31 AM, Claudiu Saftoiu <csaft...@gmail.com> wrote:

> What I don't understand is that this doesn't seem to work in the long run. 
> Just before writing this email I ran a view that required a simple query 
> after not having restarted the server in a while, and it took a minute or two 
> to complete. Running the view again, it took only a few seconds. So it seems 
> something had been moved out of the cache, which makes no sense to me as the 
> server has plenty of RAM and the cache size is plenty large. 

You can use the 'cacheDetail' method of the ZODB 'DB' object to inspect your
object ('connection') cache and see how many objects are in there. You are
using a connection cache size of 500000, which means up to 500000 ZODB objects
per connection/thread. 'cacheDetail' will help you see how many objects are
counting toward that limit of 500,000.

I did some recent investigation, looking at what happened as the result of a 
catalog query used on part of the home page of a customer site that is 
exhibiting similar behavior. The query in question is for the '10 most recent 
published weblog articles'. Here's a look at the cache detail. You can get the 
'db' object a number of ways depending on your framework; from any persistent 
object, you can get it via '._p_jar.db()'.

from pprint import pprint as pp
from operator import itemgetter
# 'db' is the ZODB DB object, e.g. some_persistent_object._p_jar.db()
pp(sorted(db.cacheDetail(), key=itemgetter(1), reverse=True)[:20])
[('BTrees.IFBTree.IFSet', 79122),
 ('BTrees.IOBTree.IOBucket', 21516),
 ('BTrees.IFBTree.IFTreeSet', 3441),
 ('BTrees.OIBTree.OIBBree', 415),
 ...]

Between IFSet and IOBucket, there are roughly 100,000 objects alone going into 
our object/connection cache count (although another method, 'cacheSize()', says 
there are 83,758 items in the cache; I believe this is the non-ghost count). 
This is for just one query.

So look at methods like cacheSize(), cacheDetailSize(), cacheDetail(), and if 
you're feeling adventurous: cacheExtremeDetail(). They will let you know how 
the object/connection cache is actually being used.

I think it's possible that with multiple, rather large BTree-based catalog 
indexes, some of the IFSets and IOBuckets that make up their internals can 
still get flushed out if they're not exercised by a frequently used query. 
We've seen the same behavior at a couple of our biggest customers.

It's also quite possible that those big old catalog indexes have individual 
IFSets and Buckets that are getting invalidated since they change state as 
object data gets re-indexed. The invalidation causes the ZEO client cache to 
need to request a new copy, and I presume this invalidates data in the 
connection/object cache as well. Once that happens, IO is required to transfer 
the data over the network and/or disk into memory.

> Further, after having preloaded the indices once, shouldn't it preload quite 
> rapidly upon further server restarts, if it's all in the cache and the cache 
> is persisted?

Again, there are two caches here and they are not really related. The 
"persistent cache" is for ZEO to keep local copies instead of having to 
constantly hit the network. The object or 'connection' cache is what is in 
memory being used by the application. It still requires IO operations to find 
all of the bytes from the persistent ZEO cache and move them into memory as 
objects. The connection/object cache does not get preserved between restarts. 
The client/persistent cache is not a memory dump. If you run the ZODB with just 
a local FileStorage file, there is no 'persistent cache' aside from the 
database file itself.

Jeff Shell

For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org