> > > Okay, that makes sense. Would that be a server-side cache, or a
> > > client-side cache?
> >
> > There are no server-side caches (other than the OS disk cache).
Ok, that's what I gathered before, was just checking.

> > I believe I've already succeeded in getting a client-side persistent
> > disk-based cache to work (my zodb_indexdb_uri is
> > "zeo://%(here)s/zeo_indexdb.sock?cache_size=2000MB&connection_cache_size=500000&connection_pool_size=5&var=zeocache&client=index"),
>
> This configuration syntax isn't part of ZODB. I'm not familiar with
> the options there.

Ah yes, it's part of repoze - http://docs.repoze.org/zodbconn/narr.html#zeo-uri-scheme .
I looked into this, and the following mappings from the URI syntax to the
ZConfig XML syntax hold true:

    cache_size            --> zodb/zeoclient/cache-size
    connection_cache_size --> zodb/cache-size
    connection_pool_size  --> zodb/pool-size
    var                   --> zodb/zeoclient/var
    client                --> zodb/zeoclient/client

> > but this doesn't seem to be what you're referring to, as that is
> > exactly the same size as the in-memory cache.
>
> I doubt it, but who knows?

I meant that I have only one cache-size option in terms of bytes, and the
cache made on disk is exactly that size (rather, it reserves all the space
on disk instantly, even if it isn't all used).

----------------------

Here's a detailed description of the issues I'm having.
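For reference, if those mappings are right, the equivalent ZConfig stanza for that URI would look roughly like this (the server path is a made-up stand-in for %(here)s/zeo_indexdb.sock, and I haven't actually run this exact stanza):

```
<zodb>
  cache-size 500000
  pool-size 5
  <zeoclient>
    server /path/to/zeo_indexdb.sock
    cache-size 2000MB
    var zeocache
    client index
  </zeoclient>
</zodb>
```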
I wrote the following code to preload the indices:

    def preload_index_btree(index_name, index_type, btree):
        print "((Preloading '%s' %s index btree...))" % (index_name, index_type)
        start = last_print = time.time()
        for i, item in enumerate(btree.items()):
            item
        print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
            index_name, index_type, i, time.time() - start,
        )

    def preload_catalog(catalog):
        """Given a catalog, touch every persistent object we can find
        to force them to go into the cache."""
        start = time.time()
        num_indices = len(catalog.items())
        for i, (index_name, index) in enumerate(catalog.items()):
            print "((Preloading index %2d/%2d '%s'...))" % (i+1, num_indices, index_name,)
            preload_index_btree(index_name, 'fwd', index._fwd_index)
            preload_index_btree(index_name, 'rev', index._rev_index)
        print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)

And I run it on server start as follows (modified for the relevant parts; I
tried to make the example simple, but it ended up needing a lot of parts).
This runs in a thread:

    from util import zodb as Z
    from util import zodb_query as ZQ

    for i in xrange(3):
        connwrap = Z.ConnWrapper('index')
        print "((Preload #%d...))" % (i+1)
        with connwrap as index_root:
            ZQ.preload_catalog(index_root.index.catalog)
        connwrap.close()

Z.ConnWrapper is something that uses my config to return connections, such
that I only have one DB instance for the whole server process:

    class ConnWrapper(object):
        def __init__(self, db_name):
            global_config = appconfig.get_config()
            db_conf = global_config['dbs'][db_name]
            db = db_conf['db']
            self.appmaker = db_conf['appmaker']
            conn = db.open()
            self.conn = conn
            self.cur_t = None
            #...
        def get_approot(self):
            return self.appmaker(self.conn.root())

        def __enter__(self):
            """.begin() transaction and return the app_root"""
            if self.cur_t:
                raise ValueError("transaction already in progress")
            self.cur_t = self.conn.transaction_manager.begin()
            return self.get_approot()

        def __exit__(self, typ, value, tb):
            if typ is None:
                try:
                    self.cur_t.commit()
                except:
                    self.cur_t = None
                    raise
                self.cur_t = None
            else:
                self.cur_t.abort()
                self.cur_t = None

The relevant part of the global config setup is:

    from repoze.zodbconn.uri import db_from_uri
    from indexdb.models import appmaker as indexdb_appmaker
    #...
    zodb_indexdb_uri = global_config.get('zodb_indexdb_uri')
    index_db = db_from_uri(zodb_indexdb_uri)
    global_config['dbs'] = {
        'index': {
            'db': index_db,
            'appmaker': indexdb_appmaker,
        },
    }

`zodb_indexdb_uri` is in my .ini file as mentioned above:

    zodb_indexdb_uri = zeo://%(here)s/zeo_indexdb.sock?cache_size=3000MB&connection_cache_size=5000000&connection_pool_size=5&var=zeocache&client=index

The preloading seems to accomplish its purpose. When I restart the server,
it takes a while to run through all the indices the first time over, and
memory usage grows as this is happening, e.g.:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346786 items in 69.72s)))

And the subsequent attempts are quite rapid:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346903 items in 0.08s)))
    ...
    ((Preloaded catalog! Took 1.58s))    # (for the entire catalog)

What I don't understand is that this doesn't seem to work in the long run.
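As a side note, to double-check I hadn't misspelled any of the query options in that .ini line, I pulled the query string apart with the stdlib. The socket path here is a made-up stand-in for %(here)s/zeo_indexdb.sock, and this is of course not how repoze.zodbconn parses the URI internally - it was just a spelling check:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical path standing in for %(here)s/zeo_indexdb.sock
uri = ("zeo:///tmp/zeo_indexdb.sock?cache_size=3000MB"
       "&connection_cache_size=5000000&connection_pool_size=5"
       "&var=zeocache&client=index")

# parse_qs returns a list per key; each option appears once, so take the head
opts = {key: values[0] for key, values in parse_qs(urlsplit(uri).query).items()}
```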
Just before writing this email I ran a view that required a simple query,
after not having restarted the server in a while, and it took a minute or
two to complete. Running the view again, it took only a few seconds. So it
seems something had been moved out of the cache, which makes no sense to
me, as the server has plenty of RAM and the cache size is plenty large.

Further, after having preloaded the indices once, shouldn't they preload
quite rapidly upon further server restarts, if it's all in the cache and
the cache is persisted? After the above preload ran, I restarted the
server, and although the first few indices did indeed load more quickly:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346905 items in 8.37s)))

some took just as long:

    ((Preloaded 'timestamp' fwd index btree (348333 items in 90.69s)))

and the whole catalog still took a good while:

    ((Preloaded catalog! Took 199.03s))

Granted, 8-11 seconds per index instead of 60-90 seconds per index is not
such a bad improvement. This is why I was considering memcachedb, by the
way - something that would work well in between restarts of my server.
Another option, I guess, would be to run a whole separate server instance
just for the caching, always on, to avoid these issues.

It seems like something isn't working with the caching, but I can't figure
out why... any pointers? It seems I don't understand the cache mechanism
very well, as I did everything I thought should work according to my
understanding.

One potentially relevant thing: after a zeopack, the index database .fs
file is about 400 megabytes, so I figure a cache of 3000 megabytes should
more than cover it. Before a zeopack, though - I do one every 3 hours - the
file grows to 7.6 gigabytes.
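If I understand the docs right, connection_cache_size (zodb/cache-size) bounds the *number* of objects in the per-connection pickle cache, not bytes - so maybe a big unrelated query is evicting the preloaded btree buckets LRU-style? Here's a toy model of that suspicion (the oids and the size limit are made up, and the real pickle cache is of course more subtle than a strict LRU):

```python
from collections import OrderedDict

class CountBoundedCache(object):
    """Toy model of a cache bounded by object count, not bytes: once the
    limit is hit, the least recently used entry is evicted."""

    def __init__(self, max_objects):
        self.max_objects = max_objects
        self._data = OrderedDict()

    def load(self, oid):
        if oid in self._data:
            self._data.move_to_end(oid)       # mark as recently used
        else:
            self._data[oid] = object()        # stand-in for an unpickled object
            if len(self._data) > self.max_objects:
                self._data.popitem(last=False)  # evict the LRU entry
        return self._data[oid]

    def __contains__(self, oid):
        return oid in self._data

cache = CountBoundedCache(max_objects=3)
for oid in ['idx-1', 'idx-2', 'idx-3']:        # "preloaded" index objects
    cache.load(oid)
for oid in ['other-1', 'other-2', 'other-3']:  # unrelated query loads
    cache.load(oid)
evicted = 'idx-1' not in cache                 # the preloaded object is gone
```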
Shouldn't the relevant objects - the entire set of latest versions of the
objects - be the ones in the cache? Then it wouldn't matter that the .fs
file is 7.6 GB, as the actually-used part of it is only 400 MB or so.

Another question: does zeopacking destroy the cache? If so, that would
make sense, and I'll just have to preload after every zeopack. If it's not
that, then I'm not sure what it could be.

Thanks,
- Claudiu
_______________________________________________
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev