Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:
> I wrote the following code to preload the indices:
>
>     def preload_index_btree(index_name, index_type, btree):
>         print ("Preloading '%s' %s index btree..." % (index_name, index_type))
>         start = last_print = time.time()
>         for i, item in enumerate(btree.items()):
>             item

That's a no-op: you might as well just write 'pass' here.  If you want to
load the btree item into cache, you need to do item._p_activate().

>         print ("Preloaded '%s' %s index btree (%d items in %.2fs)" % (
>             index_name, index_type, i, time.time() - start,
>         ))

If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.
Drop the enumerate() trick and just use len(btree); it's efficient.

>     def preload_catalog(catalog):
>         """Given a catalog, touch every persistent object we can find
>         to force them to go into the cache."""
>         start = time.time()
>         num_indices = len(catalog.items())
>         for i, (index_name, index) in enumerate(catalog.items()):
>             print ("Preloading index %2d/%2d '%s'..." % (i + 1, num_indices, index_name))
>             preload_index_btree(index_name, 'fwd', index._fwd_index)
>             preload_index_btree(index_name, 'rev', index._rev_index)
>         print ("Preloaded catalog!  Took %.2fs" % (time.time() - start))
>
> And I run it on server start as follows (modified for the relevant
> parts; I tried to make the example simple but it ended up needing a lot
> of parts).  This runs in a thread:
>
>     from util import zodb as Z
>     from util import zodb_query as ZQ
>     for i in xrange(3):
>         connwrap = Z.ConnWrapper('index')
>         print ("Preload #%d..." % (i + 1))
>         with connwrap as index_root:
>             ZQ.preload_catalog(index_root.index.catalog)
>         connwrap.close()

Every thread has its own in-memory ZODB object cache, but if you have
configured a persistent ZEO client cache, it should help.

Marius Gedminas
--
Never trust a computer you can't repair yourself.
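The two bugs pointed out here can be reproduced without ZODB at all. A minimal sketch with made-up function names, showing both the UnboundLocalError on an empty container and the suggested len() fix:

```python
import time

def preload_counting_with_enumerate(items):
    # Mirrors the original code's mistake: it relies on the loop
    # variable `i` after the loop, so an empty `items` leaves `i`
    # unbound and the final print raises UnboundLocalError.
    start = time.time()
    for i, item in enumerate(items):
        pass  # `item` alone is a no-op, as noted above
    print("Preloaded (%d items in %.2fs)" % (i + 1, time.time() - start))

def preload_counting_with_len(items):
    # The suggested fix: count with len() and don't reuse the loop
    # variable afterwards; the empty case then works fine.
    start = time.time()
    for item in items:
        pass
    print("Preloaded (%d items in %.2fs)" % (len(items), time.time() - start))
```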
___
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
> If you want to load the btree item into cache, you need to do
> item._p_activate()

That's not going to work, since `item` is a tuple.  I don't want to load
the item itself into the cache, I just want the btree to be in the cache.

Er, to be clearer: my goal is for the preload to load everything into the
cache that the query mechanism might use.  It seems the bucket approach
only takes ~10 seconds on the 350k-sized index trees vs. ~60-90 seconds.
This seems to indicate that fewer things end up being pre-loaded...

- Claudiu
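The "bucket approach" can be sketched as follows. It assumes the BTrees convention that a tree exposes its leftmost bucket as `_firstbucket` and that buckets are chained via `_next`; the Fake* classes are stand-ins added here only so the sketch runs without BTrees/ZODB installed:

```python
def preload_buckets(tree):
    """Walk a BTree's bucket chain, activating each bucket as it goes.

    Loading whole buckets rather than touching individual items is why
    the bucket approach is faster: one object load pulls in many keys.
    Returns the number of buckets visited.
    """
    bucket = tree._firstbucket
    n = 0
    while bucket is not None:
        # On a real persistent Bucket this forces a load from storage;
        # the fake below has no _p_activate, so we guard the call.
        if hasattr(bucket, '_p_activate'):
            bucket._p_activate()
        n += 1
        bucket = bucket._next
    return n


# Stand-ins so the sketch is runnable on its own:
class FakeBucket(object):
    def __init__(self, next=None):
        self._next = next

class FakeTree(object):
    def __init__(self, firstbucket=None):
        self._firstbucket = firstbucket
```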
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
On Fri, Jan 18, 2013 at 11:55 AM, Claudiu Saftoiu csaft...@gmail.com wrote:
>> If you want to load the btree item into cache, you need to do
>> item._p_activate()
>
> That's not going to work, since `item` is a tuple.  I don't want to load
> the item itself into the cache, I just want the btree to be in the cache.
>
> Er, to be clearer: my goal is for the preload to load everything into the
> cache that the query mechanism might use.  It seems the bucket approach
> only takes ~10 seconds on the 350k-sized index trees vs. ~60-90 seconds.
> This seems to indicate that fewer things end up being pre-loaded...

I guess I was too subtle before.  Preloading is a waste of time.  Just use
a persistent ZEO cache of adequate size and be done with it.

Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
> I guess I was too subtle before.  Preloading is a waste of time.  Just
> use a persistent ZEO cache of adequate size and be done with it.

Okay.  I did that, and I only tried the preloading because it didn't seem
I was getting what I wanted.

To wit: I ran a simple query and it took a good few minutes.  It's true,
after it took a few minutes, it ran instantly, and even after a server
restart it only took a few seconds, but I don't understand why it took a
few minutes in the first place.  There are only 750k objects in that
database, and I gave it a cache object size of 5 million; the packed
database .fs is only 400 megabytes, and I gave it a cache byte size of
3000 megabytes.  Then when I change one parameter of the query (to ask
for objects with a month of november instead of october), it takes
another few minutes...

Speaking to your point, preloading didn't seem to help either (I had
'preloaded' dozens of times over the past few days and the queries still
took forever), but the fact remains: it does not seem unreasonable to
want these queries to run instantly from the get-go, given that that is
the point of indexing in the first place.  As it stands now, for certain
queries I could probably do better loading each object and filtering it
via python, because I wouldn't have to deal with loading the indices in
order to run the 'fast' query; but this seems to defeat the point of
indices entirely, and I'd like to not have to create custom search
routines for every separate query.  Again, maybe I'm doing something
wrong, but I haven't been able to figure it out yet.

I made a view to display the output of cacheDetailSize like Jeff
suggested, and I got something like this:

    db = ...
    for conn_d in db.cacheDetailSize():
        writer.write("%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n" % conn_d)

output:

    Connection at 0684fe90, size=635683, non-ghost-size=209039
    Connection at 146c5ad0, size=3490, non-ghost-size=113

That is after having run the 'preloading'.  It seems that when the query
takes forever, the non-ghost-size is slowly increasing (~100
objects/second) while the 'size' stays the same.  Once the query is done
after having taken a few minutes, each subsequent run is instant and the
ngsize doesn't grow.

My naive question is: it has plenty of RAM, why does it not just load
everything into the RAM?

Any suggestions?  There must be a way to effectively use indexing with
zodb and what I'm doing isn't working.

Thanks,
- Claudiu
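For concreteness, the two caches being discussed live in different places in the configuration: the in-memory object cache (what cacheDetailSize reports on) is set on the database section, while the on-disk ZEO client cache is set on the zeoclient section. A sketch using the standard ZODB/ZEO ZConfig options, with the sizes quoted above; the server address and paths are made up:

```
<zodb main>
  # in-memory per-connection object cache, counted in objects
  cache-size 5000000
  <zeoclient>
    # hypothetical ZEO server address
    server localhost:8100
    # a client name enables the persistent on-disk cache (index-1.zec)
    client index
    var ./zeocache
    # on-disk ZEO client cache, counted in bytes
    cache-size 3000MB
  </zeoclient>
</zodb>
```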
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
On 18 January 2013 10:21, Claudiu Saftoiu csaft...@gmail.com wrote:
> [...]
>
> Any suggestions?  There must be a way to effectively use indexing with
> zodb and what I'm doing isn't working.

Have you confirmed that the ZEO client cache file is being used?
Configure logging to display the ZEO messages to make sure.

The client cache is transient by default, so you will need to enable
persistent client caching to see an effect past restarts:

    <zeoclient>
      client zeo1
      ...
    </zeoclient>

https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt

Laurence
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
> That is after having run the 'preloading'.  It seems that when the query
> takes forever, the non-ghost-size is slowly increasing (~100
> objects/second) while the 'size' stays the same.  Once the query is done
> after having taken a few minutes, each subsequent run is instant and the
> ngsize doesn't grow.
>
> My naive question is: it has plenty of RAM, why does it not just load
> everything into the RAM?

It's actually not *that* slow - I didn't realize that everything seems to
stop while it's asking for cacheDetailSize.  It seems to load about 10k
objects/minute, most of these being IFTreeSet/IFSet.  This seems a bit
slow... if the index db has 750k objects in it, then it would take 75
minutes, at this rate, to read through it all, meaning an extensive query
would really take way too long...

Also, my ZEO server is running locally anyway, so the local socket
transfer speed shouldn't really be much slower than loading from the
persistent cache, should it?  Either way it ends up loading from disk.  I
don't quite understand why the zeoserver doesn't have any sort of
caching... hence my earlier thoughts of a memcachedb server to load all
this in RAM and to just run forever.  Why would it not be a win in my
situation?

I'm pretty new to zodb, so perhaps I don't understand a lot of the design
decisions very well and thus how best to take advantage of zodb, but I'm
willing to learn.

- Claudiu
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
>> Any suggestions?  There must be a way to effectively use indexing with
>> zodb and what I'm doing isn't working.
>
> Have you confirmed that the ZEO client cache file is being used?
> Configure logging to display the ZEO messages to make sure.
>
> The client cache is transient by default, so you will need to enable
> persistent client caching to see an effect past restarts:
>
>     <zeoclient>
>       client zeo1
>       ...
>     </zeoclient>
>
> https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt
>
> Laurence

Yep, I specified a var of 'zeocache' and a client of 'index', and there is
indeed a ./zeocache/index-1.zec file and a ./zeocache/index-1.zec.lock
file.
Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
> I wonder if disk latency is the problem?  As a test you could put the
> index.fs file into a tmpfs and see if that improves things, or
> `cat index.fs > /dev/null` to try and force it into the fs cache.

Hmm, it would seem not... the cat happens instantly:

    (env)tsa@sp2772c:~/sports$ time cat Data_IndexDB.fs > /dev/null

    real    0m0.065s
    user    0m0.000s
    sys     0m0.064s

The database isn't even very big:

    -rw-r--r-- 1 tsa tsa 233M Jan 18 14:34 Data_IndexDB.fs

Which makes me wonder why it takes so long to load it into memory... it's
just a bit frustrating that the server has 7gb of RAM and it's proving to
be so difficult to get ZODB to keep ~300 megs of it up in there.  Or,
indeed, if linux already has the whole .fs file in a memory cache, where
are these delays coming from?  There's something I don't quite understand
about this whole situation...

- Claudiu
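The same warm-vs-cold check can be done from Python. A minimal sketch, with a small scratch file standing in for Data_IndexDB.fs; absolute timings depend entirely on the machine, so none are claimed here:

```python
import os
import tempfile
import time

def read_all(path):
    # Python equivalent of `cat path > /dev/null`: pull every block of
    # the file through the OS page cache, discarding the data.
    with open(path, 'rb') as f:
        while f.read(1 << 20):
            pass

# An 8 MB scratch file stands in for the real database file.
fd, path = tempfile.mkstemp()
os.write(fd, b'\0' * (8 << 20))
os.close(fd)

t0 = time.time()
read_all(path)              # first pass: may touch the disk
first = time.time() - t0

t0 = time.time()
read_all(path)              # second pass: served from the page cache
second = time.time() - t0

os.remove(path)
print("first=%.4fs second=%.4fs" % (first, second))
```

If the second pass is not dramatically faster than a truly cold first pass, the delay is probably not raw disk reads, which is consistent with the observation above that the whole .fs file cats in 65ms.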