On Fri, Jan 18, 2013 at 9:02 AM, Marius Gedminas <mar...@gedmin.as> wrote:
> On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:
> > I wrote the following code to preload the indices:
> >
> > def preload_index_btree(index_name, index_type, btree):
> >     print "((Preloading '%s' %s index btree...))" % (index_name,
> >         index_type)
> >     start = last_print = time.time()
> >     for i, item in enumerate(btree.items()):
> >         item
>
> That's a no-op: you might as well just write 'pass' here.

True, I wanted to do something with 'item' but didn't know what.

> >     print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
> >         index_name, index_type, i, time.time() - start,
> >     )
>
> If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.
>
> Drop the enumerate() trick and just use len(btree), it's efficient.

Thanks for catching that. `len` still takes a while on a large btree
though if it isn't in memory:

    In [7]: start = time.time(); len(bt); end = time.time()
    Out[7]: 350169

    In [8]: end - start
    Out[8]: 32.397267818450928

It actually seems to require loading the entire tree, because after
running `len`, subsequent operations (like iterating through the entire
tree) start happening instantly. However, since I just iterated through
the entire tree, it will definitely be fast at that point.

> If you want to load the btree item into cache, you need to do
>
>     item._p_activate()

That's not going to work, since `item` is a tuple. I don't want to load
the item itself into the cache, I just want the btree to be in the cache.
I figured iterating through the entire tree would force it to be loaded,
but is that not the case? If not then what should I call `_p_activate()`
on? I assume calling it on the tree itself won't cause all its internals
to be loaded.

I'm not familiar with the internals of the BTree, however. Would this be
a better solution?

    import time

    def preload_index_btree(index_name, index_type, btree):
        print "((Preloading '%s' %s index btree...))" % (index_name, index_type)
        start = time.time()
        num_buckets = 0
        # Walk the linked list of leaf buckets, loading each one.
        bucket = btree._firstbucket
        while bucket:
            bucket._p_activate()
            num_buckets += 1
            bucket = bucket._next
        print "((Preloaded '%s' %s index btree (%d items in %d buckets in %.2fs)))" % (
            index_name, index_type, len(btree), num_buckets, time.time() - start,
        )

> > def preload_catalog(catalog):
> >     """Given a catalog, touch every persistent object we can find to
> >     force them to go into the cache."""
> >     start = time.time()
> >     num_indices = len(catalog.items())
> >     for i, (index_name, index) in enumerate(catalog.items()):
> >         print "((Preloading index %2d/%2d '%s'...))" % (i+1,
> >             num_indices, index_name,)
> >         preload_index_btree(index_name, 'fwd', index._fwd_index)
> >         preload_index_btree(index_name, 'rev', index._rev_index)
> >     print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)
> >
> > And I run it on server start as follows (modified for the relevant
> > parts; I tried to make the example simple but it ended up needing a
> > lot of parts). This runs in a thread:
> >
> > from util import zodb as Z
> > from util import zodb_query as ZQ
> > for i in xrange(3):
> >     connwrap = Z.ConnWrapper('index')
> >     print "((Preload #%d...))" % (i+1)
> >     with connwrap as index_root:
> >         ZQ.preload_catalog(index_root.index.catalog)
> >     connwrap.close()
>
> Every thread has its own in-memory ZODB object cache, but if you have
> configured a persistent ZEO client cache, it should help.

Gotcha. Thanks for the help!

- Claudiu
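
For reference, the bucket-walking idea above can be tried without a ZODB
database at hand. What follows is only a minimal sketch: FakeBucket and
make_chain are hypothetical stand-ins invented for illustration, and they
merely mimic the `_next` and `_p_activate()` names used in the function
above. The sketch counts items while walking, so no separate len() call
on the tree is needed, and it tests `is not None` rather than truthiness,
on the assumption that an empty bucket (like any object whose len() is 0)
would be falsy and could end a `while bucket:` loop early.

```python
class FakeBucket(object):
    """Hypothetical stand-in for a BTrees bucket; NOT the real class."""

    def __init__(self, keys):
        self.keys = list(keys)
        self._next = None      # link to the following bucket, or None
        self.loaded = False

    def _p_activate(self):
        # A real persistent object would load its state here; the
        # stand-in just records that it was touched.
        self.loaded = True

    def __len__(self):
        return len(self.keys)


def walk_buckets(first_bucket):
    """Activate every bucket in the chain.

    Returns a (bucket count, item count) tuple.
    """
    num_buckets = 0
    num_items = 0
    bucket = first_bucket
    # Explicit None check: a bucket with no keys is falsy because it
    # defines __len__, so a bare truthiness test would stop on it.
    while bucket is not None:
        bucket._p_activate()
        num_buckets += 1
        num_items += len(bucket)
        bucket = bucket._next
    return num_buckets, num_items


def make_chain(key_groups):
    """Hypothetical helper: link one FakeBucket per group of keys."""
    first = prev = None
    for keys in key_groups:
        current = FakeBucket(keys)
        if prev is None:
            first = current
        else:
            prev._next = current
        prev = current
    return first


chain = make_chain([[1, 2, 3], [], [4, 5]])
print(walk_buckets(chain))   # (3, 5)
print(walk_buckets(None))    # (0, 0)
```

An empty chain (first bucket is None) simply reports zero buckets and
zero items, so the empty-btree case needs no special handling.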
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev