On Fri, Jan 18, 2013 at 9:02 AM, Marius Gedminas <mar...@gedmin.as> wrote:
> On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:
> > I wrote the following code to preload the indices:
> > def preload_index_btree(index_name, index_type, btree):
> >     print "((Preloading '%s' %s index btree...))" % (index_name,
> >                                                      index_type)
> >     start = last_print = time.time()
> >     for i, item in enumerate(btree.items()):
> >         item
> That's a no-op: you might as well just write 'pass' here.
True, I wanted to do something with 'item' but didn't know what.
> >     print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
> >         index_name, index_type, i, time.time() - start,
> >     )
> If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.
> Drop the enumerate() trick and just use len(btree), it's efficient.
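A minimal sketch of the pitfall, in plain Python with no ZODB involved (the function names are made up for this example). When the iterable is empty the loop body never runs, so `i` is never bound; and even for a non-empty iterable the final `i` is the last index, one less than the item count:

```python
def count_with_enumerate(iterable):
    """Buggy pattern: 'i' is only bound if the loop runs at least once."""
    for i, item in enumerate(iterable):
        pass
    # Raises UnboundLocalError for an empty iterable; otherwise this is
    # the last index, i.e. one less than the number of items.
    return i


def count_safely(iterable):
    """Bind the counter before the loop and start enumerate() at 1."""
    count = 0
    for count, item in enumerate(iterable, 1):
        pass
    return count


try:
    count_with_enumerate([])
except UnboundLocalError as exc:
    print("empty input failed: %s" % exc)

print(count_with_enumerate("abc"))  # 2 (last index, not the count)
print(count_safely([]))             # 0
print(count_safely("abc"))          # 3
```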
Thanks for catching that. `len` still takes a while on a large btree though
if it isn't in memory:
In : start = time.time(); len(bt); end = time.time()
In : end - start
It actually seems to require loading the entire tree, because after running
`len`, subsequent operations (like iterating through the entire tree) start
happening instantly. However, since I just iterated through the entire
tree, it will definitely be fast at that point.
> If you want to load the btree item into cache, you need to do
That's not going to work, since `item` is a tuple. I don't want to load the
item itself into the cache, I just want the btree to be in the cache. I
figured iterating through the entire tree would force it to be loaded, but
is that not the case? If not then what should I call `_p_activate()` on? I
assume calling it on the tree itself won't cause all its internals to be
loaded. I'm not familiar with the internals of the BTree, however. Would
this be a better solution?
def preload_index_btree(index_name, index_type, btree):
    print "((Preloading '%s' %s index btree...))" % (index_name,
                                                     index_type)
    start = time.time()
    num_buckets = 0
    bucket = btree._firstbucket
    while bucket is not None:
        num_buckets += 1
        bucket = bucket._next
    print "((Preloaded '%s' %s index btree (%d/%d buckets items in %.2fs)))" % (
        index_name, index_type, len(btree), num_buckets, time.time() - start)
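
To make the bucket walk concrete without needing a database, here is a rough sketch using stand-in classes. `FakeBucket` and `FakeTree` are hypothetical and only mimic the `_firstbucket` / `_next` chain used above; `_p_activate()` here just flips a flag, whereas on a real persistent object it loads the object's state. Whether this is the best way to warm the cache for a real BTree is exactly the open question, so treat it as an illustration of the traversal only:

```python
class FakeBucket(object):
    """Stand-in for a BTree bucket: some items plus a link to the next bucket."""

    def __init__(self, items):
        self.items = list(items)
        self._next = None
        self.loaded = False  # pretend the bucket starts out unloaded

    def _p_activate(self):
        # A real persistent object would load its state here.
        self.loaded = True


class FakeTree(object):
    """Stand-in for a BTree: only exposes the first bucket of the chain."""

    def __init__(self, chunks):
        buckets = [FakeBucket(chunk) for chunk in chunks]
        for left, right in zip(buckets, buckets[1:]):
            left._next = right
        self._firstbucket = buckets[0] if buckets else None


def preload_buckets(tree):
    """Walk the bucket chain, activating each bucket; return (buckets, items)."""
    num_buckets = num_items = 0
    bucket = tree._firstbucket
    while bucket is not None:
        bucket._p_activate()
        num_buckets += 1
        num_items += len(bucket.items)
        bucket = bucket._next
    return num_buckets, num_items


tree = FakeTree([[1, 2, 3], [4, 5], [6]])
print(preload_buckets(tree))          # (3, 6)
print(preload_buckets(FakeTree([])))  # (0, 0)
```

Because the counters are bound before the loop, an empty tree simply reports zero buckets and zero items.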
> > def preload_catalog(catalog):
> >     """Given a catalog, touch every persistent object we can find to force
> >     them to go into the cache."""
> >     start = time.time()
> >     num_indices = len(catalog.items())
> >     for i, (index_name, index) in enumerate(catalog.items()):
> >         print "((Preloading index %2d/%2d '%s'...))" % (i+1,
> >             num_indices, index_name,)
> >         preload_index_btree(index_name, 'fwd', index._fwd_index)
> >         preload_index_btree(index_name, 'rev', index._rev_index)
> >     print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)
> > And I run it on server start as follows (modified for the relevant parts; I
> > tried to make the example simple but it ended up needing a lot of parts).
> > This runs in a thread:
> > from util import zodb as Z
> > from util import zodb_query as ZQ
> >
> > for i in xrange(3):
> >     connwrap = Z.ConnWrapper('index')
> >     print "((Preload #%d...))" % (i+1)
> >     with connwrap as index_root:
> >         ZQ.preload_catalog(index_root.index.catalog)
> >     connwrap.close()
> Every thread has its own in-memory ZODB object cache, but if you have
> configured a persistent ZEO client cache, it should help.
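
For reference, a hedged sketch of what enabling a persistent ZEO client cache might look like when the storage is opened from Python. The server address, cache name, directory, and size are placeholders, and the `ConnWrapper` helper used earlier is not shown in this thread, so how this would plug into it is an assumption. `client` names the cache file and `var` is the directory it lives in:

```python
def open_db_with_persistent_cache(addr=("localhost", 8100)):
    """Open a ZODB database over ZEO with an on-disk client cache.

    All argument values here are placeholders for illustration.
    """
    # Imported inside the function so this snippet can be loaded
    # even where ZEO/ZODB are not installed.
    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    storage = ClientStorage(
        addr,
        client="index-preload",        # a non-None name makes the cache persistent
        var="/tmp/zeo-cache",          # directory for the cache file (placeholder)
        cache_size=200 * 1024 * 1024,  # in bytes; placeholder size
    )
    return DB(storage)
```

The per-connection object cache is separate from this: each connection still has to load objects into its own in-memory cache, but with a persistent client cache those loads may be served from local disk rather than from the ZEO server.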
Gotcha. Thanks for the help!
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org