Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Marius Gedminas
On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:
 I wrote the following code to preload the indices:
 
 def preload_index_btree(index_name, index_type, btree):
     print "Preloading '%s' %s index btree..." % (index_name, index_type)
     start = last_print = time.time()
     for i, item in enumerate(btree.items()):
         item

That's a no-op: you might as well just write 'pass' here.

If you want to load the btree item into cache, you need to do

  item._p_activate()

     print "Preloaded '%s' %s index btree (%d items in %.2fs)" % (
         index_name, index_type, i, time.time() - start,
     )

If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.

Drop the enumerate() trick and just use len(btree); it's efficient.
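
An untested sketch with both fixes; I've switched to values() and added a
hasattr() guard, since the keys are not persistent and the values may be
plain ints or tuples rather than persistent objects:

    import time

    def preload_index_btree(index_name, index_type, btree):
        print "Preloading '%s' %s index btree..." % (index_name, index_type)
        start = time.time()
        for value in btree.values():
            # Iterating loads the BTree's buckets as a side effect;
            # activating the value pulls its state into the cache too,
            # when the value is itself a persistent object.
            if hasattr(value, '_p_activate'):
                value._p_activate()
        print "Preloaded '%s' %s index btree (%d items in %.2fs)" % (
            index_name, index_type, len(btree), time.time() - start)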

 def preload_catalog(catalog):
     """Given a catalog, touch every persistent object we can find to
     force them to go into the cache."""
     start = time.time()
     num_indices = len(catalog.items())
     for i, (index_name, index) in enumerate(catalog.items()):
         print "Preloading index %2d/%2d '%s'..." % (i+1, num_indices,
                                                     index_name)
         preload_index_btree(index_name, 'fwd', index._fwd_index)
         preload_index_btree(index_name, 'rev', index._rev_index)
     print "Preloaded catalog! Took %.2fs" % (time.time() - start)
 
 And I run it on server start as follows (trimmed to the relevant parts; I
 tried to keep the example simple, but it still needs a lot of pieces).
 This runs in a thread:
 
 from util import zodb as Z
 from util import zodb_query as ZQ

 for i in xrange(3):
     connwrap = Z.ConnWrapper('index')
     print "Preload #%d..." % (i+1)
     with connwrap as index_root:
         ZQ.preload_catalog(index_root.index.catalog)
     connwrap.close()

Every thread has its own in-memory ZODB object cache, but if you have
configured a persistent ZEO client cache, it should help.
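
For example, the per-connection object cache size is set when the DB is
created (a sketch; assume `storage` is your ClientStorage):

    from ZODB import DB

    # cache_size is the per-connection object cache limit, counted in
    # objects (not bytes); every thread/connection gets its own cache.
    db = DB(storage, cache_size=5000000)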

Marius Gedminas
-- 
Never trust a computer you can't repair yourself.




Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu


 If you want to load the btree item into cache, you need to do

   item._p_activate()


 That's not going to work, since `item` is a tuple. I don't want to load
 the item itself into the cache, I just want the btree to be in the cache.


Er, to be clearer: my goal is for the preload to load everything into the
cache that the query mechanism might use.

It seems the bucket approach only takes ~10 seconds on the 350k-sized index
trees vs. ~60-90 seconds. This seems to indicate that fewer things end up
being pre-loaded...
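
For reference, the bucket approach I mean is roughly this - a sketch that
relies on the BTrees internals `_firstbucket` and `_next`:

    def preload_btree_buckets(btree):
        """Activate every leaf bucket of a BTree, skipping the values."""
        bucket = getattr(btree, '_firstbucket', None)
        count = 0
        while bucket is not None:
            bucket._p_activate()  # load this bucket's state from storage
            count += 1
            bucket = bucket._next
        return count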

- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Jim Fulton
On Fri, Jan 18, 2013 at 11:55 AM, Claudiu Saftoiu csaft...@gmail.com wrote:



 If you want to load the btree item into cache, you need to do

   item._p_activate()


 That's not going to work, since `item` is a tuple. I don't want to load
 the item itself into the cache, I just want the btree to be in the cache.


 Er, to be clearer: my goal is for the preload to load everything into the
 cache that the query mechanism might use.

 It seems the bucket approach only takes ~10 seconds on the 350k-sized index
 trees vs. ~60-90 seconds. This seems to indicate that fewer things end up
 being pre-loaded...

I guess I was too subtle before.

Preloading is a waste of time.  Just use a persistent ZEO cache
of adequate size and be done with it.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
  Er, to be clearer: my goal is for the preload to load everything into the
  cache that the query mechanism might use.
 
  It seems the bucket approach only takes ~10 seconds on the 350k-sized
  index trees vs. ~60-90 seconds. This seems to indicate that fewer things
  end up being pre-loaded...

 I guess I was too subtle before.

 Preloading is a waste of time.  Just use a persistent ZEO cache
 of adequate size and be done with it.


Okay. I did that, and I only tried the preloading because I didn't seem to
be getting what I wanted.

To wit: I ran a simple query and it took a good few minutes. True, after
that it ran instantly, and even after a server restart it only took a few
seconds, but I don't understand why it took a few minutes in the first
place. There are only 750k objects in that database, and I gave it a cache
object size of 5 million; the packed database .fs is only 400 megabytes,
and I gave it a cache byte size of 3000 megabytes.

Then when I change one parameter of the query (to ask for objects with a
month of November instead of October), it takes another few minutes...

Speaking to your point, preloading didn't seem to help either (I had
'preloaded' dozens of times over the past few days and the queries still
took forever), but the fact remains: it does not seem unreasonable to want
these queries to run instantly from the get-go, given that that is the
point of indexing in the first place. As it stands now, for certain queries
I could probably do better loading each object and filtering it in Python,
because I wouldn't have to load the indices in order to run the 'fast'
query; but that seems to defeat the point of indices entirely, and I'd
rather not create custom search routines for every separate query. Again,
maybe I'm doing something wrong, but I haven't been able to figure it out
yet.

I made a view to display the output of cacheDetailSize like Jeff suggested
and I got something like this:

db = ...
for conn_d in db.cacheDetailSize():
    writer.write("%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n"
                 % conn_d)

output:

Connection at 0684fe90, size=635683, non-ghost-size=209039
Connection at 146c5ad0, size=3490, non-ghost-size=113

That is after having run the 'preloading'. It seems that when the query
takes forever, the non-ghost-size is slowly increasing (~100
objects/second) while the 'size' stays the same. Once the query is done
after having taken a few minutes, each subsequent run is instant and the
ngsize doesn't grow. My naive question is: it has plenty of RAM, why does
it not just load everything into the RAM?
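
Side note for anyone following along: 'size' counts every object in the
connection cache, ghosts included, while 'ngsize' counts only objects whose
state is actually loaded. A quick way to see the distinction, assuming
`obj` is any persistent object:

    print obj._p_status       # 'ghost' means its state is not loaded yet
    obj._p_activate()         # load its state from the storage
    print obj._p_status       # now 'saved'
    obj._p_deactivate()       # turn it back into a ghost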

Any suggestions? There must be a way to effectively use indexing with zodb
and what I'm doing isn't working.

Thanks,
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Laurence Rowe
On 18 January 2013 10:21, Claudiu Saftoiu csaft...@gmail.com wrote:

 [...]
 Any suggestions? There must be a way to effectively use indexing with zodb
 and what I'm doing isn't working.

Have you confirmed that the ZEO client cache file is being used?
Configure logging to display the ZEO messages to make sure.
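
For example (ZEO logs under the 'ZEO' logger hierarchy):

    import logging

    # Route ZEO's connection and cache messages to stderr.
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(name)s %(levelname)s %(message)s')
    logging.getLogger('ZEO').setLevel(logging.DEBUG)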

The client cache is transient by default, so you will need to enable
persistent client caching to see an effect past restarts:

<zeoclient>
  client zeo1
  ...
</zeoclient>

https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt
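
A fuller example; the server address here is a placeholder, 'client' names
the cache file and 'var' sets the directory it lives in:

    <zeoclient>
      server localhost:8100
      # persistent cache file will be e.g. ./zeocache/index-1.zec
      client index
      var ./zeocache
      # on-disk client cache size
      cache-size 3000MB
    </zeoclient>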

Laurence


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu

 That is after having run the 'preloading'. It seems that when the query
 takes forever, the non-ghost-size is slowly increasing (~100
 objects/second) while the 'size' stays the same. Once the query is done
 after having taken a few minutes, each subsequent run is instant and the
 ngsize doesn't grow. My naive question is: it has plenty of RAM, why does
 it not just load everything into the RAM?


It's actually not *that* slow - I didn't realize that everything seems to
stop while it's asking for cacheDetailSize. It seems to load about 10,000
objects/minute, most of these being IFTreeSet/IFSet objects. This seems a
bit slow... if the index db has 750k objects in it, then it would take 75
minutes, at this rate, to read through it all, meaning an extensive query
would really take way too long...

Also, my ZEO server is running locally anyway, so the local socket transfer
speed shouldn't really be much slower than loading from the persistent
cache, should it? Either way it ends up loading from disk.

I don't quite understand why the ZEO server doesn't do any sort of
caching... hence my earlier thoughts of a memcachedb server to load all
this into RAM and just run forever. Why would it not be a win in my
situation?

I'm pretty new to zodb so perhaps I don't understand a lot of the design
decisions very well and thus how best to take advantage of zodb, but I'm
willing to learn.

- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
  Any suggestions? There must be a way to effectively use indexing with
 zodb
  and what I'm doing isn't working.

 Have you confirmed that the ZEO client cache file is being used?
 Configure logging to display the ZEO messages to make sure.

 The client cache is transient by default, so you will need to enable
 persistent client caching to see an effect past restarts:

 <zeoclient>
   client zeo1
   ...
 </zeoclient>

 https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt


Yep, I specified a var of 'zeocache' and a client of 'index', and there is
indeed a ./zeocache/index-1.zec file and a ./zeocache/index-1.zec.lock
file.



 Laurence



Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
 I wonder if disk latency is the problem? As a test you could put the
 index.fs file into a tmpfs and see if that improves things, or cat
 index.fs > /dev/null to try and force it into the fs cache.


Hmm, it would seem not... the cat happens instantly:

(env)tsa@sp2772c:~/sports$ time cat Data_IndexDB.fs > /dev/null

real0m0.065s
user0m0.000s
sys 0m0.064s

The database isn't even very big:

-rw-r--r-- 1 tsa tsa 233M Jan 18 14:34 Data_IndexDB.fs

Which makes me wonder why it takes so long to load it into memory... it's
just a bit frustrating that the server has 7 GB of RAM and it's proving so
difficult to get ZODB to keep ~300 megs of it up in there. Or, indeed, if
Linux already has the whole .fs file in a memory cache, where are these
delays coming from? There's something I don't quite understand about this
whole situation...

- Claudiu
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev