We have a product written in Python using ZODB/ZEO, and I would like to
improve the speed of the database in general. Below are the things I have
seen that I would like to improve; some I understand and some I don't.
1. Loading largish pickles can be very slow (not huge; one object had a
list with around 20K references in it). OK, what size pickle? I'm not
100% sure: is there a way to get the pickle size of a ZODB persistent
object? I naively tried to pickle one of our persistent objects, and of
course it blew up with a maximum-recursion error because it went beyond
the normal bounds of a ZODB persistent pickle. There must be a way to do
this, right?
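For what it's worth, the recursion blow-up happens because ZODB's serializer replaces persistent sub-objects with short references (via pickle's persistent_id hook), whereas a plain pickle.dumps() tries to follow every reference in the graph. Here is a stdlib-only sketch of that idea; FakePersistent is a hypothetical stand-in for persistent.Persistent, and in a real app you would presumably reuse ZODB's own serializer rather than this:

```python
import io
import pickle

class FakePersistent:
    """Hypothetical stand-in for persistent.Persistent (stdlib-only sketch)."""

class RecordPickler(pickle.Pickler):
    """Pickle one object the way ZODB does: persistent sub-objects become
    short reference markers instead of being pickled recursively."""
    def __init__(self, file, root):
        super().__init__(file, protocol=2)
        self._root = root

    def persistent_id(self, obj):
        # Stop at persistent sub-objects; ZODB would emit the real _p_oid here.
        if obj is not self._root and isinstance(obj, FakePersistent):
            return "oid-%x" % id(obj)
        return None

def record_size(obj):
    """Approximate size, in bytes, of the object's own database record."""
    buf = io.BytesIO()
    RecordPickler(buf, obj).dump(obj.__dict__)
    return len(buf.getvalue())
```

So a list with 20K references pickles as 20K tiny reference markers, not 20K full sub-object pickles, and record_size gives a rough per-object record size without recursing into the rest of the graph.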
2. Writing lots of objects. I know ZODB wasn't written for this type of
use case, but we have backed into it. We can have many ZODB clients
(~30; is that a lot?), and as a result large numbers of cache
invalidations can be sent when a write occurs. Could invalidation
performance / cache refresh be an issue?
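As a back-of-envelope check on whether invalidation traffic could matter, here is some illustrative arithmetic; every number below is an assumption, not a measurement:

```python
# All figures are assumptions for illustration, not measurements.
clients = 30            # connected ZEO clients
objects_per_txn = 50    # objects modified per commit
commits_per_sec = 10

# On each commit the server sends the changed oids to every other client,
# which drops those objects from its cache (and re-loads them lazily later).
invalidations_per_sec = (clients - 1) * objects_per_txn * commits_per_sec
print(invalidations_per_sec)  # 14500
```

If even a fraction of those dropped objects are re-loaded over ZEO on next access, that is thousands of extra load round-trips per second, so yes, it seems plausible this could hurt.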
3. DB hot spots. Of course we see conflict errors when there are lots
of writes to the DB from different clients that touch the same object.
We haven't done much optimization work here, but I'm thinking of moving
all indexing out to a separate client/process that reads off a queue to
find objects to index. I'm guessing the indexes are a hotspot (I haven't
tested this much, though I'd guess B-tree buckets should alleviate the
problem some). (Is there a persistent queue?)
Anyway, these are some things that come to mind when I think of
performance issues. My hunch is that many of them could be made better
with faster ZEO I/O. Does this seem like a good assumption? If so,
what could we do to make ZEO faster?
* We use a FileStorage; are there faster ones? Can this be a bottleneck?
* Is the ZEO protocol inefficient?
* Is the ZEO server just plain slow?
Thoughts I have that may have no impact:
* rewrite ZEO or parts of it in C
* write a C based storage
For more information about ZODB, see the ZODB Wiki:
ZODB-Dev mailing list - ZODB-Dev@zope.org