On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
> >I have completed my first round of benchmarks on the ZODB and welcome
> >any criticism and advise. I summarised our earlier discussion and
> >additional findings in this blog entry:
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?
I have tried different commit intervals. The published results are for a
commit interval of 100, in other words 100 inserts per commit.
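To make the batching concrete, here is a minimal sketch of the loop structure I use. The `FakeStore` class and its `commit` method are stand-ins for a real ZODB connection and `transaction.commit()`; only the commit-every-N pattern is the point.

```python
# Sketch of the insertion benchmark loop with a commit interval of 100.
# FakeStore is a stand-in for an OOBTree in a ZODB root; commit() would
# be transaction.commit() in the real test.

COMMIT_INTERVAL = 100

class FakeStore:
    def __init__(self):
        self.data = {}
        self.commits = 0

    def commit(self):
        # In the real benchmark this is transaction.commit().
        self.commits += 1

def populate(store, n_inserts):
    for i in range(n_inserts):
        # The real test stores a Persistent object with a 1K string
        # attribute; a plain 1K string keeps this sketch self-contained.
        store.data[i] = "x" * 1024
        if (i + 1) % COMMIT_INTERVAL == 0:
            store.commit()
    store.commit()  # flush any trailing partial batch
    return store.commits
```

For example, populating 1050 objects this way performs 10 full-batch commits plus one trailing commit.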
> Your profile looks very surprising:
> I would expect that for a single insertion, typically
> one persistent object (the bucket where the insertion takes place)
> is changed. About every 15 inserts, 3 objects are changed (the bucket
> is split) about every 15*125 inserts, 5 objects are changed
> (split of bucket and its container).
> But the mean value of objects changed in a transaction is 20
> in your profile.
> The changed objects typically have about 65 subobjects. This
> fits with "OOBucket"s.
It was very surprising to me too, since the insertion is so basic. I
simply assign a Persistent object with one string attribute, 1K in
size, to a key in an OOBTree. I mentioned this earlier on the list and
thought Jim's explanation was sufficient: the persistent_id method is
called for all objects, including simple types like strings, ints, etc.
I don't know if that explains all the calls that add up to a mean value
of 20, though. I guess the calls are being made by the cPickle module,
but I don't have the experience to investigate this.
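Jim's point is easy to verify with the standard library alone: pickle consults the `persistent_id` hook for every object it reaches while pickling, not just the top-level instance. The `CountingPickler` and `Record` names below are my own illustration, not code from the benchmark.

```python
import io
import pickle

# A pickler that counts how often the persistent_id hook is consulted.
# Returning None tells pickle to serialize the object normally.
class CountingPickler(pickle.Pickler):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.pid_calls = 0

    def persistent_id(self, obj):
        self.pid_calls += 1
        return None

class Record:
    def __init__(self):
        self.payload = "x" * 1024  # the 1K string attribute

buf = io.BytesIO()
p = CountingPickler(buf, protocol=2)
p.dump(Record())
print(p.pid_calls)  # several calls for a single Record, not just one
```

The hook fires for the instance, its state dict, the attribute name, the string value, and so on, which is why even a trivial insert generates many persistent_id calls.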
> Lookup times:
> 0.23 s would be 230 ms not 23 ms.
Oops my multiplier broke ;-)
> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)). While BTrees
> are not necessarily balanced (and therefore the depth may be larger
> than logarithmic) it is not easy to obtain a severely unbalanced
> tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too small,
> garbage collections...
The cache size was set to 100000 objects, so I doubt that this was the
cause. I do the lookup test right after populating the BTree, so the
cache and memory might be full, but I take care to commit after the
BTree is populated, so even this is unlikely.
The keys I look up are completely random, so the lookups probably hit
the disk every time. Even if that is the case, isn't 230 ms still too
slow?
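As a back-of-envelope check, your own figures (bucket splits roughly every 15 inserts, container splits every 15*125, and 2-20 ms per object fetch from disk) give an estimated tree depth and a cold-cache cost per lookup. The fill and fanout values below are taken from this thread, not measured:

```python
import math

# Rough estimate: buckets hold ~15 entries after splits, interior
# nodes fan out by ~125, and one disk fetch costs 2-20 ms (figures
# from this thread, not measurements).
def estimated_depth(n, leaf_fill=15, fanout=125):
    """Number of object loads per lookup: interior levels plus the bucket."""
    leaves = max(1, n // leaf_fill)
    return 1 + math.ceil(math.log(leaves, fanout))

for n in (10**6, 10**7):
    depth = estimated_depth(n)
    print(f"n={n:>8}: depth ~{depth}, cold lookup ~{2*depth}-{20*depth} ms")
```

Under these assumptions the depth is the same for 10**6 and 10**7 keys, so a cold lookup should cost tens of milliseconds in both cases; that supports the view that the drop must come from caching rather than from the BTree itself.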
> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes in the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree disk lookups probably became necessary.
I accept that these lookups are all served from the cache. I am going
to modify the lookup test to close the database after population and
re-open it when starting the test, to make sure nothing is cached, and
see what the results look like.
Thanks for your insightful comments!
Upfront Systems http://www.upfrontsystems.co.za