Re: [ZODB-Dev] Re: ZODB Benchmarks
On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
> > I have completed my first round of benchmarks on the ZODB and welcome
> > any criticism and advice. I summarised our earlier discussion and
> > additional findings in this blog entry:
> > http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
>
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?

I have tried different commit intervals. The published results are for a
commit interval of 100, in other words 100 inserts per commit.

> Your profile looks very surprising:
>
> I would expect that for a single insertion, typically
> one persistent object (the bucket where the insertion takes place)
> is changed. About every 15 inserts, 3 objects are changed (the bucket
> is split); about every 15*125 inserts, 5 objects are changed
> (split of a bucket and its container).
> But the mean value of objects changed in a transaction is 20
> in your profile.
> The changed objects typically have about 65 subobjects. This
> fits with "OOBucket"s.

It was very surprising to me too, since the insertion is so basic. I
simply assign a Persistent object with one string attribute of 1K in
size to a key in an OOBTree. I mentioned this earlier on the list, and
I thought that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects, including simple types
like strings, ints, etc. I don't know if it explains all the calls that
add up to a mean value of 20, though. I guess the calls are being made
by the cPickle module, but I don't have the experience to investigate
this.

> Lookup times:
>
> 0.23 s would be 230 ms, not 23 ms.

Oops, my multiplier broke ;-)

> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)).
> While BTrees are not necessarily balanced (and therefore the depth may
> be larger than logarithmic), it is not easy to obtain a severely
> unbalanced tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too
> small, garbage collections...

The cache size was set to 10 objects, so I doubt that this was the
cause. I do the lookup test right after I populate the BTree, so it
might be that the cache and memory are full, but I take care to commit
after the BTree is populated, so even this is unlikely. The keys that I
look up are completely random, so it is probably the case that the
lookups cause disk lookups all the time. If this is the case, is 230 ms
not still too slow?

> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes on the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree, disk lookups probably became necessary.

I accept that these lookups are all served from the cache. I am going
to modify the lookup test so that I close the database after population
and re-open it when starting the test, to make sure nothing is cached,
and see what the results look like.

Thanks for your insightful comments!

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
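For anyone who wants to reproduce the commit-interval part of the test, the shape of it is roughly as below. This is my own reconstruction, not the benchmark's actual code: in the real test `tree` would be a `BTrees.OOBTree.OOBTree` holding Persistent objects and `commit` would be `transaction.commit`, but the harness accepts any mapping and any commit callable, so it can be exercised without a ZODB installation.

```python
def populate(tree, n_objects, commit, commit_interval=100):
    """Insert n_objects values into `tree`, committing every
    `commit_interval` inserts (100 for the published results).
    Returns the number of commits performed."""
    commits = 0
    for i in range(n_objects):
        # Stand-in for a Persistent object with one 1K string attribute.
        tree[i] = "x" * 1024
        if (i + 1) % commit_interval == 0:
            commit()
            commits += 1
    if n_objects % commit_interval:
        commit()  # flush the final partial batch
        commits += 1
    return commits
```

For the lookup test, the plan above amounts to calling `db.close()` after `populate()` and re-opening the database before the random lookups, so that nothing is served from a warm cache.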
Re: [ZODB-Dev] RelStorage now in Subversion
Dieter Maurer wrote:
> Unless you begin a new transaction on your load connection after the
> write connection was committed, your load connection will not see the
> data written over your write connection.

Good point. After a commit, we *must* poll.

> This implies the read connection must start a new transaction at least
> after a "ConflictError" has occurred. Otherwise, the "ConflictError"
> cannot go away.

Also a good point. All these details will come into play if I attempt
to poll less often.

> What I fear is described by the following scenario:
>
> You start a transaction on your load connection "L". "L" will see the
> world as it has been at the start of this transaction. Another
> transaction "M" modifies object "o". "L" reads "o", "o" is modified
> and committed. As "L" has used "o"'s state before "M"'s modification,
> the commit will try to write stale data. Hopefully, something lets the
> commit fail -- otherwise, we have lost a modification.

Yes, RelStorage uses standard ZODB conflict detection: all object
changes must be derived from the most current state of the object in
the database. If any object has been changed by later transactions,
conflict resolution is attempted, and if that fails, the transaction
fails.

> I noticed another potential problem:
>
> When more than a single storage is involved, transactional consistency
> between these storages requires a true two phase commit. Only recently
> has Postgres started to support two phase commit ("2PC"), but as far
> as I know Python access libraries do not yet support the extended API
> (a few days ago, there was a discussion on "[EMAIL PROTECTED]" about a
> DB-API extension for two phase commit). Unless you use your own
> binding to the Postgres 2PC API, "RelStorage" seems only safe for
> single storage use.

Actually, RelStorage inherited two phase commit support from PGStorage.
The 2PC API is accessible through psycopg2 if you simply issue the
transaction control statements yourself.
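To illustrate the point: PostgreSQL's 2PC is driven by ordinary SQL statements, so no driver-level 2PC API is needed. The helper below is my own sketch (not RelStorage code) that just builds the statement strings for a given global transaction id; against a live database you would do your writes, run the PREPARE statement through `cursor.execute()` on a psycopg2 cursor, and later run COMMIT PREPARED (or ROLLBACK PREPARED, possibly from a different connection) with the same id.

```python
def two_phase_statements(gid):
    """PostgreSQL statements for two phase commit of the global
    transaction `gid` (assumed here to contain no quote characters;
    real code must escape or validate it)."""
    return {
        # Phase 1: persist the prepared transaction on the server.
        "prepare": f"PREPARE TRANSACTION '{gid}'",
        # Phase 2: finish (or abandon) it, even from another session.
        "commit": f"COMMIT PREPARED '{gid}'",
        "rollback": f"ROLLBACK PREPARED '{gid}'",
    }
```

Note that a transaction left in the prepared state survives a client crash, which is exactly the property a multi-storage commit needs.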
Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-1 21:17 +0200:
> I have completed my first round of benchmarks on the ZODB and welcome
> any criticism and advice. I summarised our earlier discussion and
> additional findings in this blog entry:
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

In your insertion test: when do you do commits?
One per insertion? Or one per n insertions (for which "n")?

Your profile looks very surprising:

I would expect that for a single insertion, typically one persistent
object (the bucket where the insertion takes place) is changed. About
every 15 inserts, 3 objects are changed (the bucket is split); about
every 15*125 inserts, 5 objects are changed (split of a bucket and its
container). But the mean value of objects changed in a transaction is
20 in your profile.

The changed objects typically have about 65 subobjects. This fits with
"OOBucket"s.

Lookup times:

0.23 s would be 230 ms, not 23 ms.

The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
BTree implementation itself. Lookup time is proportional to the tree
depth, which ideally would be O(log(n)). While BTrees are not
necessarily balanced (and therefore the depth may be larger than
logarithmic), it is not easy to obtain a severely unbalanced tree by
insertions only. Other factors must have contributed to this drop:
swapping, cache too small, garbage collections...

Furthermore, the lookup times for your smaller BTrees are far too good
-- fetching any object from disk takes on the order of several ms
(2 to 20, depending on your disk). This means that the lookups for your
smaller BTrees have typically been served directly from the cache (no
disk lookups). With your large BTree, disk lookups probably became
necessary.

--
Dieter
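The depth argument can be checked with a little arithmetic. The helper below is a back-of-the-envelope sketch of my own, using the figures quoted above (~15 keys per bucket, ~125 children per interior node); real trees vary with fill factor, but under these assumptions a tree of 10**6 keys and one of 10**7 keys come out at the same depth, so the dramatic drop cannot be explained by extra tree levels.

```python
import math

def expected_depth(n_keys, bucket_size=15, fanout=125):
    """Rough depth of a BTree holding n_keys: the bucket level plus
    the interior levels needed to fan out over all the buckets,
    assuming ~bucket_size keys per bucket and ~fanout children per
    interior node."""
    depth = 1                                  # the bucket level
    nodes = math.ceil(n_keys / bucket_size)    # number of buckets
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)      # one interior level up
        depth += 1
    return depth
```

Since each level costs at most one object load, a one-level difference (or none at all, as here) cannot produce an order-of-magnitude change in lookup time; the cache and disk effects described above remain the likely cause.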