Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-02 Thread Roché Compaan
On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
> >I have completed my first round of benchmarks on the ZODB and welcome
> >any criticism and advice. I summarised our earlier discussion and
> >additional findings in this blog entry:
> >http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
> 
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?

I have tried different commit intervals. The published results are for a
commit interval of 100, in other words 100 inserts per commit.
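
For reference, here is a minimal sketch of the kind of insertion loop I
mean (the Record class and key format are placeholders of mine, not the
published benchmark code):

    import transaction
    from BTrees.OOBTree import OOBTree
    from persistent import Persistent

    COMMIT_INTERVAL = 100  # 100 inserts per commit

    class Record(Persistent):
        def __init__(self):
            self.data = 'x' * 1024  # one 1K string attribute

    def populate(root, n):
        tree = root['tree'] = OOBTree()
        for i in range(n):
            tree['key%010d' % i] = Record()
            if (i + 1) % COMMIT_INTERVAL == 0:
                transaction.commit()
        transaction.commit()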

> Your profile looks very surprising:
> 
>   I would expect that for a single insertion, typically
>   one persistent object (the bucket where the insertion takes place)
>   is changed. About every 15 inserts, 3 objects are changed (the bucket
>   is split); about every 15*125 inserts, 5 objects are changed
>   (split of a bucket and its container).
>   But the mean value of objects changed in a transaction is 20
>   in your profile.
>   The changed objects typically have about 65 subobjects. This
>   fits with "OOBucket"s.

It was very surprising to me too, since the insertion is so basic: I
simply assign a Persistent object with a single 1K string attribute to a
key in an OOBTree. I mentioned this earlier on the list, and I thought
that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects, including simple types
like strings, ints, etc. I don't know if that explains all the calls
that add up to a mean value of 20, though. I guess the calls are being
made by the cPickle module, but I don't have the experience to
investigate this.
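
To make Jim's point concrete, here is a rough illustration (plain pickle
with my own counting subclass, not ZODB's actual serialisation code) of
how the persistent_id hook fires for every reachable sub-object:

    import io
    import pickle

    class CountingPickler(pickle.Pickler):
        def __init__(self, *args, **kw):
            super().__init__(*args, **kw)
            self.calls = 0

        def persistent_id(self, obj):
            self.calls += 1  # fires once per object being pickled
            return None      # None means "pickle inline, no persistent ref"

    state = {'data': 'x' * 1024}   # roughly the state of one stored object
    p = CountingPickler(io.BytesIO(), protocol=1)
    p.dump(state)
    print(p.calls)  # 3: the dict, its key string and its value string

So even a trivial object produces several hook calls per pickle, which
would inflate call counts in a profile well beyond the number of changed
persistent objects.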

> Lookup times:
> 
> 0.23 s would be 230 ms not 23 ms.

Oops, my multiplier broke ;-)

> 
> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)). While BTrees
> are not necessarily balanced (and therefore the depth may be larger
> than logarithmic) it is not easy to obtain a severely unbalanced
> tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too small,
> garbage collections...

The cache size was set to 10 objects, so I doubt that this was the
cause. I do the lookup test right after I populate the BTree, so it
might be that the cache and memory are full, but I take care to commit
after the BTree is populated, so even this is unlikely.

The keys that I look up are completely random, so the lookups probably
hit the disk most of the time. Even if that is the case, isn't 230 ms
still too slow?

> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes in the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree disk lookups probably became necessary.

I accept that these lookups are all served from the cache. I am going to
modify the lookup test so that I close the database after population and
re-open it when starting the test, to make sure nothing is cached, and
see what the results look like.
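
Something along these lines is what I have in mind (file name and key
format are the same placeholders as in the populate sketch above):

    import random
    import time
    from ZODB import DB, FileStorage

    db = DB(FileStorage.FileStorage('benchmark.fs'))  # re-open: cache empty
    conn = db.open()
    tree = conn.root()['tree']

    # Random keys in the same placeholder format used to populate the tree.
    keys = ['key%010d' % random.randrange(10 ** 7) for _ in range(1000)]
    start = time.time()
    for key in keys:
        tree[key]          # a cold lookup has to load buckets from disk
    per_lookup = (time.time() - start) / len(keys)
    print('%.1f ms per lookup' % (per_lookup * 1000))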

Thanks for your insightful comments!

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] RelStorage now in Subversion

2008-02-02 Thread Shane Hathaway

Dieter Maurer wrote:

Unless you begin a new transaction on your load connection after
the write connection has committed,
your load connection will not see the data written over
your write connection.


Good point.  After a commit, we *must* poll.
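
The snapshot behaviour is easy to see with plain ZODB (an illustrative
sketch with an in-memory storage, not RelStorage internals):

    import transaction
    from ZODB import DB, MappingStorage

    db = DB(MappingStorage.MappingStorage())
    tm_a = transaction.TransactionManager()
    tm_b = transaction.TransactionManager()
    conn_a = db.open(transaction_manager=tm_a)  # "write" connection
    conn_b = db.open(transaction_manager=tm_b)  # "load" connection

    conn_a.root()['x'] = 1
    tm_a.commit()

    print(conn_b.root().get('x'))  # None: snapshot predates the commit
    tm_b.begin()                   # new transaction; invalidations applied
    print(conn_b.root().get('x'))  # 1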


This implies that the read connection must start a new transaction
at least after a "ConflictError" has occurred. Otherwise, the
"ConflictError" cannot go away.


Also a good point.  All these details will come into play if I attempt 
to poll less often.



What I fear is described by the following scenario:

   You start a transaction on your load connection "L".
   "L" will see the world as it has been at the start of this transaction.

   Another transaction "M" modifies object "o".

   "L" reads "o", "o" is modified and committed.
   As "L" has used "o"'s state before "M"'s modification,
   the commit will try to write stale data.
   Hopefully, something lets the commit fail -- otherwise,
   we have lost a modification.


Yes, RelStorage uses standard ZODB conflict detection: all object 
changes must be derived from the most current state of the object in the 
database.  If any object has been changed by later transactions, 
conflict resolution is attempted, and if that fails, the transaction fails.
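
A minimal demonstration of that detection with plain ZODB (illustrative
only, not RelStorage-specific code):

    import transaction
    from ZODB import DB, MappingStorage
    from ZODB.POSException import ConflictError

    db = DB(MappingStorage.MappingStorage())
    tm_a = transaction.TransactionManager()
    tm_b = transaction.TransactionManager()
    conn_a = db.open(transaction_manager=tm_a)
    conn_b = db.open(transaction_manager=tm_b)

    conn_a.root()['counter'] = 1
    conn_b.root()['counter'] = 2   # both modify the same object (the root)
    tm_a.commit()
    try:
        tm_b.commit()              # based on a stale root -> ConflictError
    except ConflictError:
        tm_b.abort()               # the modification is not silently lost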



I noticed another potential problem:

  When more than a single storage is involved, transactional
  consistency between these storages requires a true two phase
  commit.

  Only recently has Postgres started to support two-phase commit ("2PC"), but
  as far as I know Python access libraries do not yet support the
  extended API (a few days ago, there was a discussion on
  "[EMAIL PROTECTED]" about a DB-API extension for two-phase commit).

  Unless you use your own binding to the Postgres 2PC API, "RelStorage"
  seems safe only for single-storage use.


Actually, RelStorage inherited two-phase commit support from PGStorage.
The 2PC API is accessible through psycopg2 if you simply issue the
transaction control statements yourself.
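
For example, something along these lines (the DSN, table and transaction
id are placeholders, not RelStorage's actual schema or code):

    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect('dbname=example')
    cur = conn.cursor()

    # Phase 1: psycopg2 opens a transaction block implicitly; do the work,
    # then turn it into a prepared transaction that survives a crash.
    cur.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
    cur.execute("PREPARE TRANSACTION 'txn-0001'")

    # Phase 2: COMMIT PREPARED must run outside a transaction block, so
    # switch the connection to autocommit before issuing it.
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur.execute("COMMIT PREPARED 'txn-0001'")
    # On failure the coordinator issues: ROLLBACK PREPARED 'txn-0001'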


Shane



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-02 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-1 21:17 +0200:
>I have completed my first round of benchmarks on the ZODB and welcome
>any criticism and advice. I summarised our earlier discussion and
>additional findings in this blog entry:
>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

In your insertion test: when do you do commits?
One per insertion? Or one per n insertions (for which "n")?


Your profile looks very surprising:

  I would expect that for a single insertion, typically
  one persistent object (the bucket where the insertion takes place)
  is changed. About every 15 inserts, 3 objects are changed (the bucket
  is split); about every 15*125 inserts, 5 objects are changed
  (split of a bucket and its container).
  But the mean value of objects changed in a transaction is 20
  in your profile.
  The changed objects typically have about 65 subobjects. This
  fits with "OOBucket"s.


Lookup times:

0.23 s would be 230 ms not 23 ms.

The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
BTree implementation itself. Lookup time is proportional to
the tree depth, which ideally would be O(log(n)). While BTrees
are not necessarily balanced (and therefore the depth may be larger
than logarithmic) it is not easy to obtain a severely unbalanced
tree by insertions only.
Other factors must have contributed to this drop: swapping, cache too small,
garbage collections...
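
To put rough numbers on the depth argument (a sketch assuming half-full
buckets of about 15 keys and an internal fanout of about 125, as above):

    import math

    def expected_depth(n, leaf=15, fanout=125):
        buckets = max(1, n // leaf)               # leaf level
        return 1 + int(math.ceil(math.log(buckets, fanout)))

    for n in (10 ** 5, 10 ** 6, 10 ** 7):
        print(n, expected_depth(n))
    # depth grows from 3 to only 4 between 10**5 and 10**7, so tree
    # depth alone cannot explain a large drop in lookup speed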

Furthermore, the lookup times for your smaller BTrees are far too
good -- fetching any object from disk takes in the order of several
ms (2 to 20, depending on your disk).
This means that the lookups for your smaller BTrees have
typically been served directly from the cache (no disk lookups).
With your large BTree disk lookups probably became necessary.



-- 
Dieter