[ZODB-Dev] Re: What makes the ZODB slow?

2006-06-27 Thread Chris Withers

Florent Guillaume wrote:

Chris Withers wrote:

Florent Guillaume wrote:
I can comment, I have a big brain too: the code in the catalog uses 
per-connection series of keys, so no conflicts arise.


Really? I thought they were per-thread... wasn't aware that each 
thread was tied to one connection indefinitely... I thought the 
connections were pooled and assigned to threads on an ad-hoc basis?


The series of keys are stored in a _v_ attribute which is 
per-connection. And a connection is never used by more than one thread 
at a time.
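
A minimal sketch of the idea described above, in plain Python so it runs without ZODB installed. It is not the catalog's actual code: the class `CatalogCopy`, the `shared_keys` set, the attribute name `_v_nextid` and the key range are all assumptions made for illustration. Each "connection" keeps its own volatile counter, starts it at a random point, and then hands out consecutive keys from there, so two connections are very unlikely to insert next to each other in the shared tree.

```python
import random

KEY_RANGE = (-2_000_000_000, 2_000_000_000)  # assumed range, for illustration only


class CatalogCopy:
    """Stand-in for one connection's in-memory copy of a persistent catalog.

    In ZODB, attributes whose names start with ``_v_`` are volatile: they are
    not written to the database, so every connection's copy has its own value.
    Here that is imitated with an ordinary attribute on a per-connection object.
    """

    def __init__(self, shared_keys, seed):
        self.shared_keys = shared_keys   # stands in for the keys of the shared BTree
        self._rng = random.Random(seed)  # seeded only to keep the sketch repeatable

    def next_key(self):
        # Continue this connection's own series if it has one already.
        key = getattr(self, "_v_nextid", None)
        if key is None:
            key = self._rng.randint(*KEY_RANGE)
        # If the key is taken, jump to a new random starting point.
        while key in self.shared_keys:
            key = self._rng.randint(*KEY_RANGE)
        self.shared_keys.add(key)
        self._v_nextid = key + 1         # per-connection, never shared
        return key


shared = set()
conn_a = CatalogCopy(shared, seed=1)
conn_b = CatalogCopy(shared, seed=2)
keys_a = [conn_a.next_key() for _ in range(3)]
keys_b = [conn_b.next_key() for _ in range(3)]
```

Each connection's keys are consecutive among themselves, but the two series start at unrelated points, which is the property the argument above relies on.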


Yep, I think you're right, I'd be happier still if one of the authors of 
that code piped up in agreement ;-)


Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-27 Thread Chris Withers

Dieter Maurer wrote:

Postgres does use locks, lots of them and for different purposes.


Could ZODB use locks to gain a similar performance boost?


The only thing for which Postgres does not use locks is reading.
For this it uses MVCC (which we have meanwhile adapted for the ZODB
to get rid of ReadConflictErrors).


Right...


And even when locks are used, conflicts arise (they take on the
form of deadlocks). I have seen several of them with Postgres
-- not as deadlocks but as "concurrent update failed" errors.


Ah good, it's not just us then ;-)


Most of our ConflictErrors come from the session machinery -- because
conflict resolution works there only in a very limited way
(due to limited history availability).


Would having more history help?


Of the rest, 147 of the 177 are either Products.Transience.Transience.Increaser
or Products.Transience.Transience.Length2


Yes, these are our hits -- despite the fact that our increaser
is much more intelligent (and increases only rarely and not 
on each access)


Hmmm, mind if I commit that increaser to the trunk?
There's a ZF board meeting some time soon after which you should get 
your official invite to become a committer member...


cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk



Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-27 Thread Dieter Maurer
Chris Withers wrote at 2006-6-27 09:56 +0100:
Dieter Maurer wrote:
 Postgres does use locks, lots of them and for different purposes.

Could ZODB use locks to gain a similar performance boost?

Maybe, but it would be a really big change...


However, as I explained in an earlier message, the major
speed difference does *not* come from optimistic versus
pessimistic concurrency control (the optimistic approach
is usually more efficient) but from:

   1. more efficient storage for highly structured data

   2. relational databases support a limited set of
      datatypes (tables, indexes) and know their behaviour.
      Operations can therefore be executed by the server.

      Object-oriented databases, on the other hand, usually
      support an unlimited number of datatypes where
      the behaviour lives in the applications and
      the server is stupid.

      This causes high volumes of data to be exchanged
      between the server and the clients.

   3. (contrary to Andreas' feeling) the typical ZODB operation
      modifies many more objects than apparently similar
      Postgres operations.

      If, for example, 10 Zope objects are modified and
      this causes the full text indexes to be updated,
      then this can cause more modifications than
      the update of hundreds of Postgres rows
      (as such rows cannot contain mass data -- due to the restriction
      to simple types).

...
 Most of our ConflictErrors come from the session machinery -- because
 conflict resolution works there only in a very limited way
 (due to limited history availability).

Would having more history help?

Sure.

 ...
 Yes, these are our hits -- despite the fact that our increaser
 is much more intelligent (and increases only rarely and not 

Hmmm, mind if I commit that increaser to the trunk?

It's part of a proprietary extension product.
But, I can ask whether I can move over the essence to the
Zope core.

There's a ZF board meeting some time soon after which you should get 
your official invite to become a committer member...

Very fine!



-- 
Dieter


Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-26 Thread Florent Guillaume

On 26 Jun 2006, at 15:02, Chris Withers wrote:

Florent Guillaume wrote:

BTrees perform best when keys' prefixes are randomly distributed.
So if your application generates keys like 'foo001', 'foo002',...  
you'll get lots of conflicts. Same for consecutive integers in  
IOBTree.


Tempted to call bullshit on this, since there's code in the catalog  
to specifically assign series of keys...
...of course, that code may be evil, and people with bigger brains  
(hi Tim/Jeremy/Jim!) would have to comment..


I can comment, I have a big brain too: the code in the catalog uses  
per-connection series of keys, so no conflicts arise.


Florent

--
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   [EMAIL PROTECTED]





Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-26 Thread Chris Withers

Andreas Jung wrote:



BTrees perform best when keys' prefixes are randomly distributed.
So if your application generates keys like 'foo001', 'foo002',... you'll
get lots of conflicts. Same for consecutive integers in IOBTree.


Tempted to call bullshit on this, since there's code in the catalog to
specifically assign series of keys...


Calm down 


Oh, sorry, forgot some smilies ;-)

Don't worry, perfectly calm here *grinz*

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk


Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-24 Thread Dieter Maurer
Roché Compaan wrote at 2006-6-23 19:04 +0200:
 ...
In a test where one commits an instance of a Persistent
subclass that has only 2 string attributes, 300 objects per second are
created on average. Writing the exact same strings to a two column table
in an RDBMS, yields more than 3000 records per second including indexing
of the data.

This largely is the fault of fsync. It tends to be extremely
slow on many platforms. There was an interesting poll for
fsync timings in this mailing list (about 1 year or so ago).


The ZODB uses fsync once per transaction. Apparently,
many relational databases do it less often and therefore
achieve a much higher transaction rate.
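
To make the cost model concrete, here is a small sketch that counts how many fsync calls are issued when records are synced one at a time versus in groups. It is not ZODB or RDBMS code; the function name `write_records` and its `batch_size` parameter are made up for this illustration, and it says nothing about what any particular database actually does.

```python
import os
import tempfile
import time


def write_records(path, records, batch_size):
    """Write records to `path`, calling fsync after every `batch_size` records.

    Returns the number of fsync calls made. batch_size=1 imitates
    "one fsync per transaction"; larger values imitate grouped syncs.
    """
    syncs = 0
    with open(path, "wb") as f:
        for i, record in enumerate(records, start=1):
            f.write(record)
            if i % batch_size == 0:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to disk
                syncs += 1
        if len(records) % batch_size != 0:
            # Sync the final, partially filled batch.
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


if __name__ == "__main__":
    records = [b"x" * 100] * 200
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.log")
        for batch_size in (1, 20):
            start = time.perf_counter()
            syncs = write_records(path, records, batch_size)
            elapsed = time.perf_counter() - start
            print(f"batch_size={batch_size}: {syncs} fsyncs, {elapsed:.4f}s")
```

How large the timing difference is depends heavily on the platform and the disk, which matches the remark above that fsync timings vary widely.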



-- 
Dieter


Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Jean Jordaan
 The ZODB is actually very fast. [...]
 
 So you're probably observing slowness in the frameworks on top of it.

I'll believe this anytime :-]

In our case, a transaction may be a workflow state change on say 50 objects.
Two or three people try a transaction like that within a couple of seconds
of one another, and ConflictErrors crop up.

In a log with 402 ConflictErrors, 225 are on BTrees (_IIBTree.IITreeSet,
_IOBTree.IOBucket, _OOBTree.OOBTree, _OOBTree.OOBucket all feature). We
assume these all relate to catalog indexing.

Of the rest, 147 of the 177 are either Products.Transience.Transience.Increaser
or Products.Transience.Transience.Length2
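
For readers unfamiliar with why counter-like objects show up in conflict logs at all: ZODB lets a class define `_p_resolveConflict(oldState, savedState, newState)` to merge two concurrent writes instead of raising a ConflictError. The sketch below shows two common merge rules on plain classes, so it runs without ZODB. The class names are invented, and the rules are only an assumption about how length-style and increaser-style counters typically behave; the real Transience classes may differ.

```python
class LengthLike:
    """Toy counter whose whole state is one integer."""

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # Both writers started from old_state and each applied a delta.
        # Keep both deltas: old + (saved - old) + (new - old).
        return saved_state + new_state - old_state


class IncreaserLike:
    """Toy monotonic value: only ever moves forward."""

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # Whichever writer got furthest wins.
        return max(old_state, saved_state, new_state)


# Two transactions both started from 10; one committed 12, the other wants 15.
merged_length = LengthLike()._p_resolveConflict(10, 12, 15)
merged_increaser = IncreaserLike()._p_resolveConflict(10, 12, 15)
```

Resolution of this kind needs the old state to still be available, which is why limited history can make it fail.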

The role the framework (Plone, unsurprisingly) is playing in this case, is
that it leans hard on the catalog during a transaction lasting a number of
seconds.

To mitigate this, we want to create a savepoint and then commit more often
while iterating and changing workflow, rolling back to the savepoint if
necessary.

-- 
jean


[ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Florent Guillaume

Jean Jordaan wrote:

The ZODB is actually very fast. [...]

So you're probably observing slowness in the frameworks on top of it.


I'll believe this anytime :-]

In our case, a transaction may be a workflow state change on say 50 objects.
Two or three people try a transaction like that within a couple of seconds
of one another, and ConflictErrors crop up.

In a log with 402 ConflictErrors, 225 are on BTrees (_IIBTree.IITreeSet,
_IOBTree.IOBucket, _OOBTree.OOBTree, _OOBTree.OOBucket all feature). We
assume these all relate to catalog indexing.

Of the rest, 147 of the 177 are either Products.Transience.Transience.Increaser
or Products.Transience.Transience.Length2

The role the framework (Plone, unsurprisingly) is playing in this case, is
that it leans hard on the catalog during a transaction lasting a number of
seconds.

To mitigate this, we want to create a savepoint and then commit more often
while iterating and changing workflow, rolling back to the savepoint if
necessary.



BTrees perform best when keys' prefixes are randomly distributed.
So if your application generates keys like 'foo001', 'foo002',... you'll 
get lots of conflicts. Same for consecutive integers in IOBTree.


Florent

--
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   [EMAIL PROTECTED]


Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Andreas Jung



--On 23 June 2006 17:51:35 +0200 Florent Guillaume [EMAIL PROTECTED] wrote:





BTrees perform best when keys' prefixes are randomly distributed.
So if your application generates keys like 'foo001', 'foo002',... you'll
get lots of conflicts. Same for consecutive integers in IOBTree.




hm..are you sure about that?

-aj



Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Roché Compaan
On Fri, 2006-06-23 at 15:11 +0200, Florent Guillaume wrote:
  I often daydream of a ZODB that will one day have such great performance
  that it won't be necessary to adopt a hybrid backend. I know there is a
  huge difference between objects and records in an RDBMS, but in an
  attempt to understand more, I want to know what makes the ZODB so much
  slower than a relational database when writing a lot? Is it possible to
  speed it up in any way? 
  
  Other questions that come to mind:
  
  What overhead does undo add to performance?
  Can state be serialised more economically to reduce disk IO?
  Is the ZODB really slow, or is it just Zope and Plone or grand object
  frameworks built on top of it that make it appear slow? (In all my
  benchmarks this is shown to be mostly true)
 
 The ZODB is actually very fast. It has one drawback, which is that
 concurrent writes are resolved only for classes designed for that (namely
 BTrees), otherwise it's left up to the application to deal with it when
 it receives a ConflictError.
 
 So you're probably observing slowness in the frameworks on top of it.

This is not really the fundamental explanation I was fishing for, and I
don't think that you are entirely right.

I don't think one can call the ZODB fast (I hope to some day). It might
be fast in its handling of hierarchical data or reading lots of
objects, but I won't exactly call it fast. Just compare the speed at which new
objects are created in the ZODB with the speed of records being created
in an RDBMS. In a test where one commits an instance of a Persistent
subclass that has only 2 string attributes, 300 objects per second are
created on average. Writing the exact same strings to a two column table
in an RDBMS yields more than 3000 records per second including indexing
of the data. In the ZODB I still have to index data which will add
additional overhead. Adding more columns to the SQL table and writing
more data to it doesn't hurt performance either.

The above test most probably doesn't compare apples with apples, but
maybe in pointing out why not, more fundamental differences become
clear. Maybe the fundamental difference is that pickles of objects have
a bigger footprint and lead to more disk I/O, or that most of the ZODB is
implemented in Python. I don't know, and I'm still curious.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Florent Guillaume


On 23 Jun 2006, at 17:55, Andreas Jung wrote:
--On 23 June 2006 17:51:35 +0200 Florent Guillaume [EMAIL PROTECTED] wrote:

BTrees perform best when keys' prefixes are randomly distributed.
So if your application generates keys like 'foo001', 'foo002',... you'll
get lots of conflicts. Same for consecutive integers in IOBTree.


hm..are you sure about that?


It all depends on the concurrency for the use of these consecutive  
ids really.


The problem is bucket splits. A bucket split cannot be resolved by  
conflict resolution code of BTrees.


Let's say B is the size of a bucket and you have N leaf buckets in
the whole BTree.
If you use consecutive ids, you'll get a bucket split every B/2
inserts (assuming buckets are half-filled on average).
If you use random ids, you'll get a bucket split on average every
N*B/2 inserts.

All this roughly (I'm ignoring details like internal nodes).

If two processes concurrently use sequential ids from the same pool,
I'd say there is a one in B chance of getting a conflict error.
It's only one in (N*B)^2 if the ids are random.


All back-of-the-envelope calculations of course...
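
To put numbers on the estimates above, the sketch below simply evaluates the formulas exactly as stated in this message. The function names and the sample values B = 60 and N = 1000 are assumptions chosen for illustration, not measurements of a real BTree.

```python
def inserts_per_split_sequential(bucket_size):
    """Consecutive ids: a split roughly every B/2 inserts."""
    return bucket_size / 2


def inserts_per_split_random(bucket_size, n_buckets):
    """Random ids: a split roughly every N*B/2 inserts."""
    return n_buckets * bucket_size / 2


def conflict_chance_sequential(bucket_size):
    """Two writers drawing sequential ids from one pool: about 1 in B."""
    return 1 / bucket_size


def conflict_chance_random(bucket_size, n_buckets):
    """Two writers using random ids: about 1 in (N*B)^2."""
    return 1 / (n_buckets * bucket_size) ** 2


B, N = 60, 1000  # hypothetical bucket size and leaf-bucket count
print("inserts per split, sequential:", inserts_per_split_sequential(B))
print("inserts per split, random:    ", inserts_per_split_random(B, N))
print("conflict chance, sequential:  ", conflict_chance_sequential(B))
print("conflict chance, random:      ", conflict_chance_random(B, N))
```

With these sample values the sequential case splits after about 30 inserts and the random case after about 30000, which shows how strongly the estimate depends on N.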

Florent

--
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   [EMAIL PROTECTED]





Re: [ZODB-Dev] Re: What makes the ZODB slow?

2006-06-23 Thread Dieter Maurer
Jean Jordaan wrote at 2006-6-23 16:24 +0200:
 ... write conflicts by large transactions ...
To mitigate this, we want to create a savepoint and then commit more often
while iterating and changing workflow, rolling back to the savepoint if
necessary.

I fear this will not work -- at least not if you mean the
ZODB savepoints. ZODB savepoints operate at the sub-transaction level,
whereas write conflicts are detected at the transaction boundary.
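
The distinction can be illustrated with a toy model of optimistic concurrency. This is not ZODB's implementation; `Store`, `Txn` and the local `ConflictError` are hypothetical names used only here. A savepoint just snapshots pending writes inside one transaction, while the conflict check only happens in `commit()`.

```python
class ConflictError(Exception):
    """Toy stand-in for a write conflict detected at commit time."""


class Store:
    def __init__(self):
        self.data = {}    # key -> committed value
        self.serial = {}  # key -> number of commits that touched the key


class Txn:
    def __init__(self, store):
        self.store = store
        self.seen = {}    # key -> serial when this transaction first touched it
        self.writes = {}  # pending, uncommitted writes

    def write(self, key, value):
        self.seen.setdefault(key, self.store.serial.get(key, 0))
        self.writes[key] = value

    def savepoint(self):
        # Purely local: nothing is published to the store.
        return dict(self.writes)

    def rollback(self, snapshot):
        self.writes = dict(snapshot)

    def commit(self):
        # The only place where other writers' commits are noticed.
        for key in self.writes:
            if self.store.serial.get(key, 0) != self.seen[key]:
                raise ConflictError(key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.serial[key] = self.store.serial.get(key, 0) + 1
        self.seen.clear()
        self.writes.clear()
```

In this model a long transaction that takes savepoints still conflicts with any other transaction that commits the same key first, because the savepoints never shorten the window between first touching an object and the final commit.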



-- 
Dieter