Re: [ZODB-Dev] Re: ZODB Benchmarks
Sean Allen wrote at 2008-3-25 15:23 -0400:
> On Mar 25, 2008, at 2:54 PM, Dieter Maurer wrote:
>> Benji York wrote at 2008-3-25 14:24 -0400:
>>> ... commit contentions ...
>>
>> Almost surely there are several causes that all can lead to contention.
>> We already found:
>>
>> * client side causes (while the client holds the commit lock)
>>
>>   - garbage collections (which can block a client in the order of
>>     10 to 20 s)
> ...
>> A reconfiguration of the garbage collector helped us with this one
>> (the standard configuration is not well tuned to processes with
>> large amounts of objects).
>
> what'd you do?

# reconfigure the garbage collector
# generation 0 GC at "(allocated - freed) == 7,000"; analyse 7,000 objects
# generation 1 GC at "(allocated - freed) == 140,000"; analyse 140,000 objects
# generation 2 GC at "(allocated - freed) == 1,400,000"; analyse all objects
import gc; gc.set_threshold(7000, 20, 10)

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Mar 25, 2008, at 2:54 PM, Dieter Maurer wrote:
> Benji York wrote at 2008-3-25 14:24 -0400:
>> ... commit contentions ...
>>> Almost surely there are several causes that all can lead to
>>> contention. We already found:
>>>
>>> * client side causes (while the client holds the commit lock)
>>>
>>>   - garbage collections (which can block a client in the order of
>>>     10 to 20 s)
>>
>> Interesting. Perhaps someone might enjoy investigating turning off
>> garbage collection during commits.
>
> A reconfiguration of the garbage collector helped us with this one
> (the standard configuration is not well tuned to processes with
> large amounts of objects).

what'd you do?
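The "turn off garbage collection during commits" idea quoted above was never spelled out in the thread. A minimal sketch of what it might look like, assuming the standard `transaction` API (an illustration, not tested code from any poster):

    import gc
    import transaction

    def commit_without_gc():
        # Keep the cyclic collector from kicking in while the commit
        # lock is held; reference counting still frees most garbage.
        gc.disable()
        try:
            transaction.commit()
        finally:
            gc.enable()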
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote at 2008-3-25 14:24 -0400:
> ... commit contentions ...
>> Almost surely there are several causes that all can lead to contention.
>>
>> We already found:
>>
>> * client side causes (while the client holds the commit lock)
>>
>>   - garbage collections (which can block a client in the order of
>>     10 to 20 s)
>
> Interesting. Perhaps someone might enjoy investigating turning off
> garbage collection during commits.

A reconfiguration of the garbage collector helped us with this one
(the standard configuration is not well tuned to processes with
large amounts of objects).

>> - invalidation processing, especially ZEO ClientCache processing
>
> Interesting. Not knowing much about how invalidations are handled, I'm
> curious where the slow-down is. Do you have any more detail?

Not many: We have a component called RequestMonitor which periodically
checks for long running requests and logs the corresponding stack
traces. This monitor very often sees requests (holding the commit lock)
which are in "ZEO.cache.FileCache.settid".

As the monitor runs asynchronously with the observed threads, the
probability of an observation in a given function depends on how long
the thread is inside this function (total time, i.e. visits times mean
time per visit). From this, we can conclude that a significant time is
spent in "settid".

--
Dieter
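The RequestMonitor component itself was not posted. As a rough illustration of the sampling idea described above (assuming Python 2.5+ for sys._current_frames()), the following sketch periodically dumps every thread's stack; functions that consume a lot of wall-clock time show up in the dumps proportionally often:

    import sys
    import threading
    import time
    import traceback

    def sample_stacks(interval=5.0):
        # Periodically dump every thread's stack. Long-running spots
        # (e.g. a commit stuck in ZEO.cache.FileCache.settid) appear
        # in the output roughly in proportion to the time spent there.
        while True:
            time.sleep(interval)
            for thread_id, frame in sys._current_frames().items():
                print >> sys.stderr, '--- thread %s ---' % thread_id
                traceback.print_stack(frame, file=sys.stderr)

    monitor = threading.Thread(target=sample_stacks)
    monitor.setDaemon(True)   # don't keep the process alive
    monitor.start()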
Re: [ZODB-Dev] Re: ZODB Benchmarks
Dieter Maurer wrote:
> We do not yet know precisely the cause of our commit contentions.

Hard to tell what'll make them better then. ;)

> Almost surely there are several causes that all can lead to contention.
>
> We already found:
>
> * client side causes (while the client holds the commit lock)
>
>   - garbage collections (which can block a client in the order of
>     10 to 20 s)

Interesting. Perhaps someone might enjoy investigating turning off
garbage collection during commits.

>   - NFS operations (which can take up to 27 s in our setup -- for
>     still unknown reasons)

Not much ZODB can do about that. ;)

>   - invalidation processing, especially ZEO ClientCache processing

Interesting. Not knowing much about how invalidations are handled, I'm
curious where the slow-down is. Do you have any more detail?

> * server side causes
>
>   - commit lock held during copy phase of pack
>   - IO thrashing during the reachability analysis in pack

The new pack code should help quite a bit with the above (if you're
saying what I think you're saying).

>   - non-deterministic server-side IO anomalies (IO suddenly takes
>     several times longer than usual -- for still unknown reasons)

Curious.
--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote at 2008-3-25 09:40 -0400:
> Christian Theune wrote:
>> I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that
>> we should look into a technique called `group commit` to get rid of
>> the "commit contention".
> ...
> Summary: fsync is slow (and the cornerstone of most commit steps), so
> try to gather up a small batch of commits to do all at once (with only
> one call to fsync).

Our commit contention definitely is not caused by "fsync". Our "fsync"
is quite fast. If only "fsync" needed to be considered, we could easily
process at least 1,000 transactions per second -- but actually with 10
transactions per second we get contentions a few times per week.

We do not yet know precisely the cause of our commit contentions.
Almost surely there are several causes that all can lead to contention.

We already found:

* client side causes (while the client holds the commit lock)

  - garbage collections (which can block a client in the order of
    10 to 20 s)
  - NFS operations (which can take up to 27 s in our setup -- for
    still unknown reasons)
  - invalidation processing, especially ZEO ClientCache processing

* server side causes

  - commit lock held during copy phase of pack
  - IO thrashing during the reachability analysis in pack
  - non-deterministic server-side IO anomalies (IO suddenly takes
    several times longer than usual -- for still unknown reasons)

> Somewhat like Nagle's algorithm, but for fsync.
>
> The kicker is that OSs and hardware often lie about fsync (and it's
> therefore fast) and good hardware (disk arrays with battery backed
> write cache) already make fsync pretty fast.
>
> Not to suggest that group commit wouldn't speed things up, but it
> would seem that the technique will make the largest improvement for
> people that are using a non-lying fsync on inappropriate hardware.
> --
> Benji York
> Senior Software Engineer
> Zope Corporation

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
Christian Theune wrote:
> I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that
> we should look into a technique called `group commit` to get rid of
> the "commit contention".
>
> Does anybody know this technique already and maybe has a pointer for
> me?

I'd never heard the phrase until reading your message, but I think I got
a pretty clear picture from
http://forums.mysql.com/read.php?22,53854,53854#msg-53854 and
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01696.php.

Summary: fsync is slow (and the cornerstone of most commit steps), so
try to gather up a small batch of commits to do all at once (with only
one call to fsync). Somewhat like Nagle's algorithm, but for fsync.

The kicker is that OSs and hardware often lie about fsync (and it's
therefore fast) and good hardware (disk arrays with battery backed write
cache) already make fsync pretty fast.

Not to suggest that group commit wouldn't speed things up, but it would
seem that the technique will make the largest improvement for people
that are using a non-lying fsync on inappropriate hardware.
--
Benji York
Senior Software Engineer
Zope Corporation
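No ZODB implementation of group commit was posted in this thread. A minimal, thread-based sketch of the technique as summarized above (the class and its structure are illustrative, not taken from any real storage): writers append under a lock, then one fsync makes a whole batch durable, so N commits queued behind one flush pay for a single disk sync instead of N.

    import os
    import threading

    class GroupCommitLog(object):
        """Illustrative append-only log with group commit."""

        def __init__(self, path):
            self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
            self.cond = threading.Condition()
            self.written = 0    # bytes appended so far
            self.synced = 0     # bytes known to be on disk
            self.syncing = False

        def commit(self, data):
            self.cond.acquire()
            try:
                os.write(self.fd, data)
                self.written += len(data)
                target = self.written
                # If another thread's fsync is in flight, wait: when it
                # finishes it may already have covered our bytes.
                while self.syncing and self.synced < target:
                    self.cond.wait()
                if self.synced >= target:
                    return          # a grouped fsync already covered us
                self.syncing = True
            finally:
                self.cond.release()
            os.fsync(self.fd)       # one fsync for everything queued so far
            self.cond.acquire()
            try:
                self.synced = target  # conservative: at least this is safe
                self.syncing = False
                self.cond.notifyAll()
            finally:
                self.cond.release()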
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi,

On Fri, Mar 21, 2008 at 09:08:28PM +0100, Dieter Maurer wrote:
> Chris Withers wrote at 2008-3-20 22:22 +0000:
>> Roché Compaan wrote:
>>> Not yet, they are very time consuming. I plan to do the same tests
>>> over ZEO next to determine what overhead ZEO introduces.
>>
>> Remember to try introducing more app servers and see where the
>> bottleneck comes ;-)
>
> We have seen "commit contention" with lots (24) of zeo clients
> and a high write rate application (almost all requests write to
> the ZODB).

I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that we
should look into a technique called `group commit` to get rid of the
"commit contention".

Does anybody know this technique already and maybe has a pointer for me?

Christian
--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development
Re: [ZODB-Dev] Re: ZODB Benchmarks
Chris Withers wrote at 2008-3-20 22:22 +0000:
> Roché Compaan wrote:
>> Not yet, they are very time consuming. I plan to do the same tests
>> over ZEO next to determine what overhead ZEO introduces.
>
> Remember to try introducing more app servers and see where the
> bottleneck comes ;-)

We have seen "commit contention" with lots (24) of zeo clients
and a high write rate application (almost all requests write to
the ZODB).

--
Dieter
[ZODB-Dev] Re: ZODB Benchmarks
Chris Withers wrote:
> Roché Compaan wrote:
>> Not yet, they are very time consuming. I plan to do the same tests
>> over ZEO next to determine what overhead ZEO introduces.
>
> Remember to try introducing more app servers and see where the
> bottleneck comes ;-)
>
> Am I right in thinking the storage server is still essentially single
> threaded?

Yes, but they are normally network / disk bound, rather than CPU bound,
which makes that less of an issue.

Tres.
--
===================================================================
Tres Seaver          +1 540-429-0999          [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"    http://palladion.com
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> Not yet, they are very time consuming. I plan to do the same tests
> over ZEO next to determine what overhead ZEO introduces.

Remember to try introducing more app servers and see where the
bottleneck comes ;-)

Am I right in thinking the storage server is still essentially single
threaded?

cheers,

Chris
--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-03-20 at 00:00 -0400, Manuel Vazquez Acosta wrote:
> Roché Compaan wrote:
>> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>>> I'll update my blog post with the final stats and let you know when
>>> it is ready.
>>
>> I'll have to keep running these tests because the more I run them the
>> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
>> faster at both insertion and lookups than Postgres?
>>
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>>
>> Lookups are even faster than what I originally reported. Lookup times
>> average at 2 milliseconds (0.002s) on 10 million objects.
>>
>> I think somebody else should run these tests as well and validate the
>> methodology behind them, otherwise I'm spreading lies.
>
> Roché,
>
> I'm very interested in your results. I'm assessing whether or not to
> implement an application with ZODB. I have had a previous good
> experience, but that was in a minor project. Now I need very fast
> lookups and retrievals, possibly a distributed DB system, and
> something as easy as ZODB.
>
> Have you done more tests recently?

Not yet, they are very time consuming. I plan to do the same tests over
ZEO next to determine what overhead ZEO introduces.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>> I'll update my blog post with the final stats and let you know when
>> it is ready.
>
> I'll have to keep running these tests because the more I run them the
> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
> faster at both insertion and lookups than Postgres?
>
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>
> Lookups are even faster than what I originally reported. Lookup times
> average at 2 milliseconds (0.002s) on 10 million objects.
>
> I think somebody else should run these tests as well and validate the
> methodology behind them, otherwise I'm spreading lies.

Roché,

I'm very interested in your results. I'm assessing whether or not to
implement an application with ZODB. I have had a previous good
experience, but that was in a minor project. Now I need very fast
lookups and retrievals, possibly a distributed DB system, and something
as easy as ZODB.

Have you done more tests recently?

On the other hand, I'm wondering too how this relates to Zope.

Regards,
Manuel.
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote:
> If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
> http://lwn.net/Articles/83588/) to affect how much RAM is used for the
> page cache and how much for your process.

While we're on that subject: we recently had a box that would take
strain for almost no reason. You'd copy a biggish file from one place to
another and the load average would just soar as the various zope and zeo
instances tried to get to the disk.

Turns out this machine used the anticipatory io scheduler, which really
messes things up. We changed it to deadline, like so:

echo deadline > /sys/block/sda/queue/scheduler

Performance is a lot better now. Our (not very scientific) tests show
that deadline is also a little better than cfq for running zope.
Re: [ZODB-Dev] Re: ZODB Benchmarks
David Pratt wrote:
> Hi Benji. Have you any settings to recommend, or do you use a default?
> Many thanks.

For benchmarking? No. Too high and you'll spend a bunch of time swapping
to free up space for disk cache; too low and you may not have a large
enough disk cache to be effective.
--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi Benji. Have you any settings to recommend, or do you use a default?
Many thanks.

Regards,
David

Benji York wrote:
> Roché Compaan wrote:
>> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>>> Maybe if you set up a ZODB cache that allows just over 10 million
>>> objects, the lookup time will drop to microseconds. You might need
>>> a lot of RAM to do that, though.
>>
>> Maybe, but somehow I think that disk IO will prevent this. I'll
>> check.
>
> If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
> http://lwn.net/Articles/83588/) to affect how much RAM is used for the
> page cache and how much for your process.
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>> Maybe if you set up a ZODB cache that allows just over 10 million
>> objects, the lookup time will drop to microseconds. You might need a
>> lot of RAM to do that, though.
>
> Maybe, but somehow I think that disk IO will prevent this. I'll check.

If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
http://lwn.net/Articles/83588/) to affect how much RAM is used for the
page cache and how much for your process.
--
Benji York
Senior Software Engineer
Zope Corporation
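For reference (not from the thread): swappiness is a 0-100 knob; reading it is harmless, while changing it requires root, and the value below is purely illustrative.

    # Linux-only sketch: inspect, then (as root) change swappiness.
    print open('/proc/sys/vm/swappiness').read().strip()
    open('/proc/sys/vm/swappiness', 'w').write('10')  # example value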
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>> On a related topic, you might be interested in the RelStorage
>> performance charts I just posted. Don't take them too seriously, but
>> I think the charts are useful.
>>
>> http://shane.willowrise.com/archives/relstorage-10-and-measurements/
>
> One question, if you run the test with concurrent threads, does each
> thread insert a 100 objects (or a 1 for the second test)?

Yes.

> Is this test available in SVN somewhere?

It's in the RelStorage 1.0 release as well as SVN. It's called
relstorage/tests/speedtest.py.

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
> On a related topic, you might be interested in the RelStorage
> performance charts I just posted. Don't take them too seriously, but I
> think the charts are useful.
>
> http://shane.willowrise.com/archives/relstorage-10-and-measurements/

One question, if you run the test with concurrent threads, does each
thread insert a 100 objects (or a 1 for the second test)?

Is this test available in SVN somewhere?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Mar 4, 2008 at 10:16 PM, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> Not if you're only retrieving intermediate information.

Sure. And my point is that in a typical web setting, you don't. You
either retrieve data to be displayed, or you insert data from an HTTP
post. Massaging data from one place of the database to another place in
the database is usually nothing you do on a per-request basis. And if
you do, then maybe you shouldn't. :)

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi Roché. I figured this out once and it was included in PGStorage, so
it should be in RelStorage also. Take a look at the get_db_size method
in the postgres adapter. RelStorage is in the Zope repository.

Regards,
David

Roché Compaan wrote:
>>> - How much disk space does each database consume when there are 10M
>>>   objects?
>
> ZODB: 19GB
>
> How do you check the size of a Postgres database?
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 23:00 +0100, Bernd Dorn wrote:
> On 04.03.2008, at 22:16, Shane Hathaway wrote:
>> Lennart Regebro wrote:
>>> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
>>> <[EMAIL PROTECTED]> wrote:
>>>> - Did you use optimal methods of retrieval in Postgres? It is
>>>> frequently not necessary to pull the data into the application.
>>>> Copying to another table could be faster than fetching rows.
>>>
>>> But is that relevant in this case? Retrieval must reasonably really
>>> retrieve the data, not just move it around. :)
>>
>> Not if you're only retrieving intermediate information. When you
>> write an application against a relational database, a lot of the
>> intermediate information does not need to be exposed to Python,
>> helping performance significantly.
>
> yes this is the major benefit when using a relational database over
> zodb, because zodb has no server side query language,

This certainly makes some queries faster in an RDBMS. My first goal was
to determine the speed of the most basic operations like insert and
lookup.

> so the whole lookup insert comparison does not reflect real world
> issues.

I disagree. It certainly tests some real world use cases. Most notably
the use case where you have a very large user base inserting content at
a very high rate into the ZODB. I think what is unknown at this stage is
what the penalty would be when using ZEO. But you can only know this if
you know how fast direct interaction with the ZODB is. Not all
applications require ZEO either.

> for example in one of our applications we have to calculate neighbours
> of users, based on books they have in their bookshelves. with about
> 1 users and each of them having an average of 100-500 books out of
> ca. 1 million, the calculation of the neighbours takes seconds when
> you have to calculate this on the client, by getting all indexes etc.
> we switched to sql and wrote a single sql statement that does exactly
> the same comparison which now takes about 300ms.
>
> your comparisons would only be accurate if comparing relstorage with
> filestorage over zeo, because in this case there is no server side
> query possible on object attributes. it would be interesting to look
> at performance when having 4-10 zodb clients and then compare zeo/
> filestorage against relstorage with postgres.

Hopefully the tests are accurate in comparing the speed of basic
operations like insertion and lookup. They might be more *relevant* if
one performs the same tests using ZEO.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
> Roché Compaan wrote:
>> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>>> I'll update my blog post with the final stats and let you know when
>>> it is ready.
>>
>> I'll have to keep running these tests because the more I run them the
>> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
>> faster at both insertion and lookups than Postgres?
>>
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>
> For some workloads, ZODB is definitely faster. However, I think your
> analysis needs to provide more detail:

I posted some of the details earlier in this thread but I'll put them up
on the post as well.

> - How much RAM did ZODB and Postgres consume during the tests?

Don't know, will check.

> - How often are you committing a transaction?

100 inserts per transaction.

> - Did you use optimal methods of insertion in Postgres, such as COPY?
> Also note that a standard way to insert a lot of data into a
> relational database is to temporarily drop indexes and re-create them
> after insertion. Your original test may be more valid than you
> thought.

I don't think that this describes the typical interaction between an
application and a database. Usually records will be added to Postgres
using INSERT. A goal of the benchmarks is to understand the limitations
for applications that use the ZODB and to challenge the idea that an
RDBMS should be used for applications that, in naive terms, require a
"big" database that can write fast.

> - Did you use optimal methods of retrieval in Postgres? It is
> frequently not necessary to pull the data into the application.
> Copying to another table could be faster than fetching rows.

I realise this. The only thing the lookup stats tell me is that lookups
in the ZODB don't need drastic optimisation; they are already very fast.

> - What is the ZODB cache size? How much does the speed change as you
> change the cache size?

A lot! With the default cache size the insert rate is a mere 50
inserts/second at around 10 million objects. I used a cache size of
100000 in the benchmarks.

> - How much disk space does each database consume when there are 10M
> objects?

ZODB: 19GB. How do you check the size of a Postgres database?

>> Lookups are even faster than what I originally reported. Lookup times
>> average at 2 milliseconds (0.002s) on 10 million objects.
>
> Maybe if you set up a ZODB cache that allows just over 10 million
> objects, the lookup time will drop to microseconds. You might need a
> lot of RAM to do that, though.

Maybe, but somehow I think that disk IO will prevent this. I'll check.

>> I think somebody else should run these tests as well and validate the
>> methodology behind them, otherwise I'm spreading lies.
>
> You're not far from something interesting.
>
> On a related topic, you might be interested in the RelStorage
> performance charts I just posted. Don't take them too seriously, but I
> think the charts are useful.
>
> http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Thanks for all your questions. I'll certainly post the missing detail on
the web and investigate some of the things that might affect
performance.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
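The benchmark script itself was not posted. A minimal reconstruction of the insertion test as described in this thread (one Persistent object with a ~1 KB string attribute per key in an OOBTree, committing every 100 inserts); the file name, key format, object count, and cache_size value are illustrative assumptions:

    import time
    import transaction
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    class Record(Persistent):
        def __init__(self):
            self.payload = 'x' * 1024   # one ~1 KB string attribute

    db = DB(FileStorage('bench.fs'), cache_size=100000)  # illustrative
    conn = db.open()
    root = conn.root()
    root['tree'] = tree = OOBTree()
    transaction.commit()

    count = 100000
    start = time.time()
    for i in xrange(count):
        tree['key%010d' % i] = Record()
        if i % 100 == 99:               # commit interval of 100 inserts
            transaction.commit()
    transaction.commit()
    print '%.0f inserts/second' % (count / (time.time() - start))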
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 04.03.2008, at 22:16, Shane Hathaway wrote:
> Lennart Regebro wrote:
>> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
>> <[EMAIL PROTECTED]> wrote:
>>> - Did you use optimal methods of retrieval in Postgres? It is
>>> frequently not necessary to pull the data into the application.
>>> Copying to another table could be faster than fetching rows.
>>
>> But is that relevant in this case? Retrieval must reasonably really
>> retrieve the data, not just move it around. :)
>
> Not if you're only retrieving intermediate information. When you
> write an application against a relational database, a lot of the
> intermediate information does not need to be exposed to Python,
> helping performance significantly.

yes this is the major benefit when using a relational database over
zodb, because zodb has no server side query language, so the whole
lookup insert comparison does not reflect real world issues.

for example in one of our applications we have to calculate neighbours
of users, based on books they have in their bookshelves. with about
1 users and each of them having an average of 100-500 books out of
ca. 1 million, the calculation of the neighbours takes seconds when you
have to calculate this on the client, by getting all indexes etc. we
switched to sql and wrote a single sql statement that does exactly the
same comparison which now takes about 300ms.

your comparisons would only be accurate if comparing relstorage with
filestorage over zeo, because in this case there is no server side
query possible on object attributes. it would be interesting to look at
performance when having 4-10 zodb clients and then compare zeo/
filestorage against relstorage with postgres.

--
Lovely Systems, senior developer
phone: +43 5572 908060, fax: +43 5572 908060-77
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
skype: bernd.dorn
Re: [ZODB-Dev] Re: ZODB Benchmarks
Lennart Regebro wrote:
> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
> <[EMAIL PROTECTED]> wrote:
>> - Did you use optimal methods of retrieval in Postgres? It is
>> frequently not necessary to pull the data into the application.
>> Copying to another table could be faster than fetching rows.
>
> But is that relevant in this case? Retrieval must reasonably really
> retrieve the data, not just move it around. :)

Not if you're only retrieving intermediate information. When you write
an application against a relational database, a lot of the intermediate
information does not need to be exposed to Python, helping performance
significantly.

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> - Did you use optimal methods of insertion in Postgres, such as COPY?
> Also note that a standard way to insert a lot of data into a
> relational database is to temporarily drop indexes and re-create them
> after insertion. Your original test may be more valid than you
> thought.
>
> - Did you use optimal methods of retrieval in Postgres? It is
> frequently not necessary to pull the data into the application.
> Copying to another table could be faster than fetching rows.

But is that relevant in this case? Retrieval must reasonably really
retrieve the data, not just move it around. :)

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>> I'll update my blog post with the final stats and let you know when
>> it is ready.
>
> I'll have to keep running these tests because the more I run them the
> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
> faster at both insertion and lookups than Postgres?
>
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited

For some workloads, ZODB is definitely faster. However, I think your
analysis needs to provide more detail:

- How much RAM did ZODB and Postgres consume during the tests?

- How often are you committing a transaction?

- Did you use optimal methods of insertion in Postgres, such as COPY?
Also note that a standard way to insert a lot of data into a relational
database is to temporarily drop indexes and re-create them after
insertion. Your original test may be more valid than you thought.

- Did you use optimal methods of retrieval in Postgres? It is frequently
not necessary to pull the data into the application. Copying to another
table could be faster than fetching rows.

- What is the ZODB cache size? How much does the speed change as you
change the cache size?

- How much disk space does each database consume when there are 10M
objects?

> Lookups are even faster than what I originally reported. Lookup times
> average at 2 milliseconds (0.002s) on 10 million objects.

Maybe if you set up a ZODB cache that allows just over 10 million
objects, the lookup time will drop to microseconds. You might need a lot
of RAM to do that, though.

> I think somebody else should run these tests as well and validate the
> methodology behind them, otherwise I'm spreading lies.

You're not far from something interesting.

On a related topic, you might be interested in the RelStorage
performance charts I just posted. Don't take them too seriously, but I
think the charts are useful.

http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
> I'll update my blog post with the final stats and let you know when it
> is ready.

I'll have to keep running these tests because the more I run them the
faster the ZODB becomes ;-) Would you have guessed that the ZODB is
faster at both insertion and lookups than Postgres?

http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited

Lookups are even faster than what I originally reported. Lookup times
average at 2 milliseconds (0.002s) on 10 million objects.

I think somebody else should run these tests as well and validate the
methodology behind them, otherwise I'm spreading lies.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
I made a "lovely" mistake in my first round of benchmarks. Lovely, in that it puts the ZODB in a much better light. When I first ran the Postgres test, I neglected to put an index on the key field of the table. I only added the index before I timed lookups on Postgres but forgot to retest insertion. Since the key in the ZODB is effectively indexed I think the test is only fair if the corresponding key in Postgres is indexed. Retesting insertion of a million records into Postgres with the index on the key field revealed that Postgres performance deteriorates logarithmically at roughly the same rate as the ZODB. After about 10 million insertions the ZODB was doing 250 inserts per second. After adding the index on the table, Postgres was doing only slightly better but not above 300 inserts per second. I'll update my blog post with the final stats and let you know when it is ready. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-7 21:44 +0200:
> ...
> There are use cases where having a container in the ZODB that can
> handle large volumes and maintain a high insertion rate would be very
> convenient. An example of such a use case would be a site with
> millions of members where each member has their own folder containing
> different content types. The rate at which new members register is
> very high as well

I do not believe that the insertion rate the ZODB can handle now is
insufficient for this use case.

I do not have your timings present, but from our installation I know
that the ZODB can handle 10 transactions per second. This would mean
about 360,000 per day (10 hour days) and about 10 million in a month.

> so the folder needs to handle insertions quickly. In this use case
> you are not dealing with structured data. If members in a site with
> such large volumes start to generate content, indexes in the ZODB
> become problematic too because of the slow rate of insertion.

We have several write intensive applications with storages in the order
of 10 to 20 GB and 10 to 20 million objects -- and have not yet seen
problems with the insertion rate.

We do see other problems (notably commit contention) but these problems
cannot be solved by an increased insertion rate.

> And at this point you start to stuff everything into a relational
> database and the whole experience becomes painful ...

Let's speak again when you observe a concrete problem in a real
installation caused by limited insertion rate.

--
Dieter
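Spelling out that back-of-the-envelope estimate (the figures below just restate the arithmetic; they are not additional measurements):

    tps = 10               # sustained transactions per second
    day = 10 * 3600        # seconds in a 10-hour working day
    print tps * day        # 360000 transactions per day
    print tps * day * 30   # 10800000, about 10 million per month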
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-02-07 at 20:26 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-7 21:21 +0200:
>> ...
>> So if I asked you to build a data structure for the ZODB that can do
>> insertions at a rate comparable to Postgres on high volumes, do you
>> think that it can be done?
>
> If you need a high write rate, the ZODB is probably not optimal.
> Ask yourself whether it is not better to put such high frequency write
> data directly into a relational database.
>
> Whenever you have large amounts of highly structured data,
> a relational database is necessarily more efficient than the ZODB.

I know it is not optimal for high write scenarios, but I'm asking if it
is possible to build a data structure for the ZODB that can do
insertions quickly.

There are use cases where having a container in the ZODB that can handle
large volumes and maintain a high insertion rate would be very
convenient. An example of such a use case would be a site with millions
of members where each member has their own folder containing different
content types. The rate at which new members register is very high as
well, so the folder needs to handle insertions quickly. In this use case
you are not dealing with structured data. If members in a site with such
large volumes start to generate content, indexes in the ZODB become
problematic too because of the slow rate of insertion. And at this point
you start to stuff everything into a relational database and the whole
experience becomes painful ...

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-7 21:21 +0200:
> ...
> So if I asked you to build a data structure for the ZODB that can do
> insertions at a rate comparable to Postgres on high volumes, do you
> think that it can be done?

If you need a high write rate, the ZODB is probably not optimal.
Ask yourself whether it is not better to put such high frequency write
data directly into a relational database.

Whenever you have large amounts of highly structured data, a relational
database is necessarily more efficient than the ZODB.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-02-07 at 00:39 +0100, Dieter Maurer wrote:
>> If I understand correctly, for each insertion 3 calls are made to
>> "persistent_id"? This is still very far from the 66 I mentioned
>> above?
>
> You did not understand correctly.
>
> You insert an entry. The insertion modifies (at least) one OOBucket.
> The "OOBucket" needs to be written back. For each of its entries
> (one is your new one, but there may be up to 29 others) 3
> "persistent_id" calls will happen.

Thanks, I understand now.

So if I asked you to build a data structure for the ZODB that can do
insertions at a rate comparable to Postgres on high volumes, do you
think that it can be done? If so, would it not be worth investing time
and money into this? If not, why not?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-6 20:18 +0200:
> On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
>> Roché Compaan wrote at 2008-2-4 20:54 +0200:
>>> ...
>>> I don't follow? There are 20000 insertions and there are 1338046
>>> calls to persistent_id. Doesn't this suggest that there are 66
>>> objects persisted per insertion? This seems way too high?
>>
>> Jim told you that "persistent_id" is called for each object and not
>> only persistent objects.
>>
>> An OOBucket contains up to 30 key value pairs, each of which
>> are subjected to a call to "persistent_id". In each of your pairs,
>> there is an additional persistent object. This means, you
>> should expect 3 calls to "persistent_id" for each pair in an
>> "OOBucket".
>
> If I understand correctly, for each insertion 3 calls are made to
> "persistent_id"? This is still very far from the 66 I mentioned above?

You did not understand correctly.

You insert an entry. The insertion modifies (at least) one OOBucket.
The "OOBucket" needs to be written back. For each of its entries
(one is your new one, but there may be up to 29 others) 3
"persistent_id" calls will happen.

--
Dieter
Re: RE : [ZODB-Dev] Re: ZODB Benchmarks
Mignon, Laurent wrote at 2008-2-6 08:06 +0100:
> After a lot of tests and benchmarks, my feeling is that the ZODB does
> not seem suitable for systems managing large amounts of data stored in
> a flat hierarchy. The application that we currently develop is a
> business process management system, as opposed to a content management
> system. In order to guarantee the necessary performance, we decided to
> no longer use the ZODB. All data are now stored in a relational
> database.

Roché's corrected timings indicate: the ZODB is significantly slower
than Postgres for insertions but comparatively fast (slightly faster) on
lookups.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-4 20:54 +0200:
>> ...
>> I don't follow? There are 20000 insertions and there are 1338046
>> calls to persistent_id. Doesn't this suggest that there are 66
>> objects persisted per insertion? This seems way too high?
>
> Jim told you that "persistent_id" is called for each object and not
> only persistent objects.
>
> An OOBucket contains up to 30 key value pairs, each of which
> are subjected to a call to "persistent_id". In each of your pairs,
> there is an additional persistent object. This means, you
> should expect 3 calls to "persistent_id" for each pair in an
> "OOBucket".

If I understand correctly, for each insertion 3 calls are made to
"persistent_id"? This is still very far from the 66 I mentioned above?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-4 20:54 +0200:
> ...
> I don't follow? There are 20000 insertions and there are 1338046 calls
> to persistent_id. Doesn't this suggest that there are 66 objects
> persisted per insertion? This seems way too high?

Jim told you that "persistent_id" is called for each object and not only
persistent objects.

An OOBucket contains up to 30 key value pairs, each of which are
subjected to a call to "persistent_id". In each of your pairs, there is
an additional persistent object. This means, you should expect 3 calls
to "persistent_id" for each pair in an "OOBucket".

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Feb 4, 2008, at 1:54 PM, Roché Compaan wrote:
> I don't follow? There are 20000 insertions and there are 1338046 calls
> to persistent_id. Doesn't this suggest that there are 66 objects
> persisted per insertion? This seems way too high?

It seems like there is some confusion about the correspondence between
"persisting an object" and calls to persistent_id(). The pickler makes
lots of calls to persistent_id() as it is making pickles.

In my mind, "persisting an object" means saving the new state of an
instance of Persistent. When you insert a new persistent instance in a
BTree, you are "persisting" the new instance, the bucket/node that holds
the reference to the new instance, and in some cases, some small number
of other bucket/nodes that are changed as part of the insertion. That's
it. If you insert a bunch of things in one commit(), the number of
persistent instances committed is even smaller, because some buckets
will get multiple changes in one write.

There are usually many calls to persistent_id() when one btree bucket is
pickled, but I would count that as 1 persistent object.
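To make that concrete, here is a small standalone sketch (not from the thread; plain cPickle, no ZODB required) showing that the persistent_id hook fires once per object the pickler visits, so pickling one 30-pair bucket-like dict triggers dozens of calls even though only one "object" is being saved:

    import cPickle
    from cStringIO import StringIO

    calls = [0]

    def persistent_id(obj):
        # Called for every object the pickler encounters; returning
        # None means "not a persistent reference, pickle it normally".
        calls[0] += 1
        return None

    pickler = cPickle.Pickler(StringIO(), 1)
    pickler.persistent_id = persistent_id
    # A stand-in for one full 30-pair OOBucket:
    pickler.dump(dict(('key%d' % i, 'value%d' % i) for i in range(30)))
    print calls[0]   # dozens of calls for a single "persisted object"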
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Sun, 2008-02-03 at 22:05 +0100, Dieter Maurer wrote:
> The number of "persistent_id" calls suggests that a written
> persistent object has a mean value of 65 subobjects -- which
> fits well with OOBuckets.
>
> However, when the profile is for commits with 100 insertions each,
> then the number of written persistent objects is far too small.
> In fact, we would expect about 200 persistent object writes per
> transaction: the 100 new persistent objects assigned plus about as
> many buckets changed by these insertions.

I don't follow? There are 20000 insertions and there are 1338046 calls
to persistent_id. Doesn't this suggest that there are 66 objects
persisted per insertion? This seems way too high?

>> The keys that I lookup are completely random so it is probably the
>> case that the lookup causes disk lookups all the time. If this is
>> the case, is 230ms not still too slow?
>
> Unreasonably slow in fact.
>
> A tree with size 10**7 likely does not have a depth larger than 4
> (internal nodes should typically have at least 125 entries, leaves
> should have at least 15 -- a tree of depth 4 thus can have about
> 125**3*15 = 29.x * 10**6 entries).
> Therefore, one would expect at most 4 disk accesses.
>
> On my (6 year old) computer, a disk access can take up to 30 ms.

The lookup times I reported were wrong. There was a bug in the code that
reported the lookup time -- the correct average lookup time for a BTree
with 10 million objects was an impressive 12 ms. For Postgres this was
14 ms.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-3 09:15 +0200:
> ...
> I have tried different commit intervals. The published results are for
> a commit interval of 100, iow 100 inserts per commit.
>
>> Your profile looks very surprising:
>>
>> I would expect that for a single insertion, typically
>> one persistent object (the bucket where the insertion takes place)
>> is changed. About every 15 inserts, 3 objects are changed (the
>> bucket is split); about every 15*125 inserts, 5 objects are changed
>> (split of bucket and its container).
>> But the mean value of objects changed in a transaction is 20
>> in your profile.
>> The changed objects typically have about 65 subobjects. This
>> fits with "OOBucket"s.
>
> It was very surprising to me too since the insertion is so basic. I
> simply assign a Persistent object with 1 string attribute that is 1K
> in size to a key in an OOBTree. I mentioned this earlier on the list
> and I thought that Jim's explanation was sufficient when he said that
> the persistent_id method is called for all objects including simple
> types like strings, ints, etc. I don't know if it explains all the
> calls that add up to a mean value of 20 though. I guess the calls are
> being made by the cPickle module, but I don't have the experience to
> investigate this.

The number of "persistent_id" calls suggests that a written persistent
object has a mean value of 65 subobjects -- which fits well with
OOBuckets.

However, when the profile is for commits with 100 insertions each, then
the number of written persistent objects is far too small. In fact, we
would expect about 200 persistent object writes per transaction: the 100
new persistent objects assigned plus about as many buckets changed by
these insertions.

> The keys that I lookup are completely random so it is probably the
> case that the lookup causes disk lookups all the time. If this is the
> case, is 230ms not still too slow?

Unreasonably slow in fact.

A tree with size 10**7 likely does not have a depth larger than 4
(internal nodes should typically have at least 125 entries, leaves
should have at least 15 -- a tree of depth 4 thus can have about
125**3*15 = 29.x * 10**6 entries). Therefore, one would expect at most 4
disk accesses.

On my (6 year old) computer, a disk access can take up to 30 ms.

--
Dieter
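Spelling out the capacity estimate in the parenthesis above (using Dieter's half-full figures of at least 125 children per interior node and at least 15 pairs per bucket):

    # A depth-4 tree: three levels of interior nodes over one level
    # of leaf buckets.
    print 125 ** 3 * 15   # 29296875 -- about 29.3 million entries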
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
>> I have completed my first round of benchmarks on the ZODB and welcome
>> any criticism and advice. I summarised our earlier discussion and
>> additional findings in this blog entry:
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
>
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?

I have tried different commit intervals. The published results are for a
commit interval of 100, iow 100 inserts per commit.

> Your profile looks very surprising:
>
> I would expect that for a single insertion, typically
> one persistent object (the bucket where the insertion takes place)
> is changed. About every 15 inserts, 3 objects are changed (the bucket
> is split); about every 15*125 inserts, 5 objects are changed
> (split of bucket and its container).
> But the mean value of objects changed in a transaction is 20
> in your profile.
> The changed objects typically have about 65 subobjects. This
> fits with "OOBucket"s.

It was very surprising to me too since the insertion is so basic. I
simply assign a Persistent object with 1 string attribute that is 1K in
size to a key in an OOBTree. I mentioned this earlier on the list and I
thought that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects including simple types
like strings, ints, etc. I don't know if it explains all the calls that
add up to a mean value of 20 though. I guess the calls are being made by
the cPickle module, but I don't have the experience to investigate this.

> Lookup times:
>
> 0.23 s would be 230 ms, not 23 ms.

Oops, my multiplier broke ;-)

> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)). While BTrees
> are not necessarily balanced (and therefore the depth may be larger
> than logarithmic) it is not easy to obtain a severely unbalanced
> tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too
> small, garbage collections...

The cache size was set to 100000 objects so I doubt that this was the
cause. I do the lookup test right after I populate the BTree, so it
might be that the cache and memory are full, but I take care to commit
after the BTree is populated, so even this is unlikely.

The keys that I lookup are completely random so it is probably the case
that the lookup causes disk lookups all the time. If this is the case,
is 230ms not still too slow?

> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes in the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree disk lookups probably became necessary.

I accept that these lookups are all served from cache. I am going to
modify the lookup test so that I close the database after population and
re-open it when starting the test to make sure nothing is cached, and
see what the results look like.

Thanks for your insightful comments!
--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-1 21:17 +0200:
> I have completed my first round of benchmarks on the ZODB and welcome
> any criticism and advice. I summarised our earlier discussion and
> additional findings in this blog entry:
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

In your insertion test: when do you do commits?
One per insertion? Or one per n insertions (for which "n")?

Your profile looks very surprising:

I would expect that for a single insertion, typically one persistent
object (the bucket where the insertion takes place) is changed. About
every 15 inserts, 3 objects are changed (the bucket is split); about
every 15*125 inserts, 5 objects are changed (split of bucket and its
container). But the mean value of objects changed in a transaction is 20
in your profile. The changed objects typically have about 65 subobjects.
This fits with "OOBucket"s.

Lookup times:

0.23 s would be 230 ms, not 23 ms.

The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
BTree implementation itself. Lookup time is proportional to the tree
depth, which ideally would be O(log(n)). While BTrees are not
necessarily balanced (and therefore the depth may be larger than
logarithmic), it is not easy to obtain a severely unbalanced tree by
insertions only. Other factors must have contributed to this drop:
swapping, cache too small, garbage collections...

Furthermore, the lookup times for your smaller BTrees are far too good
-- fetching any object from disk takes in the order of several ms (2 to
20, depending on your disk). This means that the lookups for your
smaller BTrees have typically been served directly from the cache (no
disk lookups). With your large BTree, disk lookups probably became
necessary.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
I have completed my first round of benchmarks on the ZODB and welcome
any criticism and advice. I summarised our earlier discussion and
additional findings in this blog entry:

http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 7, 2007, at 11:19 AM, Godefroid Chapelle wrote:
> Jim Fulton wrote:
>> On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote:
>>> Jim Fulton wrote:
>>>> It sounds like I should write some pickle and cPickle tests and we
>>>> should update the ZODB trunk to take advantage of this. (/me fears
>>>> getting mired in Python 3.)
>>>
>>> Would you do that on Python 2.4, 2.5 or ... ?
>>
>> I would do it on "?". ;)
>
> I'd help with pleasure...
>
>> I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle
>> hasn't been ported to Python 3 yet. :) This is just a fairly simple
>> test, so shouldn't be a big deal.
>
> ... but I fear it would take you longer to explain what to do than to
> do it.

Maybe. I bet all that's needed is to reuse the existing persistent_id
tests using inst_persistent_id. I haven't looked at the tests yet. At
which point, one might add a test to distinguish the 2 cases. And, of
course, eventually, the tests will need to get checked in. I can do
that.

You might be able to write the tests without much input from me. OTOH,
I'll get to them eventually. :)

Jim
--
Jim Fulton
Zope Corporation
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote: Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? I would do it on "?". ;) I'd help with pleasure... I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle hasn't been ported to Python 3 yet. :) This is just a fairly simple test, so shouldn't be a big deal. ... but I fear it would take you longer to explain what to do than to do it. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote: Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? I would do it on "?". ;) I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle hasn't been ported to Python 3 yet. :) This is just a fairly simple test, so shouldn't be a big deal. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
[I'm resending this mail since it seems it never reached the mailing list?] On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote: > On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: > > > Jim Fulton wrote: > >> On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the > ObjectWriter > in serialize.py. > >>> > >>> Why 'unexplained'? 'persistent_id' is called from the Pickler > >>> instance > >>> being used in ObjectWriter._dump(). It is called for each and every > >>> single object reachable from the main object, due to the way Pickler > >>> works (I believe). Maybe persistent_id can be analysed and optimized > >>> for the most common cases? > >> Yup. > >> Note that there is an undocumented feature in cPickle that I added > >> years ago to deal with this issue but never got around to > >> pursuing. Maybe someone else would be able to spend the time to > >> try it out and report back. > >> If you set inst_persistent_id, rather than persistent_id, on a > >> pickler, then the hook will only be called for instances. This > >> should eliminate the vast majority of the calls. > >> Note that this feature was added back when testing was minimal or > >> non-existent, so it is untested, however, the implementation is > >> simple enough. :) > > > > Do you mean that the ZODB has enough tests now that making the > > change and running the tests might already be a good proof ? > > No, I mean that pickle and cPickle lack tests for this feature. > > > Or should we be more prudent ? > > It would be nice to try this out with ZODB to see if it makes much > difference. If it does, then that would provide extra motivation for > me to add the missing test. > > Roché Compaan said he would try it out, but I just realized that he > might have been waiting for me. Sorry for not responding earlier. I actually tried this out immediately after you suggested it and was very impressed with the improvement in performance. I have been meaning to write back to give a thorough report but a project with insane deadlines caught up with me. This project ends this week so I plan to continue with my benchmark test next week and give feedback thereafter. The number of calls to persistent_id dropped dramatically, and in a test of 10 million inserts the insert rate almost doubled, from 1 000 to 2 000 inserts per second, for at least the first million inserts. The insert rate decreases rapidly thereafter until it drops to the insert rate recorded before the persistent_id change. I guess at this point the overhead of bucket splits is too high. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim On Dec 7, 2007, at 4:23 AM, Godefroid Chapelle wrote: Godefroid Chapelle wrote: Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. We can measure some benefit. For tests on a ZODB prefilled with 100k instances of an archetypes class, update of an instance : 12% improvement insert of an instance : 15% improvement If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Godefroid Chapelle wrote: Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. We can measure some benefit. For tests on a ZODB prefilled with 100k instances of an archetypes class, update of an instance : 12% improvement insert of an instance : 15% improvement If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote: > On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: > > > Jim Fulton wrote: > >> On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the > ObjectWriter > in serialize.py. > >>> > >>> Why 'unexplained'? 'persistent_id' is called from the Pickler > >>> instance > >>> being used in ObjectWriter._dump(). It is called for each and every > >>> single object reachable from the main object, due to the way Pickler > >>> works (I believe). Maybe persistent_id can be analysed and optimized > >>> for the most common cases? > >> Yup. > >> Note that there is an undocumented feature in cPickle that I added > >> years ago to deal with this issue but never got around to > >> pursuing. Maybe someone else would be able to spend the time to > >> try it out and report back. > >> If you set inst_persistent_id, rather than persistent_id, on a > >> pickler, then the hook will only be called for instances. This > >> should eliminate the vast majority of the calls. > >> Note that this feature was added back when testing was minimal or > >> non-existent, so it is untested, however, the implementation is > >> simple enough. :) > > > > Do you mean that the ZODB has enough tests now that making the > > change and running the tests might already be a good proof ? > > No, I mean that pickle and cPickle lack tests for this feature. > > > Or should we be more prudent ? > > It would be nice to try this out with ZODB to see if it makes much > difference. If it does, then that would provide extra motivation for > me to add the missing test. > > Roché Compaan said he would try it out, but I just realized that he > might have been waiting for me. Sorry for not responding earlier. I actually tried this out immediately after you suggested it and was very impressed with the improvement in performance. I have been meaning to write back to give a thorough report but a project with insane deadlines caught up with me. This project ends this week so I plan to continue with my benchmark test next week and give feedback thereafter. The number of calls to persistent_id dropped dramatically, and in a test of 10 million inserts the insert rate almost doubled, from 1 000 to 2 000 inserts per second, for at least the first million inserts. The insert rate decreases rapidly thereafter until it drops to the insert rate recorded before the persistent_id change. I guess at this point the overhead of bucket splits is too high. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 3:17 PM, Roché Compaan wrote: On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. So is this as simple as modifying the following code in the ObjectWriter: self._p.persistent_id = self.persistent_id to: self._p.inst_persistent_id = self.persistent_id Yes. I'll give it a go as part of my benchmarks that I'm running and report back. I hope you weren't waiting for my answer. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
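For context, a sketch of where that one-line change sits. This paraphrases the ObjectWriter rather than quoting ZODB's actual serialize.py; the real oid logic is stubbed out:

    import cPickle
    from cStringIO import StringIO

    class ObjectWriter:
        """Editor's sketch of the relevant corner of ZODB.serialize (3.x era)."""

        def __init__(self):
            self._file = StringIO()
            self._p = cPickle.Pickler(self._file, 1)
            # Before: the hook ran for every object the pickler visited.
            # self._p.persistent_id = self.persistent_id
            # After: the hook runs only for instances.
            self._p.inst_persistent_id = self.persistent_id

        def persistent_id(self, obj):
            # The real method returns an oid for Persistent objects;
            # this stub returns None so everything pickles normally.
            return None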
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? Or should we be more prudent ? If it would, What do you mean by 'If it would' ? then of course we should contribute documentation and a test to the Python source tree. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 2:30 PM, Roché Compaan wrote: If you can cut the b-node branching factor in half, I bet your benchmark will run almost twice as fast. I increased the 'DEFAULT_MAX_BUCKET_SIZE' from 30 to 3 and DEFAULT_MAX_BTREE_SIZE from 250 to 2500 and it didn't make any noticeable difference. Despite this change there are still a huge amount of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. I can see here that what I wrote was not clear. I intended to say that the *smaller* bucket size should be faster for this because the size of each committed transaction would be smaller. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Nov 06, 2007 at 10:01:24PM +0200, Roché Compaan wrote: > On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > > > Despite this change there are still a huge number > > > of unexplained calls to the 'persistent_id' method of the ObjectWriter > > > in serialize.py. > > > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > > being used in ObjectWriter._dump(). It is called for each and every > > single object reachable from the main object, due to the way Pickler > > works (I believe). Maybe persistent_id can be analysed and optimized > > for the most common cases? > > > > If you look at the profiler stats I posted earlier you would have > noticed that there were about 1.3 million calls to persistent_id while > only 20 000 objects were persisted. So if it is being called for each > object I would expect a figure closer to 20 000, not 1.3 million. What am > I missing? AFAIU persistent_id() is called once for every reference to a persistent object rather than once for every persistent object. If there were 65 references to each of the 20 000 objects you'd get 1.3 million calls to persistent_id(). Marius Gedminas -- A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet? ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
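Marius's per-reference point is easy to see with a plain pickler. A small sketch (illustrative only; the counting hook and payload are not from the thread). The hook is consulted before the pickler's memo lookup, so every reference triggers a call even when the object itself is pickled only once:

    import cPickle
    from cStringIO import StringIO

    calls = [0]
    def persistent_id(obj):
        calls[0] += 1
        return None  # count the visit, then pickle normally

    pickler = cPickle.Pickler(StringIO(), 1)
    pickler.persistent_id = persistent_id

    shared = 'the same string object'
    pickler.dump([shared] * 100)  # one object, one hundred references

    # Expect roughly 100 calls for the list items (plus one for the
    # list itself), although only one distinct string is pickled.
    print(calls[0])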
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote: > On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > > >> Despite this change there are still a huge number > >> of unexplained calls to the 'persistent_id' method of the > >> ObjectWriter > >> in serialize.py. > > > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > > being used in ObjectWriter._dump(). It is called for each and every > > single object reachable from the main object, due to the way Pickler > > works (I believe). Maybe persistent_id can be analysed and optimized > > for the most common cases? > > Yup. > > Note that there is an undocumented feature in cPickle that I added > years ago to deal with this issue but never got around to pursuing. > Maybe someone else would be able to spend the time to try it out and > report back. > > If you set inst_persistent_id, rather than persistent_id, on a > pickler, then the hook will only be called for instances. This > should eliminate the vast majority of the calls. So is this as simple as modifying the following code in the ObjectWriter: self._p.persistent_id = self.persistent_id to: self._p.inst_persistent_id = self.persistent_id I'll give it a go as part of my benchmarks that I'm running and report back. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 15:08 -0500, Jim Fulton wrote: > On Nov 6, 2007, at 3:01 PM, Roché Compaan wrote: > > > On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > >>> Despite this change there are still a huge number > >>> of unexplained calls to the 'persistent_id' method of the > >>> ObjectWriter > >>> in serialize.py. > >> > >> Why 'unexplained'? 'persistent_id' is called from the Pickler > >> instance > >> being used in ObjectWriter._dump(). It is called for each and every > >> single object reachable from the main object, due to the way Pickler > >> works (I believe). Maybe persistent_id can be analysed and optimized > >> for the most common cases? > >> > > > > If you look at the profiler stats I posted earlier you would have > > noticed that there were about 1.3 million calls to persistent_id while > > only 20 000 objects were persisted. So if it is being called for each > > object I would expect a figure closer to 20 000, not 1.3 million. > > What am > > I missing? > > It's called for *all* objects, not just persistent objects. This > includes ints, strings (including attribute names), etc. Ah. Man that lightbulb is burning my brain ;-) -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 3:01 PM, Roché Compaan wrote: On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? If you look at the profiler stats I posted earlier you would have noticed that there were about 1.3 million calls to persistent_id while only 20 000 objects were persisted. So if it is being called for each object I would expect a figure closer to 20 000, not 1.3 million. What am I missing? It's called for *all* objects, not just persistent objects. This includes ints, strings (including attribute names), etc. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > > Despite this change there are still a huge number > > of unexplained calls to the 'persistent_id' method of the ObjectWriter > > in serialize.py. > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > being used in ObjectWriter._dump(). It is called for each and every > single object reachable from the main object, due to the way Pickler > works (I believe). Maybe persistent_id can be analysed and optimized > for the most common cases? > If you look at the profiler stats I posted earlier you would have noticed that there were about 1.3 million calls to persistent_id while only 20 000 objects were persisted. So if it is being called for each object I would expect a figure closer to 20 000, not 1.3 million. What am I missing? -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) If it would, then of course we should contribute documentation and a test to the Python source tree. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
> Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the ObjectWriter > in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:47 -0400, David Binger wrote: > On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote: > > > Thanks for the explanation. > > The actual insertion is very fast. Your benchmark is dominated by > the time to serialize the changes due to an insertion. > > You should usually have just 2 instances to serialize per insertion of > a new instance: the instance itself and the b-node that points to the > instance. An insertion may also cause changes in 2 or several b-nodes, > but those cases are less likely. > > Serializing your simple instances is probably fast, but serializing > the b-nodes appears to be taking much more time, and probably accounts > for the large number of calls to persistent_id. B-Nodes with higher > branching factors will have more parts to serialize and they will be > slower. If you can cut the b-node branching factor in half, I bet your > benchmark will run almost twice as fast. I increased the 'DEFAULT_MAX_BUCKET_SIZE' from 30 to 300 and DEFAULT_MAX_BTREE_SIZE from 250 to 2500 and it didn't make any noticeable difference. Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Running fsdump on the Data.fs confirmed that there are only 2 instances serialized per insertion, one OOBucket and an instance of the persistent class used for testing. So thus far the only tweak that made a significant difference was increasing the DB cache size from 400 to 10. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
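The fsdump check described above can be reproduced along these lines. A sketch, assuming the fsdump helper that ships with ZODB 3 (the exact signature may differ between releases):

    # Dump every transaction record in the FileStorage; for this
    # benchmark each insertion should show up as one OOBucket record
    # plus one record for the persistent test class.
    from ZODB.fsdump import fsdump

    fsdump('Data.fs')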
Re: [ZODB-Dev] Re: ZODB Benchmarks
Laurence Rowe wrote: Essentially you end up with a solution very similar to QueueCatalog but with the queue being searchable. The pain is then in modifying all of the indexes to search the queue in addition to their standard data structures. In many applications it is acceptable to have a catalog (or other data structure) that is slightly out of date. In those cases, you can ignore the queued data. -- Benji York Senior Software Engineer Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Fri, 2007-11-02 at 16:00 +0000, Laurence Rowe wrote: > Matt Hamilton wrote: > > David Binger mems-exchange.org> writes: > > > >> > >> On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: > >> > >>> Lots of people don't do nightly packs, I'm pretty sure such a process > >>> needs to be completely automatic. The question is whether doing it in > >>> a separate process in the background, or every X transactions, or every > >>> X seconds, or something. > >> Okay, perhaps the trigger should be the depth of the small-bucket tree. > > > > That may just end up causing delays periodically in transactions... ie > > delays > > that the user sees, as opposed to doing it via another thread or something. > > But > > then as only one thread would be doing this at a time it might not be too > > bad. > > > > -Matt > > ClockServer sections can now be specified in zope.conf. If you specify > them with a period of say 10 mins (or even 2) then the queue should > never get too large, and the linear search time is not a problem as n is > small. > > Essentially you end up with a solution very similar to QueueCatalog but > with the queue being searchable. > > The pain is then in modifying all of the indexes to search the queue in > addition to their standard data structures. I don't think that you need to modify the indexes at all. You simply pass the query arguments to the queue in the exact same way that you apply the arguments to individual indexes. I think with a little enhancement to QueueCatalog one should be able to pull this off. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. Okay, perhaps the trigger should be the depth of the small-bucket tree. That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. -Matt ClockServer sections can now be specified in zope.conf. If you specify them with a period of say 10 mins (or even 2) then the queue should never get too large, and the linear search time is not a problem as n is small. Essentially you end up with a solution very similar to QueueCatalog but with the queue being searchable. The pain is then in modifying all of the indexes to search the queue in addition to their standard data structures. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
This is the 'batch' or 'distribute' pattern that crops up in many fields. The best path is normally to understand what the conflicts are, and where the time is spent. If, in this case, much time is spent in the preamble, and the actual inserts are quick, then diving down one time through the security layers and stuffing in 10 items is clearly better than 10 preambles, one for each insert. The other truism is that all optimisation is for a single case. There may be different answers for different cases. Ideally a single parameter would be enough to tune the system for different cases. Good luck with the outcome, Roche; I'm excited to see some figures. --r. On 2 Nov 2007, at 15:24, David Binger wrote: On Nov 2, 2007, at 10:58 AM, Lennart Regebro wrote: It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, Maybe, but maybe that just causes big buckets to get invalidated in all of the clients over and over again, when we could accomplish the same objective in one invalidation by waiting longer and executing a bigger consolidation. although certainly the best thing would be to test all kinds of solutions and see. No doubt about that. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev Russ Ferriday [EMAIL PROTECTED] office: +44 118 3217026 mobile: +44 7789 338868 skype: ferriday ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 10:58 AM, Lennart Regebro wrote: It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, Maybe, but maybe that just causes big buckets to get invalidated in all of the clients over and over again, when we could accomplish the same objective in one invalidation by waiting longer and executing a bigger consolidation. although certainly the best thing would be to test all kinds of solutions and see. No doubt about that. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, David Binger <[EMAIL PROTECTED]> wrote: > > But wouldn't then all other threads get a conflict? > > If they are trying to do insertions at the same time as the > consolidation, yes. > This data structure won't stop insertion conflicts, the intent is to > make them > less frequent. But still, that does mean that in practice all writing threads will stand still during consolidation, because if they do anything they will get a conflict. And this whole issue only arises if you have loads of conflicts, almost all the time, because you have many writes. It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, although certainly the best thing would be to test all kinds of solutions and see. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 10:18 AM, Christian Theune wrote: Wouldn't a queue be a good data structure to do that? IIRC ZC already wrote a queue that doesn't conflict: http://svn.zope.de/zope.org/zc.queue/trunk/src/zc/queue/queue.txt If you store key/value pairs in the queue, you can do a step-by-step migration from the queue to the btree. I guess that was the original proposal mentioned by Matt. The bad thing about a Queue for this purpose is that searches are linear time, so you can't wait as long before consolidating. That might be okay, though, if you intend to run consolidations continuously. Probably this should be encapsulated into a new data structure that looks btree-like and has an additional `consolidate` method. That sounds proper. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi, On Friday, 2007-11-02, at 09:56 -0400, David Binger wrote: > On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote: > > > On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > >> That may just end up causing delays periodically in > >> transactions... ie delays > >> that the user sees, as opposed to doing it via another thread or > >> something. But > >> then as only one thread would be doing this at a time it might not > >> be too bad. > > > > But wouldn't then all other threads get a conflict? > > If they are trying to do insertions at the same time as the > consolidation, yes. > This data structure won't stop insertion conflicts, the intent is to > make them > less frequent. Hmm. Wouldn't a queue be a good data structure to do that? IIRC ZC already wrote a queue that doesn't conflict: http://svn.zope.de/zope.org/zc.queue/trunk/src/zc/queue/queue.txt If you store key/value pairs in the queue, you can do a step-by-step migration from the queue to the btree. Probably this should be encapsulated into a new data structure that looks btree-like and has an additional `consolidate` method. Calling the `consolidate` method would have to happen from the application that uses this data structure. Two issues I can think of immediately: - General: We need an efficient way to find all data structures that need reconciliation; maybe a ZODB-wide index of all objects that require reconciliation would be nice. - With Zope and ZEO: which Zope server is responsible for actually performing the reconciliation? One of the Zope servers that is marked in the zope.conf? Or maybe the ZEO server? Christian -- gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 - fax +49 345 122 9889 1 - zope and plone consulting and development ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
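Christian's queue-plus-consolidate idea might be sketched like this. Illustrative only; the zc.queue API used here (Queue, put, pull, iteration) is assumed from the package's documentation and should be checked against the linked queue.txt:

    import persistent
    import zc.queue
    from BTrees.OOBTree import OOBTree

    class QueuedBTree(persistent.Persistent):
        """BTree-like mapping that buffers writes in a low-conflict queue."""

        def __init__(self):
            self._tree = OOBTree()
            self._queue = zc.queue.Queue()

        def __setitem__(self, key, value):
            # Writes touch only the queue, so concurrent inserts do
            # not fight over BTree buckets.
            self._queue.put((key, value))

        def __getitem__(self, key):
            # Reads overlay the queue on the tree; newest entry wins.
            for k, v in reversed(list(self._queue)):
                if k == key:
                    return v
            return self._tree[key]

        def consolidate(self):
            # Run periodically (e.g. from a ClockServer task) in its
            # own transaction.
            while len(self._queue):
                key, value = self._queue.pull()
                self._tree[key] = value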
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote: On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. But wouldn't then all other threads get a conflict? If they are trying to do insertions at the same time as the consolidation, yes. This data structure won't stop insertion conflicts; the intent is to make them less frequent. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > That may just end up causing delays periodically in transactions... ie delays > that the user sees, as opposed to doing it via another thread or something. > But > then as only one thread would be doing this at a time it might not be too bad. But wouldn't then all other threads get a conflict? -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. Okay, perhaps the trigger should be the depth of the small-bucket tree. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
David Binger mems-exchange.org> writes: > > > On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: > > > Lots of people don't do nightly packs, I'm pretty sure such a process > > needs to be completely automatic. The question is whether doing it in > > a separate process in the background, or every X transactions, or every > > X seconds, or something. > > Okay, perhaps the trigger should be the depth of the small-bucket tree. That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, David Binger <[EMAIL PROTECTED]> wrote: > I think that option would work. I think it would suffice to do a > "Big.update(Small); Small.clear()" operation before a nightly pack. Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 5:48 AM, Lennart Regebro wrote: On 11/1/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. Would it be possible to not "occasionally" consolidate, but actually do it ongoing, but just one process, thereby always inserting just one transaction into the normal BTree at a time? Or does that cause troubles? I think that option would work. I think it would suffice to do a "Big.update(Small); Small.clear()" operation before a nightly pack. It might invalidate every bucket in every cache, but BTrees are designed to perform reasonably without a cache. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
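A minimal sketch of that consolidation step, run in its own transaction; the names Big and Small follow the message above, and the retry handling is an assumption rather than part of the proposal:

    import transaction
    from ZODB.POSException import ConflictError

    def consolidate(big, small, retries=3):
        for _ in range(retries):
            try:
                big.update(small)  # one bulk update, one invalidation wave
                small.clear()
                transaction.commit()
                return
            except ConflictError:
                transaction.abort()
        raise RuntimeError('consolidation kept conflicting; try again later')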
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/1/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > An interesting idea. Surely we need the opposite though, and that is an > additional BTree with a very large bucket size, as we want to minimize the > chance of a bucket split when inserting? Then we occasionally consolidate and > move the items in the original BTree with the regular bucket size/branch > factor. Would it be possible to not "occasionally" consolidate, but actually do it continuously, in just one process, thereby always inserting just one transaction into the normal BTree at a time? Or does that cause trouble? //In way over my head. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Quick note... Smaller buckets, fewer conflicts, more overhead on reading and writing. Larger buckets, more conflicts, less overhead on reading and writing. One bucket ... constant conflicts. I'd bet that the additional tree with tiny buckets would be best. Transfer them into the normal tree once the overhead starts rising. How to transfer them? You would not want a single transaction to take the hit for the whole transfer, so have a low and high water mark. When hitting HWM, transfer only until LWM is reached. Or, just focus on transferring some items out of a single tree, to avoid the cost of tree rebalancing on the additional tree at the same time as rebalancing on the main tree. Sounds like a fun project, --r. On 1 Nov 2007, at 21:00, David Binger wrote: On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. You may be right about that. Conflict resolution makes it harder for me to predict which way is better. If you don't have conflict resolution for insertions, then I think the smaller buckets are definitely better for avoiding conflicts. In either case, smaller buckets reduce the size and serialization time of the insertion transactions, and that alone *might* be a reason to favor them. I think I'd still bet on smaller buckets, but tests would expose the trade-offs. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev Russ Ferriday [EMAIL PROTECTED] office: +44 118 3217026 mobile: +44 7789 338868 skype: ferriday ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
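The high and low water mark transfer might look like the following. A sketch only; the thresholds are arbitrary assumptions, and draining by minKey is one possible choice, not part of Russ's note:

    import transaction

    HIGH_WATER = 5000  # start draining when the small tree reaches this
    LOW_WATER = 1000   # stop once it is back down to this

    def maybe_transfer(big, small):
        if len(small) < HIGH_WATER:
            return
        # Drain only part of the backlog so that no single transaction
        # takes the hit for the whole transfer.
        while len(small) > LOW_WATER:
            key = small.minKey()
            big[key] = small[key]
            del small[key]
        transaction.commit()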
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. You may be right about that. Conflict resolution makes it harder for me to predict which way is better. If you don't have conflict resolution for insertions, then I think the smaller buckets are definitely better for avoiding conflicts. In either case, smaller buckets reduce the size and serialization time of the insertion transactions, and that alone *might* be a reason to favor them. I think I'd still bet on smaller buckets, but tests would expose the trade-offs. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. maybe. haven't thought it through, but worth thinking about. idle thought I should probably not share: you could use a Bucket directly for that--it will never split at all, and has the conflict resolution behavior. (strangely, I'm not idle at all, but rather overwhelmingly busy ;-) ) Gary ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
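Gary's aside could be sketched as below, assuming an OOBucket on the write side. Since a lone Bucket never splits, inserts into it conflict only at the whole-bucket level, where BTree conflict resolution applies:

    from BTrees.OOBTree import OOBucket, OOBTree

    small = OOBucket()  # flat write side: never splits
    big = OOBTree()     # consolidated read side

    small['key-1'] = 'value-1'  # writes go to the bucket

    def lookup(key):
        # Reads overlay the bucket on the tree.
        if small.has_key(key):
            return small[key]
        return big[key]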
[ZODB-Dev] Re: ZODB Benchmarks
David Binger mems-exchange.org> writes: > > > On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: > > > Ie we perhaps look at a catalog data structure > > in which writes are initially done to some kind of queue then moved > > to the > > BTrees at a later point. > > A suggestion: use a pair of BTrees, one with a high branching factor > (bucket size) > and one with a very low branching factor. Force all writes into the > tree with little > buckets. Make every search look in both trees. Consolidate > occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/branch factor. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:47 -0400, David Binger wrote: > On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote: > > > Thanks for the explanation. > > The actual insertion is very fast. Your benchmark is dominated by > the time to serialize the changes due to an insertion. > > You should usually have just 2 instances to serialize per insertion of > a new instance: the instance itself and the b-node that points to the > instance. An insertion may also cause changes in 2 or several b-nodes, > but those cases are less likely. > > Serializing your simple instances is probably fast, but serializing > the b-nodes appears to be taking much more time, and probably accounts > for the large number of calls to persistent_id. B-Nodes with higher > branching factors will have more parts to serialize and they will be > slower. If you can cut the b-node branching factor in half, I bet your > benchmark will run almost twice as fast. I guess you are referring to the B-Tree bucket size? This is not really configurable and one will have to recompile the C code once you modify it. For OOBTree the max bucket size is 30. I'll see what effect it has on the test nevertheless. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. It seems like this would be a good way to push the conflict rate down while still providing fast item access when the cache is cold. The little bucket tree seems better than an ordinary queue for this purpose because it can be searched faster, and you can let it grow as much as you like between consolidations. Also the size of the transaction on each insert will be pretty small. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Laurence Rowe lrowe.co.uk> writes: > So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for > inserts. Instead of inserting into the (B-Tree based) data files at > every transaction commit it writes a record to the WAL. This does not > require traversal of the B-Tree and has O(1) time complexity. The > penalty for this is that read operations become more complex, they must > look first in the WAL and overlay those results with the main index. The > WAL is never allowed to get too large, or its in-memory index would > become too big. This is sort of what I proposed at the performance BOF at the Plone Conf specifically for the ZCatalog. Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. One thing to be very wary of with all of this. As has been shown, the conflict errors are relative to the size of the Btree; this is due to the probability of a bucket needing to be split. We need to be very sure that real life use cases are what we think they are. Ie. in a large running site some of the BTrees will already be quite large and so conflicts might not be such an issue (they are an issue so something is not right here). Or we might find for instance that one of the catalog indexes, eg. something like a FieldIndex might only have a small vocabulary (e.g. the author of a piece of content) but referenced by every document. In that case you would have an index with very few keys and N values (where N is the number of documents) and an unindex of N keys each with a very small number of values, hence a small btree, hence large chance of collisions. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote:

> Thanks for the explanation.

The actual insertion is very fast. Your benchmark is dominated by
the time to serialize the changes due to an insertion.

You should usually have just 2 instances to serialize per insertion of
a new instance: the instance itself and the b-node that points to the
instance. An insertion may also cause changes in two or more b-nodes,
but those cases are less likely.

Serializing your simple instances is probably fast, but serializing
the b-nodes appears to be taking much more time, and probably accounts
for the large number of calls to persistent_id. B-Nodes with higher
branching factors will have more parts to serialize and they will be
slower. If you can cut the b-node branching factor in half, I bet your
benchmark will run almost twice as fast.

I think the default branching factor is large in ZODB because that can
be good for fast reading. It is bad if you want fast writing. If you
want fast writing, use a small branching factor.
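The scaling is easy to demonstrate outside ZODB. In this hedged sketch,
plain dicts stand in for pickled object state; a real bucket pickles
one reference per key and per value, each triggering a persistent_id
call, so the serialization work grows with the branching factor:

import pickle
import timeit

instance_state = {'attr': 'x' * 1024}                      # one simple instance
small_bucket = dict((i, 'oid-%d' % i) for i in range(15))  # half-size bucket
big_bucket = dict((i, 'oid-%d' % i) for i in range(30))    # full OOBTree bucket

for name, state in [('instance', instance_state),
                    ('bucket/15', small_bucket),
                    ('bucket/30', big_bucket)]:
    t = timeit.timeit(lambda: pickle.dumps(state), number=20000)
    print('%-10s %.3fs' % (name, t))  # absolute times are machine-dependent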
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:00 +0000, Laurence Rowe wrote:
> It looks like ZODB performance in your test has the same O(log n)
> performance as PostgreSQL checkpoints (the periodic drops in your
> graph). This should come as no surprise. B-Trees have a theoretical
> Search/Insert/Delete time complexity equal to the height of the tree,
> which is (up to) log(n).
>
> So why is PostgreSQL so much faster? It's using a Write-Ahead Log for
> inserts. Instead of inserting into the (B-Tree based) data files at
> every transaction commit it writes a record to the WAL. This does not
> require traversal of the B-Tree and has O(1) time complexity. The
> penalty for this is that read operations become more complex: they must
> look first in the WAL and overlay those results with the main index. The
> WAL is never allowed to get too large, or its in-memory index would
> become too big.

Thanks for the explanation.

After some profiling I noticed that there are millions of OID lookups
in the index. Increasing the cache size from 400 to 100000 led to more
acceptable levels of performance degradation. I'll post some results
later on.

Some profiling also showed that there is a huge number of calls to the
persistent_id method of the ObjectWriter: persisting 10000 objects
leads to 1338046 calls to persistent_id. This seems to have quite a bit
of overhead. Profile results attached.

> If you are going to have this number of records -- in a single B-Tree --
> then use a relational database. It's what they're optimised for.

The point of the benchmark is to determine what "this number of
records" means and to deduce best practice when working with the ZODB.
I would much rather tell a developer to use multiple B-Trees if he
wants to store this number of records than tell them to use a
relational database. Telling a ZODB programmer to use a relational
database is an insult ;-)

One of the tests that I want to try out next is to insert records
concurrently into different B-Trees.

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za

Tue Oct 30 20:28:04 2007    /tmp/profile-1.dat

         6108977 function calls (6108973 primitive calls) in 57.280 CPU seconds

   Ordered by: cumulative time
   List reduced from 232 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   57.280   57.280 profile_zodb.py:70(run)
        1    0.000    0.000   57.280   57.280 <string>:1(?)
        1    0.260    0.260   57.280   57.280 profile_zodb.py:24(_btrees_insert)
        1    0.000    0.000   57.280   57.280 profile:0(run())
     1001    0.030    0.000   51.060    0.051 _manager.py:88(commit)
     1001    0.040    0.000   50.990    0.051 _transaction.py:365(commit)
     1001    0.110    0.000   50.730    0.051 _transaction.py:486(_commitResources)
     1001    0.020    0.000   48.060    0.048 Connection.py:496(commit)
     1001    0.220    0.000   48.040    0.048 Connection.py:512(_commit)
     9889    0.940    0.000   47.340    0.005 Connection.py:561(_store_objects)
    20372    0.480    0.000   39.790    0.002 serialize.py:381(serialize)
    20372    0.500    0.000   38.950    0.002 serialize.py:409(_dump)
    40750    7.790    0.000   38.020    0.001 :0(dump)
  1338046   17.560    0.000   30.230    0.000 serialize.py:184(persistent_id)
  2177223    9.150    0.000    9.150    0.000 :0(isinstance)
    20373    1.550    0.000    5.240    0.000 FileStorage.py:631(store)
     2964    0.050    0.000    4.980    0.002 Connection.py:749(setstate)
     2964    0.100    0.000    4.930    0.002 Connection.py:769(_setstate)
     2964    0.080    0.000    4.180    0.001 serialize.py:603(setGhostState)
     2964    0.030    0.000    4.100    0.001 serialize.py:593(getState)
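Those figures line up with David Binger's explanation earlier in the
thread. A quick sanity check (my own arithmetic, assuming the run
inserted 10000 objects):

serializations = 20372        # serialize.py:381(serialize) above
inserts = 10000
persistent_id_calls = 1338046

# ~2 serializations per insert: the new instance plus the bucket
# that points to it.
print(float(serializations) / inserts)              # ~2.04

# ~66 persistent_id calls per serialization: about what a bucket
# holding ~30 keys plus ~30 object references would produce.
print(float(persistent_id_calls) / serializations)  # ~65.7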
Re: [ZODB-Dev] Re: ZODB Benchmarks
I think someone proposed to have something just like a WAL in ZODB.
That could be an interesting optimization.

-- 
Sidnei da Silva
Enfold Systems                http://enfoldsystems.com
Fax +1 832 201 8856
Office +1 713 942 2377 Ext 214
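In outline, such a WAL could look like the sketch below (names and
structure are hypothetical; a real implementation would use an
append-only file replayed on restart, with checkpointing kept off the
commit path):

from BTrees.OOBTree import OOBTree

class WALIndex:
    # Sketch of a write-ahead-log overlay in front of a BTree index.
    def __init__(self):
        self.tree = OOBTree()   # the big, read-optimised index
        self.wal = {}           # in-memory overlay of recent writes

    def insert(self, key, value):
        self.wal[key] = value   # O(1) at commit time: no tree traversal

    def lookup(self, key):
        if key in self.wal:     # reads consult the WAL first...
            return self.wal[key]
        return self.tree[key]   # ...then fall back to the main index

    def checkpoint(self):
        # Drain the WAL into the tree before it grows too large.
        self.tree.update(self.wal)
        self.wal.clear()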
[ZODB-Dev] Re: ZODB Benchmarks
It looks like ZODB performance in your test has the same O(log n)
performance as PostgreSQL checkpoints (the periodic drops in your
graph). This should come as no surprise. B-Trees have a theoretical
Search/Insert/Delete time complexity equal to the height of the tree,
which is (up to) log(n).

So why is PostgreSQL so much faster? It's using a Write-Ahead Log for
inserts. Instead of inserting into the (B-Tree based) data files at
every transaction commit it writes a record to the WAL. This does not
require traversal of the B-Tree and has O(1) time complexity. The
penalty for this is that read operations become more complex: they must
look first in the WAL and overlay those results with the main index. The
WAL is never allowed to get too large, or its in-memory index would
become too big.

If you are going to have this number of records -- in a single B-Tree --
then use a relational database. It's what they're optimised for.

Laurence

Roché Compaan wrote:
> Well, I finally realised that ZODB benchmarks are not going to fall
> from the sky, so compelled by a project that needs to scale to very
> large numbers and a general desire to have real numbers, I started to
> write some benchmarks.
>
> My first goal was to get a baseline and test performance for the most
> basic operations like inserts and lookups. The first test tests BTree
> performance (OOBTree to be specific) and inserts instances of a
> persistent class into a BTree. Each instance has a single attribute
> that is 1K in size. The test tries out different commit intervals:
> the first iteration commits every 10 inserts, the second iteration
> commits every 100 inserts and the last one commits every 1000
> inserts. I don't have results for the second and third iterations
> since the first iteration takes a couple of hours to complete and I'm
> still waiting for the results.
>
> The results so far are worrying in that performance deteriorates
> logarithmically. The test kicks off with a bang at close to 750
> inserts per second, but after 1 million objects the insert rate drops
> to 260 inserts per second, and at 10 million objects the rate is not
> even 60 inserts per second. Why?
>
> In an attempt to determine if this drop in performance is normal I
> created a test with Postgres, purely to observe transaction rate and
> not to compare it with the ZODB. In Postgres the transaction rate
> hovers around 2700 inserts throughout the test. There are periodic
> drops, but I guess these are times when Postgres flushes to disc. I
> was hoping to have a consistent transaction rate in the ZODB too. See
> the attached image for the comparison. I also attach csv files of the
> data collected by both tests.
>
> During the last Plone conference I started a project called
> zodbbench, available here:
>
> https://svn.plone.org/svn/collective/collective.zodbbench
>
> The tests are written as unit tests and are run with a testrunner
> script. The project uses buildout to make it easy to get going.
> Unfortunately installing it with buildout on some systems seems to
> lead to weird import errors that I can't explain, so I would
> appreciate it if somebody with buildout fu can look at it.
> What I would appreciate more, though, is an explanation of the drop
> in performance or, alternatively, why the test is insane ;-)