Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-25 Thread Christian Theune
Hi,

On Fri, Mar 21, 2008 at 09:08:28PM +0100, Dieter Maurer wrote:
 Chris Withers wrote at 2008-3-20 22:22 +:
 Roché Compaan wrote:
  Not yet, they are very time consuming. I plan to do the same tests over
  ZEO next to determine what overhead ZEO introduces.
 
 Remember to try introducing more app servers and see where the 
 bottleneck comes ;-)
 
 We have seen commit contention with lots (24) of zeo clients
 and a high write rate application (almost all requests write to
 the ZODB).

I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that we
should look into a technique called `group commit` to get rid of the commit
contention.

Does anybody know this technique already and maybe has a pointer for me?

Christian

-- 
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-25 Thread Benji York

Christian Theune wrote:

I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that we
should look into a technique called `group commit` to get rid of the commit
contention.

Does anybody know this technique already and maybe has a pointer for me?


I'd never heard the phrase until reading your message, but I think I got 
a pretty clear picture from 
http://forums.mysql.com/read.php?22,53854,53854#msg-53854 and 
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01696.php.


Summary: fsync is slow (and the cornerstone of most commit steps), so 
try to gather up a small batch of commits to do all at once (with only 
one call to fsync).  Somewhat like Nagle's algorithm, but for fsync.
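
The batching idea summarized above can be sketched roughly as follows. This is a minimal illustration, not ZODB or FileStorage code: the `GroupCommitLog` class, its `batch_size` parameter, and the `fsync_calls` counter are hypothetical names introduced only for this example.

```python
import os


class GroupCommitLog:
    """Append-only log that puts several commits behind a single fsync.

    Hypothetical illustration of group commit; not ZODB's storage code.
    """

    def __init__(self, path, batch_size=8):
        self._file = open(path, "ab")
        self._batch_size = batch_size
        self._pending = 0
        self.fsync_calls = 0  # counted so the saving is visible

    def commit(self, record):
        # Buffer the record; it is not durable until the batch is flushed.
        self._file.write(record + b"\n")
        self._pending += 1
        if self._pending >= self._batch_size:
            self.flush()

    def flush(self):
        # One fsync covers every record written since the last flush.
        if self._pending == 0:
            return
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_calls += 1
        self._pending = 0

    def close(self):
        self.flush()
        self._file.close()
```

With `batch_size=8`, twenty commits cost three fsync calls instead of twenty. A real implementation would also flush on a short timer and would not acknowledge a transaction until its batch has reached disk.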


The kicker is that OSs and hardware often lie about fsync (and it's 
therefore fast) and good hardware (disk arrays with battery backed write 
cache) already make fsync pretty fast.


Not to suggest that group commit wouldn't speed things up, but it would 
seem that the technique will make the largest improvement for people 
that are using a non-lying fsync on inappropriate hardware.

--
Benji York
Senior Software Engineer
Zope Corporation


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-25 Thread Dieter Maurer
Benji York wrote at 2008-3-25 09:40 -0400:
Christian Theune wrote:
 I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that we
 should look into a technique called `group commit` to get rid of the commit
 contention.
 ...
Summary: fsync is slow (and the cornerstone of most commit steps), so 
try to gather up a small batch of commits to do all at once (with only 
one call to fsync).

Our commit contention definitely is not caused by fsync.
Our fsync is quite fast. If fsync were the only factor,
we could easily process at least 1,000 transactions per second -- but
in practice, at 10 transactions per second, we see contention a few
times per week.



We do not yet know precisely the cause of our commit contention.
Almost surely there are several causes that can all lead to contention.

We already found:

  *  client side causes (while the client holds the commit lock)

     - garbage collections (which can block a client on the order of
       10 to 20 s)

     - NFS operations (which can take up to 27 s in our setup -- for
       still unknown reasons)

     - invalidation processing, especially ZEO ClientCache processing

  *  server side causes

     - commit lock held during the copy phase of pack

     - IO thrashing during the reachability analysis in pack

     - non-deterministic server side IO anomalies
       (IO suddenly takes several times longer than usual -- for still
       unknown reasons)

-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-25 Thread Benji York

Dieter Maurer wrote:

We do not yet know precisely the cause of our commit contentions.


Hard to tell what'll make them better then. ;)


Almost surely there are several causes that all can lead to contention.

We already found:

  *  client side causes (while the client holds the commit lock)

- garbage collections (which can block a client on the order of
  10 to 20 s)


Interesting.  Perhaps someone might enjoy investigating turning off 
garbage collection during commits.
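
One way to experiment with this idea is a small context manager around the commit call. This is only a sketch: `gc_paused` is a hypothetical helper name, and whether pausing the collector is safe or beneficial inside ZODB's commit path has not been verified here.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Suspend CPython's cyclic garbage collector inside the block.

    Reference counting still frees most objects; only cycle detection
    is postponed until the block exits.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state instead of enabling unconditionally.
        if was_enabled:
            gc.enable()


# Intended use (assumes the ZODB `transaction` package is available):
#     with gc_paused():
#         transaction.commit()
```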



- NFS operations (which can take up to 27 s in our setup -- for
  still unknown reasons)


Not much ZODB can do about that. ;)


- invalidation processing, especially ZEO ClientCache processing


Interesting.  Not knowing much about how invalidations are handled, I'm 
curious where the slow-down is.  Do you have any more detail?



  *  server side causes

- commit lock held during the copy phase of pack

- IO thrashing during the reachability analysis in pack


The new pack code should help quite a bit with the above (if you're 
saying what I think you're saying).



- non-deterministic server side IO anomalies
  (IO suddenly takes several times longer than usual -- for still
  unknown reasons)


Curious.
--
Benji York
Senior Software Engineer
Zope Corporation


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-25 Thread Dieter Maurer
Benji York wrote at 2008-3-25 14:24 -0400:
 ... commit contentions ...
 Almost surely there are several causes that all can lead to contention.
 
 We already found:
 
   *  client side causes (while the client holds the commit lock)
   
 - garbage collections (which can block a client on the order of
   10 to 20 s)

Interesting.  Perhaps someone might enjoy investigating turning off 
garbage collection during commits.

A reconfiguration of the garbage collector helped us with this one
(the standard configuration is not well tuned to processes with
large amounts of objects).
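
The message does not say which settings were changed. As a sketch of what such a reconfiguration might look like, the generation-0 threshold can be raised so that collections are triggered far less often in a process holding millions of objects. The helper name and the value 100000 are illustrative assumptions, not the values used in that installation.

```python
import gc


def raise_gc_thresholds(gen0=100_000, gen1=10, gen2=10):
    """Make young-generation collections rarer; return the old thresholds.

    The numbers are illustrative guesses and should be tuned by measurement.
    """
    previous = gc.get_threshold()
    gc.set_threshold(gen0, gen1, gen2)
    return previous
```

Calling it once at process start-up is enough; the returned tuple can be passed back to `gc.set_threshold(*previous)` to undo the change.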

 
 - invalidation processing, especially ZEO ClientCache processing

Interesting.  Not knowing much about how invalidations are handled, I'm 
curious where the slow-down is.  Do you have any more detail?

Not many:

We have a component called RequestMonitor which periodically
checks for long running requests and logs the corresponding stack
traces.
This monitor very often sees requests (holding the commit lock)
which are in ZEO.cache.FileCache.settid.

As the monitor runs asynchronously with the observed threads,
the probability of an observation in a given function
depends on how long the thread is inside this function (total
time, i.e. visits times mean time per visit).
From this, we can conclude that a significant amount of time is spent in
settid.
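
The sampling argument above can be illustrated with a small stand-alone sampler. This is not the RequestMonitor component itself, whose code is not shown in the thread; `sample_stacks` is a hypothetical function that relies on CPython's `sys._current_frames()`.

```python
import sys
import time
from collections import Counter


def sample_stacks(thread_ids, samples=50, interval=0.01):
    """Count how often each watched thread is seen inside each function.

    A function that accounts for more wall-clock time is observed in
    proportionally more samples, which is the basis of the conclusion.
    """
    seen = Counter()
    for _ in range(samples):
        frames = sys._current_frames()  # maps thread id -> innermost frame
        for tid in thread_ids:
            frame = frames.get(tid)
            if frame is not None:
                seen[frame.f_code.co_name] += 1
        time.sleep(interval)
    return seen
```

A production monitor would log the full stack (for example with `traceback.format_stack(frame)`) rather than only the innermost function name.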



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-21 Thread Dieter Maurer
Chris Withers wrote at 2008-3-20 22:22 +:
Roché Compaan wrote:
 Not yet, they are very time consuming. I plan to do the same tests over
 ZEO next to determine what overhead ZEO introduces.

Remember to try introducing more app servers and see where the 
bottleneck comes ;-)

We have seen commit contention with lots (24) of zeo clients
and a high write rate application (almost all requests write to
the ZODB).



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-20 Thread Chris Withers

Roché Compaan wrote:

Not yet, they are very time consuming. I plan to do the same tests over
ZEO next to determine what overhead ZEO introduces.


Remember to try introducing more app servers and see where the 
bottleneck comes ;-)


Am I right in thinking the storage server is still essentially single 
threaded?


cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-19 Thread Manuel Vazquez Acosta
Roché Compaan wrote:
 On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
 I'll update my blog post with the final stats and let you know when it
 is ready.

 
 I'll have to keep running these tests because the more I run them the
 faster the ZODB becomes ;-) Would you have guessed that the ZODB is
 faster at both insertion and lookups than Postgres?
 
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
 
 Lookups are even faster than what I originally reported. Lookup times
 average 2 milliseconds (0.002s) on 10 million objects. 
 
 I think somebody else should run these tests as well and validate the
 methodology behind them, otherwise I'm spreading lies.
 

Roché,

I'm very interested in your results. I'm assessing whether or not to
implement an application with ZODB. I had a good experience with it
before, but that was on a minor project. Now I need very fast lookups
and retrievals, possibly a distributed DB system, and something as easy
to use as ZODB.

Have you done more tests recently?

I'm also wondering how this relates to Zope.

Regards,
Manuel.


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Roché Compaan
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
 Roché Compaan wrote:
  On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
  I'll update my blog post with the final stats and let you know when it
  is ready.
 
  
  I'll have to keep running these tests because the more I run them the
  faster the ZODB becomes ;-) Would you have guessed that the ZODB is
  faster at both insertion and lookups than Postgres?
  
  http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
 
 For some workloads, ZODB is definitely faster.  However, I think your 
 analysis needs to provide more detail:

I posted some of the details earlier in this thread but I'll put them up
on the post as well.

 - How much RAM did ZODB and Postgres consume during the tests?

Don't know, will check.

 - How often are you committing a transaction?

100 inserts per transaction.

 - Did you use optimal methods of insertion in Postgres, such as COPY? 
 Also note that a standard way to insert a lot of data into a relational 
 database is to temporarily drop indexes and re-create them after 
 insertion.  Your original test may be more valid than you thought.

I don't think that this describes the typical interaction between an
application and a database. Usually records will be added to Postgres
using INSERT.

A goal of the benchmarks is to understand the limitations for
applications that use the ZODB and to challenge the idea that an RDBMS
should be used for applications that, in naive terms, require a big
database that can write fast.

 - Did you use optimal methods of retrieval in Postgres?  It is 
 frequently not necessary to pull the data into the application.  Copying 
 to another table could be faster than fetching rows.

I realise this. The only thing the lookup stats tell me is that lookups
in the ZODB don't need drastic optimisation; they are already very fast.

 - What is the ZODB cache size?  How much does the speed change as you 
 change the cache size?

A lot! With the default cache size the insert rate is a mere 50
inserts/second at around 10 million objects. I used a cache size of
10 in the benchmarks.

 - How much disk space does each database consume when there are 10M objects?

ZODB: 19GB

How do you check the size of a Postgres database?

 
  Lookups are even faster than what I originally reported. Lookup times
  average 2 milliseconds (0.002s) on 10 million objects. 
 
 Maybe if you set up a ZODB cache that allows just over 10 million 
 objects, the lookup time will drop to microseconds.  You might need a 
 lot of RAM to do that, though.

Maybe, but somehow I think that disk IO will prevent this. I'll check.

  I think somebody else should run these tests as well and validate the
  methodology behind them, otherwise I'm spreading lies.
 
 You're not far from something interesting.
 
 On a related topic, you might be interested in the RelStorage 
 performance charts I just posted.  Don't take them too seriously, but I 
 think the charts are useful.
 
 http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Thanks for all your questions. I'll certainly post the missing detail on
the web and investigate some of the things that might affect
performance.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Roché Compaan
On Tue, 2008-03-04 at 23:00 +0100, Bernd Dorn wrote:
 On 04.03.2008, at 22:16, Shane Hathaway wrote:
 
  Lennart Regebro wrote:
  On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway  
  [EMAIL PROTECTED] wrote:
  - Did you use optimal methods of retrieval in Postgres?  It is
  frequently not necessary to pull the data into the application.   
  Copying
  to another table could be faster than fetching rows.
  But is that relevant in this case? Retrieval must reasonably really
  retrieve the data, not just move it around. :)
 
  Not if you're only retrieving intermediate information.  When you  
  write an application against a relational database, a lot of the  
  intermediate information does not need to be exposed to Python,  
  helping performance significantly.
 
 yes this is the major benefit when using a relational database over  
 zodb, because zodb has no server side query language,

This certainly makes some queries faster in an RDBMS. My first goal was
to determine the speed of the most basic operations, such as insert and
lookup.

  so the whole  
 lookup insert comparison does not reflect real world issues.

I disagree. It certainly tests some real world use cases. Most notably
the use case where you have a very large user base inserting content at
a very high rate into the ZODB. I think what is unknown at this stage is
what the penalty would be when using ZEO. But you can only know this if
you know how fast direct interaction with the ZODB is. Not all
applications require ZEO either.

 for example in one of our applications we have to calculate neighbours  
 of users, based on books they have in their bookshelves. with about  
 1 users and each of them having an average of 100-500 books out of  
 ca. 1 million, the calculation of the neighbours takes seconds when  
 you have to calculate this on the client, by getting all indexes etc.  
 we switched to sql and wrote a single sql statement that does exactly  
 the same comparison which now takes about 300ms.
 
 your comparisons would only be accurate if comparing relstorage with  
 filestorage over zeo, because in this case there is no server side  
 query possible on object attributes . it would be interesting to look  
 at performance when having 4-10 zodb clients and then compare zeo/ 
 filestorage against relstorage with postgres.

Hopefully the tests are accurate in comparing the speed for basic
operations like insertion and lookup. They might be more *relevant* if
one performs the same tests using ZEO.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread David Pratt
Hi Roche. I figured this out once and it was included in PGStorage, so it 
should be in relstorage also. Take a look at the get_db_size method in the 
postgres adapter. relstorage is in the zope repository.
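
For readers without the RelStorage source at hand, the size can be read with PostgreSQL's built-in `pg_database_size()` function. The sketch below is an assumption about how such a helper could look, not a copy of the RelStorage method; it accepts any DB-API cursor (for example one from psycopg2) connected to the database in question.

```python
def get_db_size(cursor):
    """Return the on-disk size, in bytes, of the connected database.

    Hypothetical helper; only the SQL functions are PostgreSQL built-ins.
    """
    cursor.execute("SELECT pg_database_size(current_database())")
    return cursor.fetchone()[0]
```

From `psql`, `SELECT pg_size_pretty(pg_database_size(current_database()));` gives the same figure in a human-readable unit.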


Regards,
David

Roché Compaan wrote:



- How much disk space does each database consume when there are 10M objects?


ZODB: 19GB

How do you check the size of a Postgres database?



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Lennart Regebro
On Tue, Mar 4, 2008 at 10:16 PM, Shane Hathaway [EMAIL PROTECTED] wrote:
  Not if you're only retrieving intermediate information.

Sure. And my point is that in a typical web setting, you don't. You
either retrieve data to be displayed, or you insert data from an HTTP
POST. Massaging data from one place of the database to another place
in the database is usually nothing you do on a per-request basis. And
if you do, then maybe you shouldn't. :)

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Roché Compaan
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
 On a related topic, you might be interested in the RelStorage 
 performance charts I just posted.  Don't take them too seriously, but I
 think the charts are useful.
 
 http://shane.willowrise.com/archives/relstorage-10-and-measurements/

One question, if you run the test with concurrent threads, does each
thread insert a 100 objects (or a 1 for the second test)? 

Is this test available in SVN somewhere?

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Shane Hathaway

Roché Compaan wrote:

On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
On a related topic, you might be interested in the RelStorage 
performance charts I just posted.  Don't take them too seriously, but I
think the charts are useful.


http://shane.willowrise.com/archives/relstorage-10-and-measurements/


One question, if you run the test with concurrent threads, does each
thread insert a 100 objects (or a 1 for the second test)? 


Yes.


Is this test available in SVN somewhere?


It's in the RelStorage 1.0 release as well as SVN.  It's called 
relstorage/tests/speedtest.py.


Shane


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Benji York

Roché Compaan wrote:

On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
Maybe if you set up a ZODB cache that allows just over 10 million 
objects, the lookup time will drop to microseconds.  You might need a 
lot of RAM to do that, though.


Maybe, but somehow I think that disk IO will prevent this. I'll check.


If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness; 
http://lwn.net/Articles/83588/) to affect how much RAM is used for the 
page cache and how much for your process.
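
Before tuning, it helps to record the value a benchmark actually ran with. A minimal sketch, assuming a Linux system; `read_swappiness` is a hypothetical helper and its `path` argument exists only so the function can be pointed at another file.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness setting as an integer (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

Changing the value requires root, for example with `sysctl -w vm.swappiness=<value>`.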

--
Benji York
Senior Software Engineer
Zope Corporation


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread David Pratt

Hi Benji. Do you have any settings to recommend, or do you use the default? Many thanks.

Regards,
David

Benji York wrote:

Roché Compaan wrote:

On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
Maybe if you set up a ZODB cache that allows just over 10 million 
objects, the lookup time will drop to microseconds.  You might need a 
lot of RAM to do that, though.


Maybe, but somehow I think that disk IO will prevent this. I'll check.


If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness; 
http://lwn.net/Articles/83588/) to affect how much RAM is used for the 
page cache and how much for your process.



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Benji York

David Pratt wrote:

Hi Benji. Do you have any settings to recommend, or do you use the default? Many thanks.


For benchmarking?  No.

Too high and you'll spend a bunch of time swapping to free up space for 
disk cache; too low and you may not have a large enough disk cache to be 
effective.

--
Benji York
Senior Software Engineer
Zope Corporation


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-05 Thread Izak Burger

Benji York wrote:
If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness; 
http://lwn.net/Articles/83588/) to affect how much RAM is used for the 
page cache and how much for your process.


While we're on that subject. We recently had a box that would take 
strain for almost no reason. You'd copy a biggish file from one place to 
another and the load average would just soar as the various zope and zeo 
instances tried to get to the disk. Turns out this machine used the 
anticipatory io scheduler, which really messes things up. We changed it 
to deadline, like so:


  echo deadline > /sys/block/sda/queue/scheduler

Performance is a lot better now. Our (not very scientific) tests show 
that deadline is also a little better than cfq for running zope.
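
To check which scheduler a device is using, the same sysfs file can be read back; the kernel lists the available schedulers and marks the active one with square brackets. A small sketch, with hypothetical function names and `sda` assumed as the device:

```python
import re


def active_scheduler(text):
    """Pick the bracketed entry from e.g. 'noop anticipatory [deadline] cfq'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device="sda"):
    """Return the active IO scheduler for a block device (Linux sysfs)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```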



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-04 Thread Shane Hathaway

Roché Compaan wrote:

On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:

I'll update my blog post with the final stats and let you know when it
is ready.



I'll have to keep running these tests because the more I run them the
faster the ZODB becomes ;-) Would you have guessed that the ZODB is
faster at both insertion and lookups than Postgres?

http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited


For some workloads, ZODB is definitely faster.  However, I think your 
analysis needs to provide more detail:


- How much RAM did ZODB and Postgres consume during the tests?

- How often are you committing a transaction?

- Did you use optimal methods of insertion in Postgres, such as COPY? 
Also note that a standard way to insert a lot of data into a relational 
database is to temporarily drop indexes and re-create them after 
insertion.  Your original test may be more valid than you thought.


- Did you use optimal methods of retrieval in Postgres?  It is 
frequently not necessary to pull the data into the application.  Copying 
to another table could be faster than fetching rows.


- What is the ZODB cache size?  How much does the speed change as you 
change the cache size?


- How much disk space does each database consume when there are 10M objects?


Lookups are even faster than what I originally reported. Lookup times
average 2 milliseconds (0.002s) on 10 million objects. 


Maybe if you set up a ZODB cache that allows just over 10 million 
objects, the lookup time will drop to microseconds.  You might need a 
lot of RAM to do that, though.



I think somebody else should run these tests as well and validate the
methodology behind them, otherwise I'm spreading lies.


You're not far from something interesting.

On a related topic, you might be interested in the RelStorage 
performance charts I just posted.  Don't take them too seriously, but I 
think the charts are useful.


http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Shane


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-04 Thread Lennart Regebro
On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway [EMAIL PROTECTED] wrote:
  - Did you use optimal methods of insertion in Postgres, such as COPY?
  Also note that a standard way to insert a lot of data into a relational
  database is to temporarily drop indexes and re-create them after
  insertion.  Your original test may be more valid than you thought.

  - Did you use optimal methods of retrieval in Postgres?  It is
  frequently not necessary to pull the data into the application.  Copying
  to another table could be faster than fetching rows.

But is that relevant in this case? Retrieval must reasonably really
retrieve the data, not just move it around. :)

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-04 Thread Shane Hathaway

Lennart Regebro wrote:

On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway [EMAIL PROTECTED] wrote:

 - Did you use optimal methods of retrieval in Postgres?  It is
 frequently not necessary to pull the data into the application.  Copying
 to another table could be faster than fetching rows.


But is that relevant in this case? Retrieval must reasonably really
retrieve the data, not just move it around. :)


Not if you're only retrieving intermediate information.  When you write 
an application against a relational database, a lot of the intermediate 
information does not need to be exposed to Python, helping performance 
significantly.


Shane



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-03-04 Thread Bernd Dorn


On 04.03.2008, at 22:16, Shane Hathaway wrote:


Lennart Regebro wrote:
On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway  
[EMAIL PROTECTED] wrote:

- Did you use optimal methods of retrieval in Postgres?  It is
frequently not necessary to pull the data into the application.   
Copying

to another table could be faster than fetching rows.

But is that relevant in this case? Retrieval must reasonably really
retrieve the data, not just move it around. :)


Not if you're only retrieving intermediate information.  When you  
write an application against a relational database, a lot of the  
intermediate information does not need to be exposed to Python,  
helping performance significantly.


yes this is the major benefit when using a relational database over  
zodb, because zodb has no server side query language, so the whole  
lookup insert comparison does not reflect real world issues.


for example in one of our applications we have to calculate neighbours  
of users, based on books they have in their bookshelves. with about  
1 users and each of them having an average of 100-500 books out of  
ca. 1 million, the calculation of the neighbours takes seconds when  
you have to calculate this on the client, by getting all indexes etc.  
we switched to sql and wrote a single sql statement that does exactly  
the same comparison which now takes about 300ms.
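
The actual statement is not shown in the thread. As a rough illustration of the kind of single-statement, server-side comparison described, here is a self-join that ranks other users by the number of shared books. The `bookshelf` table, its columns, and both function names are hypothetical, and SQLite stands in for the database only to keep the sketch self-contained.

```python
import sqlite3


def make_bookshelf(rows):
    """Create an in-memory table of (user_id, book_id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookshelf ("
        " user_id INTEGER, book_id INTEGER,"
        " PRIMARY KEY (user_id, book_id))"
    )
    conn.executemany("INSERT INTO bookshelf VALUES (?, ?)", rows)
    return conn


def neighbours(conn, user_id, limit=10):
    """Rank other users by how many books they share with user_id."""
    return conn.execute(
        """
        SELECT other.user_id, COUNT(*) AS shared
        FROM bookshelf AS mine
        JOIN bookshelf AS other
          ON other.book_id = mine.book_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

The whole comparison runs inside the database, so only the ranked result rows cross into Python rather than every user's bookshelf.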


your comparisons would only be accurate if comparing relstorage with  
filestorage over zeo, because in this case there is no server side  
query possible on object attributes . it would be interesting to look  
at performance when having 4-10 zodb clients and then compare zeo/ 
filestorage against relstorage with postgres.









--
Lovely Systems, senior developer

phone: +43 5572 908060, fax: +43 5572 908060-77
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
skype: bernd.dorn







Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-24 Thread Roché Compaan
I made a lovely mistake in my first round of benchmarks. Lovely, in
that it puts the ZODB in a much better light. When I first ran the
Postgres test, I neglected to put an index on the key field of the
table. I only added the index before I timed lookups on Postgres but
forgot to retest insertion. Since the key in the ZODB is effectively
indexed I think the test is only fair if the corresponding key in
Postgres is indexed.

Retesting insertion of a million records into Postgres with the index on
the key field revealed that Postgres performance deteriorates
logarithmically at roughly the same rate as the ZODB. After about 10
million insertions the ZODB was doing 250 inserts per second. After
adding the index on the table, Postgres was doing only slightly better
but not above 300 inserts per second.

I'll update my blog post with the final stats and let you know when it
is ready.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-08 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-7 21:44 +0200:
 ...
There are use cases where having a container in the ZODB that can handle
large volumes and maintain a high insertion rate would be very
convenient. An example of such a use case would be a site with millions
of members where each member has their own folder containing different
content types. The rate at which new members register is very high as
well

I do not believe that the insertion rate the ZODB can handle now
would be insufficient for this use case.

I do not have your timings present, but from our installation
I know that the ZODB can handle 10 transactions per second.
This would mean about 36.000 per day (10 hour days)
and about 1 million in a month.

so the folder needs to handle insertions quickly. In this use case
you are not dealing with structured data. If members in a site with such
large volumes start to generate content, indexes in the ZODB become
problematic too because of the slow rate of insertion.

We have several write intensive applications with storages
in the order of 10 to 20 GB and 10 to 20 millions objects
-- and have not yet seen problems with the insertion rate.

We do see other problems (notably commit contention)
but these problems cannot be solved by an increased insertion
rate.

And at this point
you start to stuff everything in relational database and the whole
experience becomes painful ...

We speak again when you observe a concrete problem in a real
installation caused by limited insertion rate.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-07 Thread Roché Compaan
On Thu, 2008-02-07 at 20:26 +0100, Dieter Maurer wrote:
 Roché Compaan wrote at 2008-2-7 21:21 +0200:
  ...
 So if I asked you to build a data structure for the ZODB that can do
 insertions at a rate comparable to Postgres on high volumes, do you
 think that it can be done?
 
 If you need a high write rate, the ZODB is probably not optimal.
 Ask yourself whether it is not better to put such high frequency write
 data directly into a relational database.
 
 Whenever you have large amounts of highly structured data,
 a relational database is necessarily more efficient than the ZODB.

I know it is not optimal for high write scenarios, but I'm asking if it
is possible to build a data structure for the ZODB that can do
insertions quickly.

There are use cases where having a container in the ZODB that can handle
large volumes and maintain a high insertion rate would be very
convenient. An example of such a use case would be a site with millions
of members where each member has their own folder containing different
content types. The rate at which new members register is very high as
well so the folder needs to handle insertions quickly. In this use case
you are not dealing with structured data. If members in a site with such
large volumes start to generate content, indexes in the ZODB become
problematic too because of the slow rate of insertion. And at this point
you start to stuff everything in relational database and the whole
experience becomes painful ...

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-07 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-7 21:21 +0200:
 ...
So if I asked you to build a data structure for the ZODB that can do
insertions at a rate comparable to Postgres on high volumes, do you
think that it can be done?

If you need a high write rate, the ZODB is probably not optimal.
Ask yourself whether it is not better to put such high frequency write
data directly into a relational database.

Whenever you have large amounts of highly structured data,
a relational database is necessarily more efficient than the ZODB.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-07 Thread Roché Compaan
On Thu, 2008-02-07 at 00:39 +0100, Dieter Maurer wrote:
 If I understand correctly, for each insertion 3 calls are made to
 persistent_id? This is still very far from the 66 I mentioned above?
 
 You did not understand correctly.
 
 You insert an entry. The insertion modifies (at least) one OOBucket.
 The OOBucket needs to be written back. For each of its entries
 (one is your new one, but there may be up to 29 others) 3
 persistent_id calls will happen.

Thanks I understand now.

So if I asked you to build a data structure for the ZODB that can do
insertions at a rate comparable to Postgres on high volumes, do you
think that it can be done? If so, would it not be worth investing time
and money into this? If not, why not?

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-06 Thread Roché Compaan
On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
 Roché Compaan wrote at 2008-2-4 20:54 +0200:
  ...
 I don't follow? There are 2 insertions and there are 1338046 calls
 to persistent_id. Doesn't this suggest that there are 66 objects
 persisted per insertion? This seems way too high?
 
 Jim told you that persistent_id is called for each object and not
 only persistent objects.
 
 An OOBucket contains up to 30 key value pairs, each of which
 are subjected to a call to persistent_id. In each of your pairs,
 there is an additional persistent object. This means, you
 should expect 3 calls to persistent_id for each pair in an OOBucket.

If I understand correctly, for each insertion 3 calls are made to
persistent_id? This is still very far from the 66 I mentioned above?

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: RE : [ZODB-Dev] Re: ZODB Benchmarks

2008-02-06 Thread Dieter Maurer
Mignon, Laurent wrote at 2008-2-6 08:06 +0100:
After many tests and benchmarks, my feeling is that the ZODB does not seem
suitable for systems managing large amounts of data stored in a flat hierarchy.
The application that we are currently developing is a business process management
system, as opposed to a content management system. In order to guarantee the
necessary performance, we decided to no longer use the ZODB. All data are now
stored in a relational database.

Roché's corrected timings indicate:

  The ZODB is significantly slower than Postgres for insertions
  but comparatively fast (slightly faster) on lookups.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-06 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-6 20:18 +0200:
On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
 Roché Compaan wrote at 2008-2-4 20:54 +0200:
  ...
 I don't follow? There are 2 insertions and there are 1338046 calls
 to persistent_id. Doesn't this suggest that there are 66 objects
 persisted per insertion? This seems way too high?
 
 Jim told you that persistent_id is called for each object and not
 only persistent objects.
 
 An OOBucket contains up to 30 key value pairs, each of which
 are subjected to a call to persistent_id. In each of your pairs,
 there is an additional persistent object. This means, you
 should expect 3 calls to persistent_id for each pair in an OOBucket.

If I understand correctly, for each insertion 3 calls are made to
persistent_id? This is still very far from the 66 I mentioned above?

You did not understand correctly.

You insert an entry. The insertion modifies (at least) one OOBucket.
The OOBucket needs to be written back. For each of its entries
(one is your new one, but there may be up to 29 others) 3
persistent_id calls will happen.
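
As a rough illustration of this arithmetic, here is a minimal sketch. The function and constant names are hypothetical, and the figures (3 calls per pair, up to 30 pairs per bucket, a mean of about 66 calls per insertion) are simply the ones quoted in this thread, not measurements.

```python
CALLS_PER_ENTRY = 3       # persistent_id calls per key/value pair, as stated above
MAX_BUCKET_ENTRIES = 30   # upper bound on OOBucket size quoted in the thread


def persistent_id_calls(entries_in_bucket):
    """Estimate persistent_id calls when a bucket with this many entries is rewritten."""
    return CALLS_PER_ENTRY * entries_in_bucket


# Rewriting a full bucket costs up to 90 calls, even though only one entry is new.
worst_case = persistent_id_calls(MAX_BUCKET_ENTRIES)

# A mean of about 66 calls per insertion corresponds to buckets
# that hold about 22 entries on average.
mean_entries = 66 / CALLS_PER_ENTRY
```

Under these assumptions the observed figure of about 66 calls per insertion is what one would expect from rewriting one partly filled bucket, not from persisting 66 objects.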



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-05 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-4 20:54 +0200:
 ...
I don't follow? There are 2 insertions and there are 1338046 calls
to persistent_id. Doesn't this suggest that there are 66 objects
persisted per insertion? This seems way too high?

Jim told you that persistent_id is called for each object and not
only persistent objects.

An OOBucket contains up to 30 key value pairs, each of which
are subjected to a call to persistent_id. In each of your pairs,
there is an additional persistent object. This means, you
should expect 3 calls to persistent_id for each pair in an OOBucket.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-04 Thread David Binger


On Feb 4, 2008, at 1:54 PM, Roché Compaan wrote:


I don't follow? There are 2 insertions and there are 1338046 calls
to persistent_id. Doesn't this suggest that there are 66 objects
persisted per insertion? This seems way too high?


It seems like there is some confusion about the correspondence between
persisting an object and calls to persistent_id().  The pickler makes
lots of calls to persistent_id() as it is making pickles.
In my mind, persisting an object means saving the new state of
an instance of Persistent.  When you insert a new persistent instance
in a BTree, you are persisting the new instance, the bucket/node
that holds the reference to the new instance, and in some cases,
some small number of other bucket/nodes that are changed as part of
the insertion.  That's it.  If you insert a bunch of things in one
commit(), the number of persistent instances committed is even
smaller because some buckets will get multiple changes in one write.

There are usually many calls to persistent_id() when one btree bucket is
pickled, but I would count that as 1 persistent object.



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-04 Thread Roché Compaan
On Sun, 2008-02-03 at 22:05 +0100, Dieter Maurer wrote:
 The number of persistent_id calls suggests that a written
 persistent object has a mean value of 65 subobjects -- which
 fits well with OOBuckets.
 
 However, when the profile is for commits with 100 insertions each,
 then the number of written persistent objects is far too small.
 In fact, we would expect about 200 persistent object writes per transaction:
 the 100 new persistent objects assigned plus about as many buckets
 changed by these insertions.

I don't follow? There are 2 insertions and there are 1338046 calls
to persistent_id. Doesn't this suggest that there are 66 objects
persisted per insertion? This seems way too high?

  
 The keys that I lookup are completely random so it is probably the case
 that the lookup causes disk lookups all the time. If this is the case,
 is 230ms not still too slow?
 
 Unreasonably slow in fact.
 
 A tree with size 10**7 does likely not have a depth larger than 4
 (internal nodes should typically have at least 125 entries, leaves should have
 at least 15 -- a tree of depth 4 thus can have about 125**3*15 = 29.x * 
 10**6).
 Therefore, one would expect at most 4 disk accesses.
 
 On my (6 year old) computer, a disk access can take up to 30 ms.

The lookup times I reported were wrong. There was a bug in the code that
reported the lookup time - the correct average lookup time for a BTree
with 10 million objects was an impressive 12 ms. For Postgres this was
14 ms.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-03 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-3 09:15 +0200:
 ...
I have tried different commit intervals. The published results are for a
commit interval of 100, iow 100 inserts per commit.

 Your profile looks very surprising:
 
   I would expect that for a single insertion, typically
   one persistent object (the bucket where the insertion takes place)
   is changed. About every 15 inserts, 3 objects are changed (the bucket
   is split) about every 15*125 inserts, 5 objects are changed
   (split of bucket and its container).
   But the mean value of objects changed in a transaction is 20
   in your profile.
   The changed objects typically have about 65 subobjects. This
   fits with OOBuckets.

It was very surprising to me too since the insertion is so basic. I
simply assign a Persistent object with 1 string attribute that is 1K in
size to a key in a OOBTree. I mentioned this earlier on the list and I
thought that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects including simple types
like strings, ints, etc. I don't know if it explains all the calls that
add up to a mean value of 20 though. I guess the calls are being made by
the cPickle module, but I don't have the experience to investigate this.

The number of persistent_id calls suggests that a written
persistent object has a mean value of 65 subobjects -- which
fits well with OOBuckets.

However, when the profile is for commits with 100 insertions each,
then the number of written persistent objects is far too small.
In fact, we would expect about 200 persistent object writes per transaction:
the 100 new persistent objects assigned plus about as many buckets
changed by these insertions.

 
The keys that I lookup are completely random so it is probably the case
that the lookup causes disk lookups all the time. If this is the case,
is 230ms not still too slow?

Unreasonably slow in fact.

A tree with size 10**7 does likely not have a depth larger than 4
(internal nodes should typically have at least 125 entries, leaves should have
at least 15 -- a tree of depth 4 thus can have about 125**3*15 = 29.x * 10**6).
Therefore, one would expect at most 4 disk accesses.
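
The depth estimate above can be checked with a few lines of Python. This is only a sketch under the stated assumptions: the minimum fill figures (125 children per internal node, 15 entries per leaf) are the ones quoted here, and the function names are made up for illustration.

```python
MIN_INTERNAL_FANOUT = 125  # minimum children per internal node (figure from the thread)
MIN_LEAF_ENTRIES = 15      # minimum entries per leaf bucket (figure from the thread)


def min_capacity(depth):
    """Entries held by a tree of this depth when every node is at its minimum fill."""
    return MIN_INTERNAL_FANOUT ** (depth - 1) * MIN_LEAF_ENTRIES


def depth_needed(n):
    """Smallest depth whose minimum-fill capacity reaches n entries."""
    depth = 1
    while min_capacity(depth) < n:
        depth += 1
    return depth


# Depth 4 means three levels of internal nodes above the leaves: 125**3 * 15.
capacity_at_depth_4 = min_capacity(4)
depth_for_ten_million = depth_needed(10 ** 7)
```

With these numbers a depth-4 tree already holds about 29.3 million entries at minimum fill, so 10**7 entries fit within depth 4 and a lookup should touch at most 4 nodes.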

On my (6 year old) computer, a disk access can take up to 30 ms.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-02 Thread Dieter Maurer
Roché Compaan wrote at 2008-2-1 21:17 +0200:
I have completed my first round of benchmarks on the ZODB and welcome
any criticism and advice. I summarised our earlier discussion and
additional findings in this blog entry:
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

In your insertion test: when do you do commits?
One per insertion? Or one per n insertions (for which n)?


Your profile looks very surprising:

  I would expect that for a single insertion, typically
  one persistent object (the bucket where the insertion takes place)
  is changed. About every 15 inserts, 3 objects are changed (the bucket
  is split) about every 15*125 inserts, 5 objects are changed
  (split of bucket and its container).
  But the mean value of objects changed in a transaction is 20
  in your profile.
  The changed objects typically have about 65 subobjects. This
  fits with OOBuckets.
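
To make the expectation concrete, the following sketch averages the figures given above over many single-insert transactions. It is an assumption-laden estimate, not a measurement: it counts only changed tree objects (not the newly created value object), treats splits as perfectly periodic, and uses hypothetical names.

```python
LEAF_SPLIT_EVERY = 15        # inserts between bucket splits (figure from the thread)
NODE_SPLIT_EVERY = 15 * 125  # inserts between splits of a bucket's container


def mean_objects_changed():
    """Average number of tree objects written per single-insert transaction."""
    base = 1.0                                     # the bucket receiving the insert
    leaf_split_extra = (3 - 1) / LEAF_SPLIT_EVERY  # 3 objects change instead of 1
    node_split_extra = (5 - 3) / NODE_SPLIT_EVERY  # 5 objects change instead of 3
    return base + leaf_split_extra + node_split_extra


expected_mean = mean_objects_changed()
```

Under these assumptions the mean is only slightly above 1 object per insertion, which is why a mean of 20 in the profile looks surprising unless each transaction contains many insertions.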


Lookup times:

0.23 s would be 230 ms not 23 ms.

The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
BTree implementation itself. Lookup time is proportional to
the tree depth, which ideally would be O(log(n)). While BTrees
are not necessarily balanced (and therefore the depth may be larger
than logarithmic) it is not easy to obtain a severely unbalanced
tree by insertions only.
Other factors must have contributed to this drop: swapping, cache too small,
garbage collections...

Furthermore, the lookup times for your smaller BTrees are far too
good -- fetching any object from disk takes in the order of several
ms (2 to 20, depending on your disk).
This means that the lookups for your smaller BTrees have
typically been served directly from the cache (no disk lookups).
With your large BTree disk lookups probably became necessary.



-- 
Dieter


Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-02 Thread Roché Compaan
On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
 Roché Compaan wrote at 2008-2-1 21:17 +0200:
 I have completed my first round of benchmarks on the ZODB and welcome
 any criticism and advice. I summarised our earlier discussion and
 additional findings in this blog entry:
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
 
 In your insertion test: when do you do commits?
 One per insertion? Or one per n insertions (for which n)?

I have tried different commit intervals. The published results are for a
commit interval of 100, iow 100 inserts per commit.

 Your profile looks very surprising:
 
   I would expect that for a single insertion, typically
   one persistent object (the bucket where the insertion takes place)
   is changed. About every 15 inserts, 3 objects are changed (the bucket
   is split) about every 15*125 inserts, 5 objects are changed
   (split of bucket and its container).
   But the mean value of objects changed in a transaction is 20
   in your profile.
   The changed objects typically have about 65 subobjects. This
   fits with OOBuckets.

It was very surprising to me too since the insertion is so basic. I
simply assign a Persistent object with 1 string attribute that is 1K in
size to a key in a OOBTree. I mentioned this earlier on the list and I
thought that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects including simple types
like strings, ints, etc. I don't know if it explains all the calls that
add up to a mean value of 20 though. I guess the calls are being made by
the cPickle module, but I don't have the experience to investigate this.

 Lookup times:
 
 0.23 s would be 230 ms not 23 ms.

Oops my multiplier broke ;-)

 
 The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
 BTree implementation itself. Lookup time is proportional to
 the tree depth, which ideally would be O(log(n)). While BTrees
 are not necessarily balanced (and therefore the depth may be larger
 than logarithmic) it is not easy to obtain a severely unbalanced
 tree by insertions only.
 Other factors must have contributed to this drop: swapping, cache too small,
 garbage collections...

The cache size was set to 10 objects so I doubt that this was the
cause. I do the lookup test right after I populate the BTree so it might
be that the cache and memory is full but I take care to commit after the
BTree is populated so even this is unlikely.

The keys that I lookup are completely random so it is probably the case
that the lookup causes disk lookups all the time. If this is the case,
is 230ms not still too slow?

 Furthermore, the lookup times for your smaller BTrees are far too
 good -- fetching any object from disk takes in the order of several
 ms (2 to 20, depending on your disk).
 This means that the lookups for your smaller BTrees have
 typically been served directly from the cache (no disk lookups).
 With your large BTree disk lookups probably became necessary.

I accept that these lookups were all served from cache. I am going to
modify the lookup test so that I close the database after population and
re-open it when starting the test to make sure nothing is cached and see
what the results look like.

Thanks for your insightful comments!

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2008-02-01 Thread Roché Compaan
I have completed my first round of benchmarks on the ZODB and welcome
any criticism and advice. I summarised our earlier discussion and
additional findings in this blog entry:
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-12-06 Thread Jim Fulton


On Nov 6, 2007, at 3:17 PM, Roché Compaan wrote:


On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote:

On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote:


Despite this change there are still a huge amount
of unexplained calls to the 'persistent_id' method of the ObjectWriter
in serialize.py.


Why 'unexplained'? 'persistent_id' is called from the Pickler instance
being used in ObjectWriter._dump(). It is called for each and every
single object reachable from the main object, due to the way Pickler
works (I believe). Maybe persistent_id can be analysed and optimized
for the most common cases?


Yup.

Note that there is an undocumented feature in cPickle that I added
years ago to deal with this issue but never got around to pursuing.
Maybe someone else would be able to spend the time to try it out and
report back.

If you set inst_persistent_id, rather than persistent_id, on a
pickler, then the hook will only be called for instances.  This
should eliminate the vast majority of the calls.


So is this as simple as modifying the following code in the
ObjectWriter:

self._p.persistent_id = self.persistent_id

to:

self._p.inst_persistent_id = self.persistent_id


Yes.


I'll give it a go as part of my benchmarks that I'm running and report
back.



I hope you weren't waiting for my answer.

Jim

--
Jim Fulton
Zope Corporation




Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-12-06 Thread Roché Compaan
On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote:
 On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote:
 
  Jim Fulton wrote:
  On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote:
  Despite this change there are still a huge amount
  of unexplained calls to the 'persistent_id' method of the  
  ObjectWriter
  in serialize.py.
 
  Why 'unexplained'? 'persistent_id' is called from the Pickler  
  instance
  being used in ObjectWriter._dump(). It is called for each and every
  single object reachable from the main object, due to the way Pickler
  works (I believe). Maybe persistent_id can be analysed and optimized
  for the most common cases?
  Yup.
  Note that there is a undocumented feature in cPickle that I added  
  years ago to deal with this issue but never got around to  
  pursuing.  Maybe someone else would be able to spend the time to  
  try it out and report back.
  If you set inst_persistent_id, rather than persistent_id, on a  
  pickler, then the hook will only be called for instances.  This  
  should eliminate that vast majority of the calls.
  Note that this feature was added back when testing was minimal or  
  non-existent, so it is untested, however, the implementation is  
  simple enough.  :)
 
  Do you mean that the ZODB has enough tests now that making the  
  change and running the tests might already be a good proof ?
 
 No, I mean that pickle and cPickle lack tests for this feature.
 
  Or should we be more prudent ?
 
 It would be nice to try this out with ZODB to see if it makes much  
 difference.  If it does, then that would provide extra motivation for  
 me to add the missing test.
 
 Roché Compaan said he would try it out, but I just realized that he  
 might have been waiting for me.

Sorry for not responding earlier. I actually tried this out immediately
after you suggested it and was very impressed with the improvement in
performance. I have been meaning to write back to give a thorough report
but a project with insane deadlines caught up with me. This project ends
this week so I plan to continue with my benchmark test next week and
give feedback thereafter.

The number of calls to persistent_id dropped dramatically and in a test
of 10 million inserts the insert rate almost doubled from 1000
to 2000 inserts per second for at least the first million inserts. The
insert rate decreases rapidly thereafter until it drops to the insert
rate recorded before the persistent_id change. I guess at this point the
overhead of bucket splits is too high.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-06 Thread Sidnei da Silva
 Despite this change there are still a huge amount
 of unexplained calls to the 'persistent_id' method of the ObjectWriter
 in serialize.py.

Why 'unexplained'? 'persistent_id' is called from the Pickler instance
being used in ObjectWriter._dump(). It is called for each and every
single object reachable from the main object, due to the way Pickler
works (I believe). Maybe persistent_id can be analysed and optimized
for the most common cases?
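
One way to picture such an optimization is a cheap early exit for built-in atoms before any expensive work is done. The sketch below uses the Python 3 `pickle.Pickler.persistent_id` hook, which is not the cPickle and ObjectWriter code discussed in this thread; the `Record` class, its `oid` attribute and the helper names are invented for illustration only.

```python
import io
import pickle

# Types that can never be persistent references; checked with a cheap type test.
_ATOMS = (str, bytes, int, float, bool, type(None))


class Record:
    """Hypothetical stand-in for a persistent object; 'oid' is an invented attribute."""

    def __init__(self, oid, payload):
        self.oid = oid
        self.payload = payload


class FastPathPickler(pickle.Pickler):
    """Pickler whose hook returns early for atoms and counts the remaining calls."""

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self.slow_path_hits = 0

    def persistent_id(self, obj):
        if type(obj) in _ATOMS:
            return None               # fast path: pickle atoms normally
        self.slow_path_hits += 1      # only containers and instances get here
        if isinstance(obj, Record):
            return obj.oid            # store a reference instead of the object
        return None


def slow_path_hits(obj):
    """Pickle obj into a throwaway buffer and report how often the slow path ran."""
    pickler = FastPathPickler(io.BytesIO())
    pickler.dump(obj)
    return pickler.slow_path_hits


bucket_like = {"key%d" % i: Record(i, "x" * 10) for i in range(30)}
hits = slow_path_hits(bucket_like)
```

In this sketch the hook is still invoked for every object, but only the dict and its 30 `Record` values reach the slower branch; the string keys leave through the type check.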

-- 
Sidnei da Silva
Enfold Systems   http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214


Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-06 Thread Roché Compaan
On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote:
  Despite this change there are still a huge amount
  of unexplained calls to the 'persistent_id' method of the ObjectWriter
  in serialize.py.
 
 Why 'unexplained'? 'persistent_id' is called from the Pickler instance
 being used in ObjectWriter._dump(). It is called for each and every
 single object reachable from the main object, due to the way Pickler
 works (I believe). Maybe persistent_id can be analysed and optimized
 for the most common cases?
 

If you look at the profiler stats I posted earlier you would have
noticed that there were about 1.3 million calls to persistent_id while
only 2 objects were persisted. So if it is being called for each
object I would expect a figure closer to 2, not 1.3 million. What am
I missing?

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-06 Thread Jim Fulton


On Nov 6, 2007, at 3:01 PM, Roché Compaan wrote:


On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote:

Despite this change there are still a huge amount
of unexplained calls to the 'persistent_id' method of the ObjectWriter
in serialize.py.


Why 'unexplained'? 'persistent_id' is called from the Pickler instance
being used in ObjectWriter._dump(). It is called for each and every
single object reachable from the main object, due to the way Pickler
works (I believe). Maybe persistent_id can be analysed and optimized
for the most common cases?



If you look at the profiler stats I posted earlier you would have
noticed that there were about 1.3 million calls to persistent_id while
only 2 objects were persisted. So if it is being called for each
object I would expect a figure closer to 2, not 1.3 million.
What am I missing?


It's called for *all* objects, not just persistent objects. This
includes ints, strings (including attribute names), etc.
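
This behaviour is easy to observe with a counting hook. The sketch below uses the Python 3 `pickle` module, not the cPickle of the time, and the class and function names are hypothetical; it only illustrates that the hook fires for containers, keys and values alike.

```python
import io
import pickle


class CountingPickler(pickle.Pickler):
    """Pickler that counts how often the persistent_id hook is invoked."""

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self.calls = 0

    def persistent_id(self, obj):
        self.calls += 1
        return None  # None means "pickle this object normally"


def count_persistent_id_calls(obj):
    """Pickle obj into a throwaway buffer and return the number of hook calls."""
    pickler = CountingPickler(io.BytesIO())
    pickler.dump(obj)
    return pickler.calls


# A plain dict with 30 string keys and int values, loosely resembling a bucket.
bucket_like = {"key%d" % i: i for i in range(30)}
calls_for_bucket = count_persistent_id_calls(bucket_like)
```

Here a single dictionary with 30 plain entries triggers one call for the dict plus one for every key and every value, so the call count grows with the size of the container and not with the number of persistent objects.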


Jim

--
Jim Fulton
Zope Corporation




Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-06 Thread Roché Compaan
On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote:
 On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote:
 
  Despite this change there are still a huge amount
  of unexplained calls to the 'persistent_id' method of the  
  ObjectWriter
  in serialize.py.
 
  Why 'unexplained'? 'persistent_id' is called from the Pickler instance
  being used in ObjectWriter._dump(). It is called for each and every
  single object reachable from the main object, due to the way Pickler
  works (I believe). Maybe persistent_id can be analysed and optimized
  for the most common cases?
 
 Yup.
 
 Note that there is a undocumented feature in cPickle that I added  
 years ago to deal with this issue but never got around to pursuing.   
 Maybe someone else would be able to spend the time to try it out and  
 report back.
 
 If you set inst_persistent_id, rather than persistent_id, on a  
 pickler, then the hook will only be called for instances.  This  
 should eliminate that vast majority of the calls.

So is this as simple as modifying the following code in the
ObjectWriter:

self._p.persistent_id = self.persistent_id

to:

self._p.inst_persistent_id = self.persistent_id


I'll give it a go as part of my benchmarks that I'm running and report
back.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread Christian Theune
Hi,

On Friday, 02.11.2007 at 09:56 -0400, David Binger wrote:
 On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote:
 
  On 11/2/07, Matt Hamilton [EMAIL PROTECTED] wrote:
  That may just end up causing delays periodically in  
  transactions... ie delays
  that the user sees, as opposed to doing it via another thread or  
  something.  But
  then as only one thread would be doing this at a time it might not  
  be too bad.
 
  But wouldn't then all other threads get a conflict?
 
 If they are trying to do insertions at the same time as the  
 consolidation, yes.
 This data structure won't stop insertion conflicts, the intent is to  
 make them
 less frequent.

Hmm.

Wouldn't a queue be a good data structure to do that? IIRC ZC already
wrote a queue that doesn't conflict:

http://svn.zope.de/zope.org/zc.queue/trunk/src/zc/queue/queue.txt

If you store key/value pairs in the queue, you can do a step-by-step
migration from the queue to the btree.

Probably this should be encapsulated into a new data structure that
looks btree-like and has an additional `consolidate` method.
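
A minimal sketch of that idea in plain Python follows. It is not ZODB code: a dict stands in for the BTree, a list stands in for a conflict-free queue such as zc.queue, and the class name, `pending` and the `max_items` parameter are assumptions made for illustration.

```python
class QueuedMapping:
    """Mapping that appends writes to a queue and merges them into the index later."""

    def __init__(self):
        self._index = {}  # stand-in for the BTree
        self._queue = []  # stand-in for a conflict-free queue of (key, value) pairs

    def __setitem__(self, key, value):
        # Writes only append to the queue; the index is untouched.
        self._queue.append((key, value))

    def __getitem__(self, key):
        # The newest queued write wins; a linear scan is fine while the queue is short.
        for queued_key, value in reversed(self._queue):
            if queued_key == key:
                return value
        return self._index[key]

    def pending(self):
        """Number of queued writes not yet merged into the index."""
        return len(self._queue)

    def consolidate(self, max_items=None):
        """Move up to max_items queued pairs (all if None) into the index, oldest first."""
        if max_items is None:
            count = len(self._queue)
        else:
            count = min(max_items, len(self._queue))
        for key, value in self._queue[:count]:
            self._index[key] = value
        del self._queue[:count]
        return count


mapping = QueuedMapping()
mapping["a"] = 1
mapping["b"] = 2
mapping["a"] = 3
moved = mapping.consolidate(max_items=2)  # step-by-step migration: two pairs at a time
```

The `max_items` argument reflects the step-by-step migration mentioned above: the caller can drain the queue in small batches so that no single transaction has to rewrite many buckets.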

Calling the `consolidate` method would have to happen from the
application that uses this data structure. Two issues I can think of
immediately:

- General: We need an efficient way to find all data structures that
need reconciliation, maybe a ZODB-wide index of all objects that require
reconciliation would be nice. 

- With Zope and ZEO: which Zope server is responsible for actually
performing the reconciliation? One of the Zope servers that is marked in
the zope.conf? Or maybe the ZEO server?

Christian

-- 
gocept gmbh  co. kg - forsterstrasse 29 - 06112 halle (saale) - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development




Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread David Binger


On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote:


On 11/2/07, Matt Hamilton [EMAIL PROTECTED] wrote:
That may just end up causing delays periodically in  
transactions... ie delays
that the user sees, as opposed to doing it via another thread or  
something.  But
then as only one thread would be doing this at a time it might not  
be too bad.


But wouldn't then all other threads get a conflict?


If they are trying to do insertions at the same time as the
consolidation, yes.
This data structure won't stop insertion conflicts, the intent is to
make them less frequent.













Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread Roché Compaan
On Fri, 2007-11-02 at 16:00 +, Laurence Rowe wrote:
 Matt Hamilton wrote:
  David Binger dbinger at mems-exchange.org writes:
  
 
  On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote:
 
  Lots of people don't do nightly packs, I'm pretty sure such a process
  needs to be completely automatic. The question is whether doing it in
  a separate process in the background, or every X transactions, or every
  X seconds, or something.
  Okay, perhaps the trigger should be the depth of the small-bucket tree.
  
  That may just end up causing delays periodically in transactions... ie 
  delays
  that the user sees, as opposed to doing it via another thread or something. 
   But
  then as only one thread would be doing this at a time it might not be too 
  bad.
  
  -Matt
 
 ClockServer sections can now be specified in zope.conf. If you specify 
 them with a period of say 10 mins (or even 2) then the queue should 
 never get too large, and the linear search time is not a problem as n is 
 small.
 
 Essentially you end up with a solution very similar to QueueCatalog but 
 with the queue being searchable.
 
 The pain is then in modifying all of the indexes to search the queue in 
 addition to their standard data structures.

I don't think that you need to modify the indexes at all. You simply
pass the query arguments to the queue in the exact same way that you
apply the arguments to individual indexes.

I think with a little enhancement to QueueCatalog one should be able to
pull this off.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread David Binger


On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote:


Lots of people don't do nightly packs, I'm pretty sure such a process
needs to be completely automatic. The question is whether doing it in
a separate process in the background, or every X transactions, or every
X seconds, or something.


Okay, perhaps the trigger should be the depth of the small-bucket tree.





Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread David Binger


On Nov 2, 2007, at 5:48 AM, Lennart Regebro wrote:


On 11/1/07, Matt Hamilton [EMAIL PROTECTED] wrote:
An interesting idea.  Surely we need the opposite though, and that  
is an
additional BTree with a very large bucket size, as we want to  
minimize the
chance of a bucket split when inserting?  Then we occasionally  
consolidate and
move the items in the original BTree with the regular bucket size/ 
branch factor.


Would it be possible to not occasionally consolidate, but actually
do it ongoing, but just one process, thereby always inserting just one
transaction into the normal BTree at a time? Or does that cause
troubles?


I think that option would work.  I think it would suffice to do a
Big.update(Small); Small.clear() operation before a nightly pack.
It might invalidate every bucket in every cache, but BTrees are
designed to perform reasonably without a cache.
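
A rough sketch of the two-tree arrangement with a `Big.update(Small); Small.clear()` consolidation step. It assumes plain dicts as stand-ins for the two BTrees, and the class and method names (`TwoTreeIndex`, `insert`, `get`) are made up for illustration. Deletions are not handled.

```python
class TwoTreeIndex:
    """All writes go to `small`; every lookup consults both trees."""

    def __init__(self):
        self.big = {}    # stand-in for the regular BTree (assumption)
        self.small = {}  # stand-in for the write-absorbing BTree (assumption)

    def insert(self, key, value):
        # Writers never modify the big tree directly.
        self.small[key] = value

    def get(self, key, default=None):
        # The small tree holds the most recent writes, so check it first.
        if key in self.small:
            return self.small[key]
        return self.big.get(key, default)

    def __len__(self):
        # A key may exist in both trees until the next consolidation.
        return len(self.big.keys() | self.small.keys())

    def consolidate(self):
        # The operation from the message: Big.update(Small); Small.clear()
        self.big.update(self.small)
        self.small.clear()
```

With real BTrees the same two calls would run inside one transaction, for example just before a pack.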









Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread Lennart Regebro
On 11/1/07, Matt Hamilton [EMAIL PROTECTED] wrote:
 An interesting idea.  Surely we need the opposite though, and that is an
 additional BTree with a very large bucket size, as we want to minimize the
 chance of a bucket split when inserting?  Then we occasionally consolidate and
 move the items in the original BTree with the regular bucket size/branch 
 factor.

Would it be possible to not occasionally consolidate, but actually
do it ongoing, but just one process, thereby always inserting just one
transaction into the normal BTree at a time? Or does that cause
troubles?

//In way over my head.

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64


Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread Russ Ferriday
This is the 'batch' or 'distribute' pattern that crops up in many  
fields.


The best path is normally to understand what the conflicts are, and
where the time is spent.
If, in this case, much time is spent in the preamble, and the actual
inserts are quick, then diving down one time through the security
layers and stuffing in 10 items is clearly better than 10 preambles,
one for each insert.


The other truism is that all optimisation is for a single case.   
There may be different answers for different cases. Ideally a single  
parameter would be enough to tune the system for different cases.


Good luck Roche, with the outcome, I'm excited to see some figures.

--r.


On 2 Nov 2007, at 15:24, David Binger wrote:



On Nov 2, 2007, at 10:58 AM, Lennart Regebro wrote:


It seems to me having one thread doing a background consolidation one
transaction at a time seems a better way to go,


Maybe, but maybe that just causes big buckets to get invalidated
in all of the clients over and over again, when we could accomplish
the same objective in one invalidation by waiting longer and executing
a bigger consolidation.


although certainly the
best thing would be to test all kinds of solutions and see.


No doubt about that.




Russ Ferriday
[EMAIL PROTECTED]
office: +44 118 3217026
mobile: +44 7789 338868
skype: ferriday





Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-01 Thread Roché Compaan
On Wed, 2007-10-31 at 10:47 -0400, David Binger wrote:
 On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote:
 
  Thanks for the explanation.
 
 The actual insertion is very fast.  Your benchmark is dominated by
 the time to serialize the changes due to an insertion.
 
 You should usually have just 2 instances to serialize per insertion of
 a new instance:  the instance itself and the b-node that points to the
 instance.  An insertion may also cause changes in 2 or several b-nodes,
 but those cases are less likely.
 
 Serializing your simple instances is probably fast, but serializing
 the b-nodes appears to be taking much more time, and probably accounts
 for the large number of calls to persistent_id.  B-Nodes with higher
 branching factors will have more parts to serialize and they will be
 slower.  If you can cut the b-node branching factor in half, I bet your
 benchmark will run almost twice as fast.


I guess you are referring to the B-Tree bucket size? That is not really
configurable: you have to modify the C code and recompile. For OOBTree
the max bucket size is 30. I'll see what effect it has on the test
nevertheless.
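
The link between node size and the number of persistent_id calls can be illustrated with the standard library's pickle module, which invokes `persistent_id` for each object it serializes. This is only a sketch: the list of tuples is a made-up stand-in for a bucket, not the real BTrees state, and `CountingPickler`, `fake_bucket` and `count_persistent_id_calls` are hypothetical names.

```python
import io
import pickle


class CountingPickler(pickle.Pickler):
    """Pickler that counts how often persistent_id is consulted."""

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self.calls = 0

    def persistent_id(self, obj):
        # Returning None tells pickle to serialize the object normally.
        self.calls += 1
        return None


def fake_bucket(size):
    # Hypothetical stand-in for a bucket's state: `size` key/value pairs.
    return [("key-%d" % i, i) for i in range(size)]


def count_persistent_id_calls(node):
    # Serialize the node into a throwaway buffer and report the call count.
    pickler = CountingPickler(io.BytesIO())
    pickler.dump(node)
    return pickler.calls


if __name__ == "__main__":
    for size in (15, 30, 60):
        print(size, count_persistent_id_calls(fake_bucket(size)))
```

The call count grows with the number of entries in the node, which matches the expectation that a smaller bucket costs less to serialize per insertion.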


-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-01 Thread Gary Poster


On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote:


David Binger dbinger at mems-exchange.org writes:


On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote:


Ie we perhaps look at a catalog data structure
in which writes are initially done to some kind of queue then moved
to the
BTrees at a later point.


A suggestion: use a pair of BTrees, one with a high branching factor
(bucket size)
and one with a very low branching factor.  Force all writes into the
tree with little
buckets.  Make every search look in both trees.  Consolidate
occasionally.


An interesting idea.  Surely we need the opposite though, and that  
is an
additional BTree with a very large bucket size, as we want to  
minimize the
chance of a bucket split when inserting?  Then we occasionally  
consolidate and
move the items in the original BTree with the regular bucket size/ 
branch factor.


maybe.  haven't thought it through, but worth thinking about.

idle thought I should probably not share:

you could use a Bucket directly for that--it will never split at all,  
and has the conflict resolution behavior.


(strangely, I'm not idle at all, but rather overwhelmingly busy ;-) )

Gary



Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-01 Thread David Binger


On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote:


David Binger dbinger at mems-exchange.org writes:




On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote:


Ie we perhaps look at a catalog data structure
in which writes are initially done to some kind of queue then moved
to the
BTrees at a later point.


A suggestion: use a pair of BTrees, one with a high branching factor
(bucket size)
and one with a very low branching factor.  Force all writes into the
tree with little
buckets.  Make every search look in both trees.  Consolidate
occasionally.


An interesting idea.  Surely we need the opposite though, and that  
is an
additional BTree with a very large bucket size, as we want to  
minimize the
chance of a bucket split when inserting?  Then we occasionally  
consolidate and
move the items in the original BTree with the regular bucket size/ 
branch factor.


You may be right about that.  Conflict resolution makes it harder for
me to predict which way is better.  If you don't have conflict
resolution for insertions, then I think the smaller buckets are
definitely better for avoiding conflicts.  In either case, smaller
buckets reduce the size and serialization time of the insertion
transactions, and that alone *might* be a reason to favor them.  I think
I'd still bet on smaller buckets, but tests would expose the trade-offs.










Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-11-01 Thread Russ Ferriday

Quick note...

Smaller buckets, fewer conflicts, more overhead on reading and writing.
Larger buckets, more conflicts, less overhead on reading and writing.
One bucket ... constant conflicts.

I'd bet that the additional tree with tiny buckets would be best.  
Transfer them into the normal tree once the overhead starts rising.


How to transfer them?  You would not want a single transaction to  
take the hit for the whole transfer, so have a low and high water  
mark. When hitting HWM, transfer only until LWM is reached. Or, just  
focus on transferring some items out of a single tree, to avoid the  
cost of tree rebalancing on the additional tree at the same time as  
rebalancing on the main tree.
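
The high/low water mark transfer can be sketched as follows. Dicts stand in for the two trees, and the function name and the default thresholds are arbitrary assumptions chosen for illustration.

```python
def transfer_if_needed(small, main, high_water=100, low_water=20):
    """Drain `small` into `main` once it reaches high_water, stopping at low_water.

    Returns the number of items moved (0 if the high water mark was not hit).
    """
    if len(small) < high_water:
        return 0
    moved = 0
    # sorted() builds a separate list, so popping from `small` while looping is safe.
    for key in sorted(small)[: len(small) - low_water]:
        main[key] = small.pop(key)
        moved += 1
    return moved
```

Stopping at the low water mark bounds the work any single transaction has to do, at the price of leaving a few items in the small tree for the next round.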


Sounds like a fun project,

--r.

On 1 Nov 2007, at 21:00, David Binger wrote:



On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote:


David Binger dbinger at mems-exchange.org writes:




On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote:


Ie we perhaps look at a catalog data structure
in which writes are initially done to some kind of queue then moved
to the
BTrees at a later point.


A suggestion: use a pair of BTrees, one with a high branching factor
(bucket size)
and one with a very low branching factor.  Force all writes into the
tree with little
buckets.  Make every search look in both trees.  Consolidate
occasionally.


An interesting idea.  Surely we need the opposite though, and that  
is an
additional BTree with a very large bucket size, as we want to  
minimize the
chance of a bucket split when inserting?  Then we occasionally  
consolidate and
move the items in the original BTree with the regular bucket size/ 
branch factor.


You may be right about that.  Conflict resolution makes it harder for
me to predict which way is better.  If you don't have conflict
resolution for insertions, then I think the smaller buckets are
definitely better for avoiding conflicts.  In either case, smaller
buckets reduce the size and serialization time of the insertion
transactions, and that alone *might* be a reason to favor them.  I think
I'd still bet on smaller buckets, but tests would expose the trade-offs.










Russ Ferriday
[EMAIL PROTECTED]
office: +44 118 3217026
mobile: +44 7789 338868
skype: ferriday





Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-10-31 Thread Sidnei da Silva
I think someone proposed to have something just like a WAL in ZODB.
That could be an interesting optimization.

-- 
Sidnei da Silva
Enfold Systems    http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214


Re: [ZODB-Dev] Re: ZODB Benchmarks

2007-10-31 Thread Roché Compaan
On Wed, 2007-10-31 at 10:00 +, Laurence Rowe wrote:
 It looks like ZODB performance in your test has the same O(log n) 
 performance as PostgreSQL checkpoints (the periodic drops in your 
 graph). This should come as no surprise. B-Trees have a theoretical 
 Search/Insert/Delete time complexity equal to the height of the tree, 
 which is (up to) log(n).
 
 So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for 
 inserts. Instead of inserting into the (B-Tree based) data files at 
 every transaction commit it writes a record to the WAL. This does not 
 require traversal of the B-Tree and has O(1) time complexity. The 
 penalty for this is that read operations become more complex, they must 
 look first in the WAL and overlay those results with the main index. The 
 WAL is never allowed to get too large, or its in-memory index would 
 become too big.
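
A toy sketch of the scheme as described in the paragraph above: commits append to a log, reads overlay the log on the main index, and a checkpoint folds the log into the main store. It is not a model of PostgreSQL internals. The dict standing in for the B-Tree data files, the class name `WalStore` and the `checkpoint_every` parameter are all assumptions made for illustration.

```python
class WalStore:
    """Append-only log in front of a main index, with periodic checkpoints."""

    def __init__(self, checkpoint_every=1000):
        self.main = {}       # stand-in for the B-Tree based data files
        self.log = []        # append-only write-ahead records
        self.log_index = {}  # in-memory index over the log
        self.checkpoint_every = checkpoint_every

    def commit(self, key, value):
        # O(1): append a record, no traversal of the main structure.
        self.log.append((key, value))
        self.log_index[key] = value
        # Keep the log (and its in-memory index) from growing too large.
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint()

    def read(self, key):
        # Reads look in the log first, then fall back to the main index.
        if key in self.log_index:
            return self.log_index[key]
        return self.main[key]

    def checkpoint(self):
        # Replay the log in order into the main index, then truncate it.
        for key, value in self.log:
            self.main[key] = value
        self.log.clear()
        self.log_index.clear()
```

The periodic `checkpoint` is where the cost of updating the main structure is paid, which fits the periodic drops mentioned for the PostgreSQL graph.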

Thanks for the explanation. After some profiling I noticed that there
are millions of OID lookups in the index. Increasing the cache size from
400 to 10 led to more acceptable levels of performance degradation.
I'll post some results later on. Some profiling also showed that there
is a huge number of calls to the persistent_id method of the ObjectWriter
- persisting 1 objects leads to 1338046 calls to persistent_id. This
seems to have quite a bit of overhead. Profile results attached.

 If you are going to have this number of records -- in a single B-Tree -- 
 then use a relational database. It's what they're optimised for.

The point of the benchmark is to determine what this number of records
means and to deduce best practice when working with the ZODB. I would
much rather tell a developer to use multiple B-Trees if he wants to
store this number of records than tell them to use a relational
database. Telling a ZODB programmer to use a relational database is an
insult ;-)

One of the tests that I want to try out next is to insert records
concurrently into different B-Trees.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za
Tue Oct 30 20:28:04 2007    /tmp/profile-1.dat

         6108977 function calls (6108973 primitive calls) in 57.280 CPU seconds

   Ordered by: cumulative time
   List reduced from 232 to 20 due to restriction 20

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   57.280   57.280 profile_zodb.py:70(run)
        1    0.000    0.000   57.280   57.280 string:1(?)
        1    0.260    0.260   57.280   57.280 profile_zodb.py:24(_btrees_insert)
        1    0.000    0.000   57.280   57.280 profile:0(run())
     1001    0.030    0.000   51.060    0.051 _manager.py:88(commit)
     1001    0.040    0.000   50.990    0.051 _transaction.py:365(commit)
     1001    0.110    0.000   50.730    0.051 _transaction.py:486(_commitResources)
     1001    0.020    0.000   48.060    0.048 Connection.py:496(commit)
     1001    0.220    0.000   48.040    0.048 Connection.py:512(_commit)
     9889    0.940    0.000   47.340    0.005 Connection.py:561(_store_objects)
    20372    0.480    0.000   39.790    0.002 serialize.py:381(serialize)
    20372    0.500    0.000   38.950    0.002 serialize.py:409(_dump)
    40750    7.790    0.000   38.020    0.001 :0(dump)
  1338046   17.560    0.000   30.230    0.000 serialize.py:184(persistent_id)
  2177223    9.150    0.000    9.150    0.000 :0(isinstance)
    20373    1.550    0.000    5.240    0.000 FileStorage.py:631(store)
     2964    0.050    0.000    4.980    0.002 Connection.py:749(setstate)
     2964    0.100    0.000    4.930    0.002 Connection.py:769(_setstate)
     2964    0.080    0.000    4.180    0.001 serialize.py:603(setGhostState)
     2964    0.030    0.000    4.100    0.001 serialize.py:593(getState)