Re: [ZODB-Dev] Re: ZODB Benchmarks
Sean Allen wrote at 2008-3-25 15:23 -0400:
> On Mar 25, 2008, at 2:54 PM, Dieter Maurer wrote:
>> Benji York wrote at 2008-3-25 14:24 -0400:
>>> ... commit contentions ...
>>
>> Almost surely there are several causes that all can lead to contention.
>> We already found:
>>
>> * client side causes (while the client holds the commit lock)
>>
>>   - garbage collections (which can block a client in the order of
>>     10 to 20 s)
> ...
>> A reconfiguration of the garbage collector helped us with this one
>> (the standard configuration is not well tuned to processes with
>> large amounts of objects).
>
> what'd you do?

# reconfigure the garbage collector
# generation 0 GC at "(allocated - freed) == 7,000"; analyse 7,000 objects
# generation 1 GC at "(allocated - freed) == 140,000"; analyse 140,000 objects
# generation 2 GC at "(allocated - freed) == 1,400,000"; analyse all objects
import gc; gc.set_threshold(7000, 20, 10)

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Mar 25, 2008, at 2:54 PM, Dieter Maurer wrote:
> Benji York wrote at 2008-3-25 14:24 -0400:
>> ... commit contentions ...
>>> Almost surely there are several causes that all can lead to
>>> contention. We already found:
>>>
>>> * client side causes (while the client holds the commit lock)
>>>
>>>   - garbage collections (which can block a client in the order of
>>>     10 to 20 s)
>>
>> Interesting. Perhaps someone might enjoy investigating turning off
>> garbage collection during commits.
>
> A reconfiguration of the garbage collector helped us with this one
> (the standard configuration is not well tuned to processes with
> large amounts of objects).

what'd you do?
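The "turn off garbage collection during commits" idea quoted above was never spelled out in the thread. A minimal sketch of what it might look like, assuming the standard `transaction` API (an illustration, not tested code from any poster):

    import gc
    import transaction

    def commit_without_gc():
        # Keep the cyclic collector from kicking in while the commit
        # lock is held; reference counting still frees most garbage.
        gc.disable()
        try:
            transaction.commit()
        finally:
            gc.enable()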
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote at 2008-3-25 14:24 -0400:
> ... commit contentions ...
>> Almost surely there are several causes that all can lead to contention.
>>
>> We already found:
>>
>> * client side causes (while the client holds the commit lock)
>>
>>   - garbage collections (which can block a client in the order of
>>     10 to 20 s)
>
> Interesting. Perhaps someone might enjoy investigating turning off
> garbage collection during commits.

A reconfiguration of the garbage collector helped us with this one
(the standard configuration is not well tuned to processes with
large amounts of objects).

>> - invalidation processing, especially ZEO ClientCache processing
>
> Interesting. Not knowing much about how invalidations are handled, I'm
> curious where the slow-down is. Do you have any more detail?

Not many: We have a component called RequestMonitor which periodically
checks for long running requests and logs the corresponding stack
traces. This monitor very often sees requests (holding the commit lock)
which are in "ZEO.cache.FileCache.settid".

As the monitor runs asynchronously with the observed threads, the
probability of an observation in a given function depends on how long
the thread is inside this function (total time, i.e. visits times mean
time per visit). From this, we can conclude that a significant time is
spent in "settid".

--
Dieter
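The RequestMonitor component itself was not posted. As a rough illustration of the sampling idea described above (assuming Python 2.5+ for sys._current_frames()), the following sketch periodically dumps every thread's stack; functions that consume a lot of wall-clock time show up in the dumps proportionally often:

    import sys
    import threading
    import time
    import traceback

    def sample_stacks(interval=5.0):
        # Periodically dump every thread's stack. Long-running spots
        # (e.g. a commit stuck in ZEO.cache.FileCache.settid) appear
        # in the output roughly in proportion to the time spent there.
        while True:
            time.sleep(interval)
            for thread_id, frame in sys._current_frames().items():
                print >> sys.stderr, '--- thread %s ---' % thread_id
                traceback.print_stack(frame, file=sys.stderr)

    monitor = threading.Thread(target=sample_stacks)
    monitor.setDaemon(True)   # don't keep the process alive
    monitor.start()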
Re: [ZODB-Dev] Re: ZODB Benchmarks
Dieter Maurer wrote:
> We do not yet know precisely the cause of our commit contentions.

Hard to tell what'll make them better then. ;)

> Almost surely there are several causes that all can lead to contention.
>
> We already found:
>
> * client side causes (while the client holds the commit lock)
>
>   - garbage collections (which can block a client in the order of
>     10 to 20 s)

Interesting. Perhaps someone might enjoy investigating turning off
garbage collection during commits.

>   - NFS operations (which can take up to 27 s in our setup -- for
>     still unknown reasons)

Not much ZODB can do about that. ;)

>   - invalidation processing, especially ZEO ClientCache processing

Interesting. Not knowing much about how invalidations are handled, I'm
curious where the slow-down is. Do you have any more detail?

> * server side causes
>
>   - commit lock held during copy phase of pack
>   - IO thrashing during the reachability analysis in pack

The new pack code should help quite a bit with the above (if you're
saying what I think you're saying).

>   - non-deterministic server-side IO anomalies (IO suddenly takes
>     several times longer than usual -- for still unknown reasons)

Curious.
--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote at 2008-3-25 09:40 -0400:
> Christian Theune wrote:
>> I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that
>> we should look into a technique called `group commit` to get rid of
>> the "commit contention".
> ...
> Summary: fsync is slow (and the cornerstone of most commit steps), so
> try to gather up a small batch of commits to do all at once (with only
> one call to fsync).

Our commit contention definitely is not caused by "fsync". Our "fsync"
is quite fast. If only "fsync" needed to be considered, we could easily
process at least 1,000 transactions per second -- but actually with 10
transactions per second we get contentions a few times per week.

We do not yet know precisely the cause of our commit contentions.
Almost surely there are several causes that all can lead to contention.

We already found:

* client side causes (while the client holds the commit lock)

  - garbage collections (which can block a client in the order of
    10 to 20 s)
  - NFS operations (which can take up to 27 s in our setup -- for
    still unknown reasons)
  - invalidation processing, especially ZEO ClientCache processing

* server side causes

  - commit lock held during copy phase of pack
  - IO thrashing during the reachability analysis in pack
  - non-deterministic server-side IO anomalies (IO suddenly takes
    several times longer than usual -- for still unknown reasons)

> Somewhat like Nagle's algorithm, but for fsync.
>
> The kicker is that OSs and hardware often lie about fsync (and it's
> therefore fast) and good hardware (disk arrays with battery backed
> write cache) already make fsync pretty fast.
>
> Not to suggest that group commit wouldn't speed things up, but it
> would seem that the technique will make the largest improvement for
> people that are using a non-lying fsync on inappropriate hardware.
> --
> Benji York
> Senior Software Engineer
> Zope Corporation

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
Christian Theune wrote:
> I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that
> we should look into a technique called `group commit` to get rid of
> the "commit contention".
>
> Does anybody know this technique already and maybe has a pointer for
> me?

I'd never heard the phrase until reading your message, but I think I got
a pretty clear picture from
http://forums.mysql.com/read.php?22,53854,53854#msg-53854 and
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01696.php.

Summary: fsync is slow (and the cornerstone of most commit steps), so
try to gather up a small batch of commits to do all at once (with only
one call to fsync). Somewhat like Nagle's algorithm, but for fsync.

The kicker is that OSs and hardware often lie about fsync (and it's
therefore fast) and good hardware (disk arrays with battery backed write
cache) already make fsync pretty fast.

Not to suggest that group commit wouldn't speed things up, but it would
seem that the technique will make the largest improvement for people
that are using a non-lying fsync on inappropriate hardware.
--
Benji York
Senior Software Engineer
Zope Corporation
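No ZODB implementation of group commit was posted in this thread. A minimal, thread-based sketch of the technique as summarized above (the class and its structure are illustrative, not taken from any real storage): writers append under a lock, then one fsync makes a whole batch durable, so N commits queued behind one flush pay for a single disk sync instead of N.

    import os
    import threading

    class GroupCommitLog(object):
        """Illustrative append-only log with group commit."""

        def __init__(self, path):
            self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
            self.cond = threading.Condition()
            self.written = 0    # bytes appended so far
            self.synced = 0     # bytes known to be on disk
            self.syncing = False

        def commit(self, data):
            self.cond.acquire()
            try:
                os.write(self.fd, data)
                self.written += len(data)
                target = self.written
                # If another thread's fsync is in flight, wait: when it
                # finishes it may already have covered our bytes.
                while self.syncing and self.synced < target:
                    self.cond.wait()
                if self.synced >= target:
                    return          # a grouped fsync already covered us
                self.syncing = True
            finally:
                self.cond.release()
            os.fsync(self.fd)       # one fsync for everything queued so far
            self.cond.acquire()
            try:
                self.synced = target  # conservative: at least this is safe
                self.syncing = False
                self.cond.notifyAll()
            finally:
                self.cond.release()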
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi,

On Fri, Mar 21, 2008 at 09:08:28PM +0100, Dieter Maurer wrote:
> Chris Withers wrote at 2008-3-20 22:22 +0000:
>> Roché Compaan wrote:
>>> Not yet, they are very time consuming. I plan to do the same tests
>>> over ZEO next to determine what overhead ZEO introduces.
>>
>> Remember to try introducing more app servers and see where the
>> bottleneck comes ;-)
>
> We have seen "commit contention" with lots (24) of zeo clients
> and a high write rate application (almost all requests write to
> the ZODB).

I talked to Brian Aker (MySQL guy) two weeks ago and he proposed that we
should look into a technique called `group commit` to get rid of the
"commit contention".

Does anybody know this technique already and maybe has a pointer for me?

Christian
--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development
Re: [ZODB-Dev] Re: ZODB Benchmarks
Chris Withers wrote at 2008-3-20 22:22 +0000:
> Roché Compaan wrote:
>> Not yet, they are very time consuming. I plan to do the same tests
>> over ZEO next to determine what overhead ZEO introduces.
>
> Remember to try introducing more app servers and see where the
> bottleneck comes ;-)

We have seen "commit contention" with lots (24) of zeo clients
and a high write rate application (almost all requests write to
the ZODB).

--
Dieter
[ZODB-Dev] Re: ZODB Benchmarks
Chris Withers wrote:
> Roché Compaan wrote:
>> Not yet, they are very time consuming. I plan to do the same tests
>> over ZEO next to determine what overhead ZEO introduces.
>
> Remember to try introducing more app servers and see where the
> bottleneck comes ;-)
>
> Am I right in thinking the storage server is still essentially single
> threaded?

Yes, but they are normally network / disk bound, rather than CPU bound,
which makes that less of an issue.

Tres.
--
===================================================================
Tres Seaver          +1 540-429-0999          [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"    http://palladion.com
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> Not yet, they are very time consuming. I plan to do the same tests
> over ZEO next to determine what overhead ZEO introduces.

Remember to try introducing more app servers and see where the
bottleneck comes ;-)

Am I right in thinking the storage server is still essentially single
threaded?

cheers,

Chris
--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-03-20 at 00:00 -0400, Manuel Vazquez Acosta wrote:
> Roché Compaan wrote:
>> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>>> I'll update my blog post with the final stats and let you know when
>>> it is ready.
>>
>> I'll have to keep running these tests because the more I run them the
>> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
>> faster at both insertion and lookups than Postgres?
>>
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>>
>> Lookups are even faster than what I originally reported. Lookup times
>> average at 2 milliseconds (0.002s) on 10 million objects.
>>
>> I think somebody else should run these tests as well and validate the
>> methodology behind them, otherwise I'm spreading lies.
>
> Roché,
>
> I'm very interested in your results. I'm assessing whether or not to
> implement an application with ZODB. I have had a previous good
> experience, but that was in a minor project. Now I need very fast
> lookups and retrievals, possibly a distributed DB system, and
> something as easy as ZODB.
>
> Have you done more tests recently?

Not yet, they are very time consuming. I plan to do the same tests over
ZEO next to determine what overhead ZEO introduces.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>> I'll update my blog post with the final stats and let you know when
>> it is ready.
>
> I'll have to keep running these tests because the more I run them the
> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
> faster at both insertion and lookups than Postgres?
>
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>
> Lookups are even faster than what I originally reported. Lookup times
> average at 2 milliseconds (0.002s) on 10 million objects.
>
> I think somebody else should run these tests as well and validate the
> methodology behind them, otherwise I'm spreading lies.

Roché,

I'm very interested in your results. I'm assessing whether or not to
implement an application with ZODB. I have had a previous good
experience, but that was in a minor project. Now I need very fast
lookups and retrievals, possibly a distributed DB system, and something
as easy as ZODB.

Have you done more tests recently?

On the other hand, I'm wondering too how this relates to Zope.

Regards,
Manuel.
Re: [ZODB-Dev] Re: ZODB Benchmarks
Benji York wrote:
> If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
> http://lwn.net/Articles/83588/) to affect how much RAM is used for the
> page cache and how much for your process.

While we're on that subject: we recently had a box that would take
strain for almost no reason. You'd copy a biggish file from one place to
another and the load average would just soar as the various zope and zeo
instances tried to get to the disk.

Turns out this machine used the anticipatory io scheduler, which really
messes things up. We changed it to deadline, like so:

echo deadline > /sys/block/sda/queue/scheduler

Performance is a lot better now. Our (not very scientific) tests show
that deadline is also a little better than cfq for running zope.
Re: [ZODB-Dev] Re: ZODB Benchmarks
David Pratt wrote:
> Hi Benji. Have you any settings to recommend, or do you use a default?
> Many thanks.

For benchmarking? No. Too high and you'll spend a bunch of time swapping
to free up space for disk cache; too low and you may not have a large
enough disk cache to be effective.
--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi Benji. Have you any settings to recommend, or do you use a default?
Many thanks.

Regards,
David

Benji York wrote:
> Roché Compaan wrote:
>> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>>> Maybe if you set up a ZODB cache that allows just over 10 million
>>> objects, the lookup time will drop to microseconds. You might need
>>> a lot of RAM to do that, though.
>>
>> Maybe, but somehow I think that disk IO will prevent this. I'll
>> check.
>
> If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
> http://lwn.net/Articles/83588/) to affect how much RAM is used for the
> page cache and how much for your process.
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>> Maybe if you set up a ZODB cache that allows just over 10 million
>> objects, the lookup time will drop to microseconds. You might need a
>> lot of RAM to do that, though.
>
> Maybe, but somehow I think that disk IO will prevent this. I'll check.

If you're on Linux, you can tweak swappiness (/proc/sys/vm/swappiness;
http://lwn.net/Articles/83588/) to affect how much RAM is used for the
page cache and how much for your process.
--
Benji York
Senior Software Engineer
Zope Corporation
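For reference (not from the thread): swappiness is a 0-100 knob; reading it is harmless, while changing it requires root, and the value below is purely illustrative.

    # Linux-only sketch: inspect, then (as root) change swappiness.
    print open('/proc/sys/vm/swappiness').read().strip()
    open('/proc/sys/vm/swappiness', 'w').write('10')  # example value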
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
>> On a related topic, you might be interested in the RelStorage
>> performance charts I just posted. Don't take them too seriously, but
>> I think the charts are useful.
>>
>> http://shane.willowrise.com/archives/relstorage-10-and-measurements/
>
> One question, if you run the test with concurrent threads, does each
> thread insert a 100 objects (or a 1 for the second test)?

Yes.

> Is this test available in SVN somewhere?

It's in the RelStorage 1.0 release as well as SVN. It's called
relstorage/tests/speedtest.py.

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
> On a related topic, you might be interested in the RelStorage
> performance charts I just posted. Don't take them too seriously, but I
> think the charts are useful.
>
> http://shane.willowrise.com/archives/relstorage-10-and-measurements/

One question, if you run the test with concurrent threads, does each
thread insert a 100 objects (or a 1 for the second test)?

Is this test available in SVN somewhere?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Mar 4, 2008 at 10:16 PM, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> Not if you're only retrieving intermediate information.

Sure. And my point is that in a typical web setting, you don't. You
either retrieve data to be displayed, or you insert data from an HTTP
post. Massaging data from one place of the database to another place in
the database is usually nothing you do on a per-request basis. And if
you do, then maybe you shouldn't. :)

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi Roché. I figured this out once and it was included in PGStorage, so
it should be in RelStorage also. Take a look at the get_db_size method
in the postgres adapter. RelStorage is in the Zope repository.

Regards,
David

Roché Compaan wrote:
>>> - How much disk space does each database consume when there are 10M
>>>   objects?
>
> ZODB: 19GB
>
> How do you check the size of a Postgres database?
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 23:00 +0100, Bernd Dorn wrote:
> On 04.03.2008, at 22:16, Shane Hathaway wrote:
>> Lennart Regebro wrote:
>>> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
>>> <[EMAIL PROTECTED]> wrote:
>>>> - Did you use optimal methods of retrieval in Postgres? It is
>>>> frequently not necessary to pull the data into the application.
>>>> Copying to another table could be faster than fetching rows.
>>>
>>> But is that relevant in this case? Retrieval must reasonably really
>>> retrieve the data, not just move it around. :)
>>
>> Not if you're only retrieving intermediate information. When you
>> write an application against a relational database, a lot of the
>> intermediate information does not need to be exposed to Python,
>> helping performance significantly.
>
> yes this is the major benefit when using a relational database over
> zodb, because zodb has no server side query language,

This certainly makes some queries faster in an RDBMS. My first goal was
to determine the speed of the most basic operations like insert and
lookup.

> so the whole lookup insert comparison does not reflect real world
> issues.

I disagree. It certainly tests some real world use cases. Most notably
the use case where you have a very large user base inserting content at
a very high rate into the ZODB. I think what is unknown at this stage is
what the penalty would be when using ZEO. But you can only know this if
you know how fast direct interaction with the ZODB is. Not all
applications require ZEO either.

> for example in one of our applications we have to calculate neighbours
> of users, based on books they have in their bookshelves. with about
> 1 users and each of them having an average of 100-500 books out of
> ca. 1 million, the calculation of the neighbours takes seconds when
> you have to calculate this on the client, by getting all indexes etc.
> we switched to sql and wrote a single sql statement that does exactly
> the same comparison which now takes about 300ms.
>
> your comparisons would only be accurate if comparing relstorage with
> filestorage over zeo, because in this case there is no server side
> query possible on object attributes. it would be interesting to look
> at performance when having 4-10 zodb clients and then compare zeo/
> filestorage against relstorage with postgres.

Hopefully the tests are accurate in comparing the speed of basic
operations like insertion and lookup. They might be more *relevant* if
one performs the same tests using ZEO.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-03-04 at 13:27 -0700, Shane Hathaway wrote:
> Roché Compaan wrote:
>> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>>> I'll update my blog post with the final stats and let you know when
>>> it is ready.
>>
>> I'll have to keep running these tests because the more I run them the
>> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
>> faster at both insertion and lookups than Postgres?
>>
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited
>
> For some workloads, ZODB is definitely faster. However, I think your
> analysis needs to provide more detail:

I posted some of the details earlier in this thread but I'll put them up
on the post as well.

> - How much RAM did ZODB and Postgres consume during the tests?

Don't know, will check.

> - How often are you committing a transaction?

100 inserts per transaction.

> - Did you use optimal methods of insertion in Postgres, such as COPY?
> Also note that a standard way to insert a lot of data into a
> relational database is to temporarily drop indexes and re-create them
> after insertion. Your original test may be more valid than you
> thought.

I don't think that this describes the typical interaction between an
application and a database. Usually records will be added to Postgres
using INSERT. A goal of the benchmarks is to understand the limitations
for applications that use the ZODB and to challenge the idea that an
RDBMS should be used for applications that, in naive terms, require a
"big" database that can write fast.

> - Did you use optimal methods of retrieval in Postgres? It is
> frequently not necessary to pull the data into the application.
> Copying to another table could be faster than fetching rows.

I realise this. The only thing the lookup stats tell me is that lookups
in the ZODB don't need drastic optimisation; they are already very fast.

> - What is the ZODB cache size? How much does the speed change as you
> change the cache size?

A lot! With the default cache size the insert rate is a mere 50
inserts/second at around 10 million objects. I used a cache size of
100000 in the benchmarks.

> - How much disk space does each database consume when there are 10M
> objects?

ZODB: 19GB. How do you check the size of a Postgres database?

>> Lookups are even faster than what I originally reported. Lookup times
>> average at 2 milliseconds (0.002s) on 10 million objects.
>
> Maybe if you set up a ZODB cache that allows just over 10 million
> objects, the lookup time will drop to microseconds. You might need a
> lot of RAM to do that, though.

Maybe, but somehow I think that disk IO will prevent this. I'll check.

>> I think somebody else should run these tests as well and validate the
>> methodology behind them, otherwise I'm spreading lies.
>
> You're not far from something interesting.
>
> On a related topic, you might be interested in the RelStorage
> performance charts I just posted. Don't take them too seriously, but I
> think the charts are useful.
>
> http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Thanks for all your questions. I'll certainly post the missing detail on
the web and investigate some of the things that might affect
performance.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
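The benchmark script itself was not posted. A minimal reconstruction of the insertion test as described in this thread (one Persistent object with a ~1 KB string attribute per key in an OOBTree, committing every 100 inserts); the file name, key format, object count, and cache_size value are illustrative assumptions:

    import time
    import transaction
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    class Record(Persistent):
        def __init__(self):
            self.payload = 'x' * 1024   # one ~1 KB string attribute

    db = DB(FileStorage('bench.fs'), cache_size=100000)  # illustrative
    conn = db.open()
    root = conn.root()
    root['tree'] = tree = OOBTree()
    transaction.commit()

    count = 100000
    start = time.time()
    for i in xrange(count):
        tree['key%010d' % i] = Record()
        if i % 100 == 99:               # commit interval of 100 inserts
            transaction.commit()
    transaction.commit()
    print '%.0f inserts/second' % (count / (time.time() - start))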
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 04.03.2008, at 22:16, Shane Hathaway wrote:
> Lennart Regebro wrote:
>> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
>> <[EMAIL PROTECTED]> wrote:
>>> - Did you use optimal methods of retrieval in Postgres? It is
>>> frequently not necessary to pull the data into the application.
>>> Copying to another table could be faster than fetching rows.
>>
>> But is that relevant in this case? Retrieval must reasonably really
>> retrieve the data, not just move it around. :)
>
> Not if you're only retrieving intermediate information. When you
> write an application against a relational database, a lot of the
> intermediate information does not need to be exposed to Python,
> helping performance significantly.

yes this is the major benefit when using a relational database over
zodb, because zodb has no server side query language, so the whole
lookup insert comparison does not reflect real world issues.

for example in one of our applications we have to calculate neighbours
of users, based on books they have in their bookshelves. with about
1 users and each of them having an average of 100-500 books out of
ca. 1 million, the calculation of the neighbours takes seconds when you
have to calculate this on the client, by getting all indexes etc. we
switched to sql and wrote a single sql statement that does exactly the
same comparison which now takes about 300ms.

your comparisons would only be accurate if comparing relstorage with
filestorage over zeo, because in this case there is no server side
query possible on object attributes. it would be interesting to look at
performance when having 4-10 zodb clients and then compare zeo/
filestorage against relstorage with postgres.

--
Lovely Systems, senior developer
phone: +43 5572 908060, fax: +43 5572 908060-77
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
skype: bernd.dorn
Re: [ZODB-Dev] Re: ZODB Benchmarks
Lennart Regebro wrote:
> On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway
> <[EMAIL PROTECTED]> wrote:
>> - Did you use optimal methods of retrieval in Postgres? It is
>> frequently not necessary to pull the data into the application.
>> Copying to another table could be faster than fetching rows.
>
> But is that relevant in this case? Retrieval must reasonably really
> retrieve the data, not just move it around. :)

Not if you're only retrieving intermediate information. When you write
an application against a relational database, a lot of the intermediate
information does not need to be exposed to Python, helping performance
significantly.

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Mar 4, 2008 at 9:27 PM, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> - Did you use optimal methods of insertion in Postgres, such as COPY?
> Also note that a standard way to insert a lot of data into a
> relational database is to temporarily drop indexes and re-create them
> after insertion. Your original test may be more valid than you
> thought.
>
> - Did you use optimal methods of retrieval in Postgres? It is
> frequently not necessary to pull the data into the application.
> Copying to another table could be faster than fetching rows.

But is that relevant in this case? Retrieval must reasonably really
retrieve the data, not just move it around. :)

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote:
> On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
>> I'll update my blog post with the final stats and let you know when
>> it is ready.
>
> I'll have to keep running these tests because the more I run them the
> faster the ZODB becomes ;-) Would you have guessed that the ZODB is
> faster at both insertion and lookups than Postgres?
>
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited

For some workloads, ZODB is definitely faster. However, I think your
analysis needs to provide more detail:

- How much RAM did ZODB and Postgres consume during the tests?

- How often are you committing a transaction?

- Did you use optimal methods of insertion in Postgres, such as COPY?
Also note that a standard way to insert a lot of data into a relational
database is to temporarily drop indexes and re-create them after
insertion. Your original test may be more valid than you thought.

- Did you use optimal methods of retrieval in Postgres? It is frequently
not necessary to pull the data into the application. Copying to another
table could be faster than fetching rows.

- What is the ZODB cache size? How much does the speed change as you
change the cache size?

- How much disk space does each database consume when there are 10M
objects?

> Lookups are even faster than what I originally reported. Lookup times
> average at 2 milliseconds (0.002s) on 10 million objects.

Maybe if you set up a ZODB cache that allows just over 10 million
objects, the lookup time will drop to microseconds. You might need a lot
of RAM to do that, though.

> I think somebody else should run these tests as well and validate the
> methodology behind them, otherwise I'm spreading lies.

You're not far from something interesting.

On a related topic, you might be interested in the RelStorage
performance charts I just posted. Don't take them too seriously, but I
think the charts are useful.

http://shane.willowrise.com/archives/relstorage-10-and-measurements/

Shane
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Mon, 2008-02-25 at 07:36 +0200, Roché Compaan wrote:
> I'll update my blog post with the final stats and let you know when it
> is ready.

I'll have to keep running these tests because the more I run them the
faster the ZODB becomes ;-) Would you have guessed that the ZODB is
faster at both insertion and lookups than Postgres?

http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited

Lookups are even faster than what I originally reported. Lookup times
average at 2 milliseconds (0.002s) on 10 million objects.

I think somebody else should run these tests as well and validate the
methodology behind them, otherwise I'm spreading lies.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
I made a "lovely" mistake in my first round of benchmarks. Lovely, in that it puts the ZODB in a much better light. When I first ran the Postgres test, I neglected to put an index on the key field of the table. I only added the index before I timed lookups on Postgres but forgot to retest insertion. Since the key in the ZODB is effectively indexed I think the test is only fair if the corresponding key in Postgres is indexed. Retesting insertion of a million records into Postgres with the index on the key field revealed that Postgres performance deteriorates logarithmically at roughly the same rate as the ZODB. After about 10 million insertions the ZODB was doing 250 inserts per second. After adding the index on the table, Postgres was doing only slightly better but not above 300 inserts per second. I'll update my blog post with the final stats and let you know when it is ready. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-7 21:44 +0200:
> ...
> There are use cases where having a container in the ZODB that can
> handle large volumes and maintain a high insertion rate would be very
> convenient. An example of such a use case would be a site with
> millions of members where each member has their own folder containing
> different content types. The rate at which new members register is
> very high as well

I do not believe that the insertion rate the ZODB can handle now is
insufficient for this use case.

I do not have your timings present, but from our installation I know
that the ZODB can handle 10 transactions per second. This would mean
about 360,000 per day (10 hour days) and about 10 million in a month.

> so the folder needs to handle insertions quickly. In this use case
> you are not dealing with structured data. If members in a site with
> such large volumes start to generate content, indexes in the ZODB
> become problematic too because of the slow rate of insertion.

We have several write intensive applications with storages in the order
of 10 to 20 GB and 10 to 20 million objects -- and have not yet seen
problems with the insertion rate.

We do see other problems (notably commit contention) but these problems
cannot be solved by an increased insertion rate.

> And at this point you start to stuff everything into a relational
> database and the whole experience becomes painful ...

Let's speak again when you observe a concrete problem in a real
installation caused by limited insertion rate.

--
Dieter
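Spelling out that back-of-the-envelope estimate (the figures below just restate the arithmetic; they are not additional measurements):

    tps = 10               # sustained transactions per second
    day = 10 * 3600        # seconds in a 10-hour working day
    print tps * day        # 360000 transactions per day
    print tps * day * 30   # 10800000, about 10 million per month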
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-02-07 at 20:26 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-7 21:21 +0200:
>> ...
>> So if I asked you to build a data structure for the ZODB that can do
>> insertions at a rate comparable to Postgres on high volumes, do you
>> think that it can be done?
>
> If you need a high write rate, the ZODB is probably not optimal.
> Ask yourself whether it is not better to put such high frequency write
> data directly into a relational database.
>
> Whenever you have large amounts of highly structured data,
> a relational database is necessarily more efficient than the ZODB.

I know it is not optimal for high write scenarios, but I'm asking if it
is possible to build a data structure for the ZODB that can do
insertions quickly.

There are use cases where having a container in the ZODB that can handle
large volumes and maintain a high insertion rate would be very
convenient. An example of such a use case would be a site with millions
of members where each member has their own folder containing different
content types. The rate at which new members register is very high as
well, so the folder needs to handle insertions quickly. In this use case
you are not dealing with structured data. If members in a site with such
large volumes start to generate content, indexes in the ZODB become
problematic too because of the slow rate of insertion. And at this point
you start to stuff everything into a relational database and the whole
experience becomes painful ...

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-7 21:21 +0200:
> ...
> So if I asked you to build a data structure for the ZODB that can do
> insertions at a rate comparable to Postgres on high volumes, do you
> think that it can be done?

If you need a high write rate, the ZODB is probably not optimal.
Ask yourself whether it is not better to put such high frequency write
data directly into a relational database.

Whenever you have large amounts of highly structured data, a relational
database is necessarily more efficient than the ZODB.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2008-02-07 at 00:39 +0100, Dieter Maurer wrote:
>> If I understand correctly, for each insertion 3 calls are made to
>> "persistent_id"? This is still very far from the 66 I mentioned
>> above?
>
> You did not understand correctly.
>
> You insert an entry. The insertion modifies (at least) one OOBucket.
> The "OOBucket" needs to be written back. For each of its entries
> (one is your new one, but there may be up to 29 others) 3
> "persistent_id" calls will happen.

Thanks, I understand now.

So if I asked you to build a data structure for the ZODB that can do
insertions at a rate comparable to Postgres on high volumes, do you
think that it can be done? If so, would it not be worth investing time
and money into this? If not, why not?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-6 20:18 +0200:
> On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
>> Roché Compaan wrote at 2008-2-4 20:54 +0200:
>>> ...
>>> I don't follow? There are 20000 insertions and there are 1338046
>>> calls to persistent_id. Doesn't this suggest that there are 66
>>> objects persisted per insertion? This seems way too high?
>>
>> Jim told you that "persistent_id" is called for each object and not
>> only persistent objects.
>>
>> An OOBucket contains up to 30 key value pairs, each of which
>> are subjected to a call to "persistent_id". In each of your pairs,
>> there is an additional persistent object. This means, you
>> should expect 3 calls to "persistent_id" for each pair in an
>> "OOBucket".
>
> If I understand correctly, for each insertion 3 calls are made to
> "persistent_id"? This is still very far from the 66 I mentioned above?

You did not understand correctly.

You insert an entry. The insertion modifies (at least) one OOBucket.
The "OOBucket" needs to be written back. For each of its entries
(one is your new one, but there may be up to 29 others) 3
"persistent_id" calls will happen.

--
Dieter
Re: RE : [ZODB-Dev] Re: ZODB Benchmarks
Mignon, Laurent wrote at 2008-2-6 08:06 +0100:
> After a lot of tests and benchmarks, my feeling is that the ZODB does
> not seem suitable for systems managing large amounts of data stored in
> a flat hierarchy. The application that we currently develop is a
> business process management system, as opposed to a content management
> system. In order to guarantee the necessary performance, we decided to
> no longer use the ZODB. All data are now stored in a relational
> database.

Roché's corrected timings indicate: the ZODB is significantly slower
than Postgres for insertions but comparatively fast (slightly faster) on
lookups.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2008-02-05 at 19:17 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-4 20:54 +0200:
>> ...
>> I don't follow? There are 20000 insertions and there are 1338046
>> calls to persistent_id. Doesn't this suggest that there are 66
>> objects persisted per insertion? This seems way too high?
>
> Jim told you that "persistent_id" is called for each object and not
> only persistent objects.
>
> An OOBucket contains up to 30 key value pairs, each of which
> are subjected to a call to "persistent_id". In each of your pairs,
> there is an additional persistent object. This means, you
> should expect 3 calls to "persistent_id" for each pair in an
> "OOBucket".

If I understand correctly, for each insertion 3 calls are made to
"persistent_id"? This is still very far from the 66 I mentioned above?

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-4 20:54 +0200:
> ...
> I don't follow? There are 20000 insertions and there are 1338046 calls
> to persistent_id. Doesn't this suggest that there are 66 objects
> persisted per insertion? This seems way too high?

Jim told you that "persistent_id" is called for each object and not only
persistent objects.

An OOBucket contains up to 30 key value pairs, each of which are
subjected to a call to "persistent_id". In each of your pairs, there is
an additional persistent object. This means, you should expect 3 calls
to "persistent_id" for each pair in an "OOBucket".

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Feb 4, 2008, at 1:54 PM, Roché Compaan wrote:
> I don't follow? There are 20000 insertions and there are 1338046 calls
> to persistent_id. Doesn't this suggest that there are 66 objects
> persisted per insertion? This seems way too high?

It seems like there is some confusion about the correspondence between
"persisting an object" and calls to persistent_id(). The pickler makes
lots of calls to persistent_id() as it is making pickles.

In my mind, "persisting an object" means saving the new state of an
instance of Persistent. When you insert a new persistent instance in a
BTree, you are "persisting" the new instance, the bucket/node that holds
the reference to the new instance, and in some cases, some small number
of other bucket/nodes that are changed as part of the insertion. That's
it. If you insert a bunch of things in one commit(), the number of
persistent instances committed is even smaller, because some buckets
will get multiple changes in one write.

There are usually many calls to persistent_id() when one btree bucket is
pickled, but I would count that as 1 persistent object.
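To make that concrete, here is a small standalone sketch (not from the thread; plain cPickle, no ZODB required) showing that the persistent_id hook fires once per object the pickler visits, so pickling one 30-pair bucket-like dict triggers dozens of calls even though only one "object" is being saved:

    import cPickle
    from cStringIO import StringIO

    calls = [0]

    def persistent_id(obj):
        # Called for every object the pickler encounters; returning
        # None means "not a persistent reference, pickle it normally".
        calls[0] += 1
        return None

    pickler = cPickle.Pickler(StringIO(), 1)
    pickler.persistent_id = persistent_id
    # A stand-in for one full 30-pair OOBucket:
    pickler.dump(dict(('key%d' % i, 'value%d' % i) for i in range(30)))
    print calls[0]   # dozens of calls for a single "persisted object"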
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Sun, 2008-02-03 at 22:05 +0100, Dieter Maurer wrote:
> The number of "persistent_id" calls suggests that a written
> persistent object has a mean value of 65 subobjects -- which
> fits well with OOBuckets.
>
> However, when the profile is for commits with 100 insertions each,
> then the number of written persistent objects is far too small.
> In fact, we would expect about 200 persistent object writes per
> transaction: the 100 new persistent objects assigned plus about as
> many buckets changed by these insertions.

I don't follow? There are 20000 insertions and there are 1338046 calls
to persistent_id. Doesn't this suggest that there are 66 objects
persisted per insertion? This seems way too high?

>> The keys that I lookup are completely random so it is probably the
>> case that the lookup causes disk lookups all the time. If this is
>> the case, is 230ms not still too slow?
>
> Unreasonably slow in fact.
>
> A tree with size 10**7 likely does not have a depth larger than 4
> (internal nodes should typically have at least 125 entries, leaves
> should have at least 15 -- a tree of depth 4 thus can have about
> 125**3*15 = 29.x * 10**6 entries).
> Therefore, one would expect at most 4 disk accesses.
>
> On my (6 year old) computer, a disk access can take up to 30 ms.

The lookup times I reported were wrong. There was a bug in the code that
reported the lookup time -- the correct average lookup time for a BTree
with 10 million objects was an impressive 12 ms. For Postgres this was
14 ms.

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-3 09:15 +0200:
> ...
> I have tried different commit intervals. The published results are for
> a commit interval of 100, iow 100 inserts per commit.
>
>> Your profile looks very surprising:
>>
>> I would expect that for a single insertion, typically
>> one persistent object (the bucket where the insertion takes place)
>> is changed. About every 15 inserts, 3 objects are changed (the
>> bucket is split); about every 15*125 inserts, 5 objects are changed
>> (split of bucket and its container).
>> But the mean value of objects changed in a transaction is 20
>> in your profile.
>> The changed objects typically have about 65 subobjects. This
>> fits with "OOBucket"s.
>
> It was very surprising to me too since the insertion is so basic. I
> simply assign a Persistent object with 1 string attribute that is 1K
> in size to a key in an OOBTree. I mentioned this earlier on the list
> and I thought that Jim's explanation was sufficient when he said that
> the persistent_id method is called for all objects including simple
> types like strings, ints, etc. I don't know if it explains all the
> calls that add up to a mean value of 20 though. I guess the calls are
> being made by the cPickle module, but I don't have the experience to
> investigate this.

The number of "persistent_id" calls suggests that a written persistent
object has a mean value of 65 subobjects -- which fits well with
OOBuckets.

However, when the profile is for commits with 100 insertions each, then
the number of written persistent objects is far too small. In fact, we
would expect about 200 persistent object writes per transaction: the 100
new persistent objects assigned plus about as many buckets changed by
these insertions.

> The keys that I lookup are completely random so it is probably the
> case that the lookup causes disk lookups all the time. If this is the
> case, is 230ms not still too slow?

Unreasonably slow in fact.

A tree with size 10**7 likely does not have a depth larger than 4
(internal nodes should typically have at least 125 entries, leaves
should have at least 15 -- a tree of depth 4 thus can have about
125**3*15 = 29.x * 10**6 entries). Therefore, one would expect at most 4
disk accesses.

On my (6 year old) computer, a disk access can take up to 30 ms.

--
Dieter
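Spelling out the capacity estimate in the parenthesis above (using Dieter's half-full figures of at least 125 children per interior node and at least 15 pairs per bucket):

    # A depth-4 tree: three levels of interior nodes over one level
    # of leaf buckets.
    print 125 ** 3 * 15   # 29296875 -- about 29.3 million entries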
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
>> I have completed my first round of benchmarks on the ZODB and welcome
>> any criticism and advice. I summarised our earlier discussion and
>> additional findings in this blog entry:
>> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
>
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?

I have tried different commit intervals. The published results are for a
commit interval of 100, iow 100 inserts per commit.

> Your profile looks very surprising:
>
> I would expect that for a single insertion, typically
> one persistent object (the bucket where the insertion takes place)
> is changed. About every 15 inserts, 3 objects are changed (the bucket
> is split); about every 15*125 inserts, 5 objects are changed
> (split of bucket and its container).
> But the mean value of objects changed in a transaction is 20
> in your profile.
> The changed objects typically have about 65 subobjects. This
> fits with "OOBucket"s.

It was very surprising to me too since the insertion is so basic. I
simply assign a Persistent object with 1 string attribute that is 1K in
size to a key in an OOBTree. I mentioned this earlier on the list and I
thought that Jim's explanation was sufficient when he said that the
persistent_id method is called for all objects including simple types
like strings, ints, etc. I don't know if it explains all the calls that
add up to a mean value of 20 though. I guess the calls are being made by
the cPickle module, but I don't have the experience to investigate this.

> Lookup times:
>
> 0.23 s would be 230 ms, not 23 ms.

Oops, my multiplier broke ;-)

> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)). While BTrees
> are not necessarily balanced (and therefore the depth may be larger
> than logarithmic) it is not easy to obtain a severely unbalanced
> tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too
> small, garbage collections...

The cache size was set to 100000 objects so I doubt that this was the
cause. I do the lookup test right after I populate the BTree, so it
might be that the cache and memory are full, but I take care to commit
after the BTree is populated, so even this is unlikely.

The keys that I lookup are completely random so it is probably the case
that the lookup causes disk lookups all the time. If this is the case,
is 230ms not still too slow?

> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes in the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree disk lookups probably became necessary.

I accept that these lookups are all served from cache. I am going to
modify the lookup test so that I close the database after population and
re-open it when starting the test to make sure nothing is cached, and
see what the results look like.

Thanks for your insightful comments!
--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Re: ZODB Benchmarks
Roché Compaan wrote at 2008-2-1 21:17 +0200:
> I have completed my first round of benchmarks on the ZODB and welcome
> any criticism and advice. I summarised our earlier discussion and
> additional findings in this blog entry:
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

In your insertion test: when do you do commits?
One per insertion? Or one per n insertions (for which "n")?

Your profile looks very surprising:

I would expect that for a single insertion, typically one persistent
object (the bucket where the insertion takes place) is changed. About
every 15 inserts, 3 objects are changed (the bucket is split); about
every 15*125 inserts, 5 objects are changed (split of bucket and its
container). But the mean value of objects changed in a transaction is 20
in your profile. The changed objects typically have about 65 subobjects.
This fits with "OOBucket"s.

Lookup times:

0.23 s would be 230 ms, not 23 ms.

The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
BTree implementation itself. Lookup time is proportional to the tree
depth, which ideally would be O(log(n)). While BTrees are not
necessarily balanced (and therefore the depth may be larger than
logarithmic), it is not easy to obtain a severely unbalanced tree by
insertions only. Other factors must have contributed to this drop:
swapping, cache too small, garbage collections...

Furthermore, the lookup times for your smaller BTrees are far too good
-- fetching any object from disk takes in the order of several ms (2 to
20, depending on your disk). This means that the lookups for your
smaller BTrees have typically been served directly from the cache (no
disk lookups). With your large BTree, disk lookups probably became
necessary.

--
Dieter
Re: [ZODB-Dev] Re: ZODB Benchmarks
I have completed my first round of benchmarks on the ZODB and welcome
any criticism and advice. I summarised our earlier discussion and
additional findings in this blog entry:

http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks

--
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 7, 2007, at 11:19 AM, Godefroid Chapelle wrote:
> Jim Fulton wrote:
>> On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote:
>>> Jim Fulton wrote:
>>>> It sounds like I should write some pickle and cPickle tests and we
>>>> should update the ZODB trunk to take advantage of this. (/me fears
>>>> getting mired in Python 3.)
>>>
>>> Would you do that on Python 2.4, 2.5 or ... ?
>>
>> I would do it on "?". ;)
>
> I'd help with pleasure...
>
>> I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle
>> hasn't been ported to Python 3 yet. :) This is just a fairly simple
>> test, so shouldn't be a big deal.
>
> ... but I fear it would take you longer to explain what to do than to
> do it.

Maybe. I bet all that's needed is to reuse the existing persistent_id
tests using inst_persistent_id. I haven't looked at the tests yet. At
which point, one might add a test to distinguish the 2 cases. And, of
course, eventually, the tests will need to get checked in. I can do
that.

You might be able to write the tests without much input from me. OTOH,
I'll get to them eventually. :)

Jim
--
Jim Fulton
Zope Corporation
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote: Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? I would do it on "?". ;) I'd help with pleasure... I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle hasn't been ported to Python 3 yet. :) This is just a fairly simple test, so shouldn't be a big deal. ... but I fear it would take you longer to explain what to do than to do it. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 7, 2007, at 10:55 AM, Godefroid Chapelle wrote: Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? I would do it on "?". ;) I guess I'd need to do it for 2.4, 2.5, trunk and hope that pickle hasn't been ported to Python 3 yet. :) This is just a fairly simple test, so shouldn't be a big deal. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim Would you do that on Python 2.4, 2.5 or ... ? -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
[I'm resending this mail since it seems it never reached the mailing list?] On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote: > On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: > > > Jim Fulton wrote: > >> On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the > ObjectWriter > in serialize.py. > >>> > >>> Why 'unexplained'? 'persistent_id' is called from the Pickler > >>> instance > >>> being used in ObjectWriter._dump(). It is called for each and every > >>> single object reachable from the main object, due to the way Pickler > >>> works (I believe). Maybe persistent_id can be analysed and optimized > >>> for the most common cases? > >> Yup. > >> Note that there is an undocumented feature in cPickle that I added > >> years ago to deal with this issue but never got around to > >> pursuing. Maybe someone else would be able to spend the time to > >> try it out and report back. > >> If you set inst_persistent_id, rather than persistent_id, on a > >> pickler, then the hook will only be called for instances. This > >> should eliminate the vast majority of the calls. > >> Note that this feature was added back when testing was minimal or > >> non-existent, so it is untested, however, the implementation is > >> simple enough. :) > > > > Do you mean that the ZODB has enough tests now that making the > > change and running the tests might already be a good proof ? > > No, I mean that pickle and cPickle lack tests for this feature. > > > Or should we be more prudent ? > > It would be nice to try this out with ZODB to see if it makes much > difference. If it does, then that would provide extra motivation for > me to add the missing test. > > Roché Compaan said he would try it out, but I just realized that he > might have been waiting for me. Sorry for not responding earlier. I actually tried this out immediately after you suggested it and was very impressed with the improvement in performance. I have been meaning to write back to give a thorough report but a project with insane deadlines caught up with me. This project ends this week so I plan to continue with my benchmark test next week and give feedback thereafter. The number of calls to persistent_id dropped dramatically, and in a test of 10 million inserts the insert rate almost doubled, from 1 000 to 2 000 inserts per second, for at least the first million inserts. The insert rate decreases rapidly thereafter until it drops to the insert rate recorded before the persistent_id change. I guess at this point the overhead of bucket splits is too high. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
It sounds like I should write some pickle and cPickle tests and we should update the ZODB trunk to take advantage of this. (/me fears getting mired in Python 3.) Jim On Dec 7, 2007, at 4:23 AM, Godefroid Chapelle wrote: Godefroid Chapelle wrote: Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. We can measure some benefit. For tests on a ZODB prefilled with 100k instances of an archetypes class, update of an instance : 12% improvement insert of an instance : 15% improvement If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Godefroid Chapelle wrote: Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. We can measure some benefit. For tests on a ZODB prefilled with 100k instances of an archetypes class, update of an instance : 12% improvement insert of an instance : 15% improvement If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. Laurent (cced) tried it today and it seems it does make a difference. Our benchmark is running tonight with a bigger amount of content. We will be back with results tomorrow. If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote: > On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: > > > Jim Fulton wrote: > >> On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the > ObjectWriter > in serialize.py. > >>> > >>> Why 'unexplained'? 'persistent_id' is called from the Pickler > >>> instance > >>> being used in ObjectWriter._dump(). It is called for each and every > >>> single object reachable from the main object, due to the way Pickler > >>> works (I believe). Maybe persistent_id can be analysed and optimized > >>> for the most common cases? > >> Yup. > >> Note that there is an undocumented feature in cPickle that I added > >> years ago to deal with this issue but never got around to > >> pursuing. Maybe someone else would be able to spend the time to > >> try it out and report back. > >> If you set inst_persistent_id, rather than persistent_id, on a > >> pickler, then the hook will only be called for instances. This > >> should eliminate the vast majority of the calls. > >> Note that this feature was added back when testing was minimal or > >> non-existent, so it is untested, however, the implementation is > >> simple enough. :) > > > > Do you mean that the ZODB has enough tests now that making the > > change and running the tests might already be a good proof ? > > No, I mean that pickle and cPickle lack tests for this feature. > > > Or should we be more prudent ? > > It would be nice to try this out with ZODB to see if it makes much > difference. If it does, then that would provide extra motivation for > me to add the missing test. > > Roché Compaan said he would try it out, but I just realized that he > might have been waiting for me. Sorry for not responding earlier. I actually tried this out immediately after you suggested it and was very impressed with the improvement in performance. I have been meaning to write back to give a thorough report but a project with insane deadlines caught up with me. This project ends this week so I plan to continue with my benchmark test next week and give feedback thereafter. The number of calls to persistent_id dropped dramatically, and in a test of 10 million inserts the insert rate almost doubled, from 1 000 to 2 000 inserts per second, for at least the first million inserts. The insert rate decreases rapidly thereafter until it drops to the insert rate recorded before the persistent_id change. I guess at this point the overhead of bucket splits is too high. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote: Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? No, I mean that pickle and cPickle lack tests for this feature. Or should we be more prudent ? It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test. Roché Compaan said he would try it out, but I just realized that he might have been waiting for me. If it would, What do you mean by 'If it would' ? If we can measure a benefit. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 3:17 PM, Roché Compaan wrote: On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. So is this as simple as modifying the following code in the ObjectWriter: self._p.persistent_id = self.persistent_id to: self._p.inst_persistent_id = self.persistent_id Yes. I'll give it a go as part of my benchmarks that I'm running and report back. I hope you weren't waiting for my answer. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
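For context, a sketch of where that one-line change sits. This paraphrases the ObjectWriter rather than quoting ZODB's actual serialize.py; the real oid logic is stubbed out:

    import cPickle
    from cStringIO import StringIO

    class ObjectWriter:
        """Editor's sketch of the relevant corner of ZODB.serialize (3.x era)."""

        def __init__(self):
            self._file = StringIO()
            self._p = cPickle.Pickler(self._file, 1)
            # Before: the hook ran for every object the pickler visited.
            # self._p.persistent_id = self.persistent_id
            # After: the hook runs only for instances.
            self._p.inst_persistent_id = self.persistent_id

        def persistent_id(self, obj):
            # The real method returns an oid for Persistent objects;
            # this stub returns None so everything pickles normally.
            return None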
[ZODB-Dev] Re: ZODB Benchmarks
Jim Fulton wrote: On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) Do you mean that the ZODB has enough tests now that making the change and running the tests might already be a good proof ? Or should we be more prudent ? If it would, What do you mean by 'If it would' ? then of course we should contribute documentation and a test to the Python source tree. Jim -- Jim Fulton Zope Corporation -- Godefroid Chapelle (aka __gotcha) http://bubblenet.be ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 2:30 PM, Roché Compaan wrote: If you can cut the b-node branching factor in half, I bet your benchmark will run almost twice as fast. I increased the 'DEFAULT_MAX_BUCKET_SIZE' from 30 to 3 and DEFAULT_MAX_BTREE_SIZE from 250 to 2500 and it didn't make any noticeable difference. Despite this change there are still a huge amount of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. I can see here that what I wrote was not clear. I intended to say that the *smaller* bucket size should be faster for this because the size of each committed transaction would be smaller. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, Nov 06, 2007 at 10:01:24PM +0200, Roché Compaan wrote: > On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > > > Despite this change there are still a huge number > > > of unexplained calls to the 'persistent_id' method of the ObjectWriter > > > in serialize.py. > > > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > > being used in ObjectWriter._dump(). It is called for each and every > > single object reachable from the main object, due to the way Pickler > > works (I believe). Maybe persistent_id can be analysed and optimized > > for the most common cases? > > > > If you look at the profiler stats I posted earlier you would have > noticed that there were about 1.3 million calls to persistent_id while > only 20 000 objects were persisted. So if it is being called for each > object I would expect a figure closer to 20 000, not 1.3 million. What am > I missing? AFAIU persistent_id() is called once for every reference to a persistent object rather than once for every persistent object. If there were 65 references to each of the 20 000 objects you'd get 1.3 million calls to persistent_id(). Marius Gedminas -- A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet? ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
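Marius's per-reference point is easy to see with a plain pickler. A small sketch (illustrative only; the counting hook and payload are not from the thread). The hook is consulted before the pickler's memo lookup, so every reference triggers a call even when the object itself is pickled only once:

    import cPickle
    from cStringIO import StringIO

    calls = [0]
    def persistent_id(obj):
        calls[0] += 1
        return None  # count the visit, then pickle normally

    pickler = cPickle.Pickler(StringIO(), 1)
    pickler.persistent_id = persistent_id

    shared = 'the same string object'
    pickler.dump([shared] * 100)  # one object, one hundred references

    # Expect roughly 100 calls for the list items (plus one for the
    # list itself), although only one distinct string is pickled.
    print(calls[0])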
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 14:51 -0500, Jim Fulton wrote: > On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: > > >> Despite this change there are still a huge number > >> of unexplained calls to the 'persistent_id' method of the > >> ObjectWriter > >> in serialize.py. > > > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > > being used in ObjectWriter._dump(). It is called for each and every > > single object reachable from the main object, due to the way Pickler > > works (I believe). Maybe persistent_id can be analysed and optimized > > for the most common cases? > > Yup. > > Note that there is an undocumented feature in cPickle that I added > years ago to deal with this issue but never got around to pursuing. > Maybe someone else would be able to spend the time to try it out and > report back. > > If you set inst_persistent_id, rather than persistent_id, on a > pickler, then the hook will only be called for instances. This > should eliminate the vast majority of the calls. So is this as simple as modifying the following code in the ObjectWriter: self._p.persistent_id = self.persistent_id to: self._p.inst_persistent_id = self.persistent_id I'll give it a go as part of my benchmarks that I'm running and report back. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 15:08 -0500, Jim Fulton wrote: > On Nov 6, 2007, at 3:01 PM, Roché Compaan wrote: > > > On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > >>> Despite this change there are still a huge number > >>> of unexplained calls to the 'persistent_id' method of the > >>> ObjectWriter > >>> in serialize.py. > >> > >> Why 'unexplained'? 'persistent_id' is called from the Pickler > >> instance > >> being used in ObjectWriter._dump(). It is called for each and every > >> single object reachable from the main object, due to the way Pickler > >> works (I believe). Maybe persistent_id can be analysed and optimized > >> for the most common cases? > >> > > > > If you look at the profiler stats I posted earlier you would have > > noticed that there were about 1.3 million calls to persistent_id while > > only 20 000 objects were persisted. So if it is being called for each > > object I would expect a figure closer to 20 000, not 1.3 million. > > What am > > I missing? > > It's called for *all* objects, not just persistent objects. This > includes ints, strings (including attribute names), etc. Ah. Man that lightbulb is burning my brain ;-) -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 3:01 PM, Roché Compaan wrote: On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? If you look at the profiler stats I posted earlier you would have noticed that there were about 1.3 million calls to persistent_id while only 20 000 objects were persisted. So if it is being called for each object I would expect a figure closer to 20 000, not 1.3 million. What am I missing? It's called for *all* objects, not just persistent objects. This includes ints, strings (including attribute names), etc. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Tue, 2007-11-06 at 17:40 -0200, Sidnei da Silva wrote: > > Despite this change there are still a huge number > > of unexplained calls to the 'persistent_id' method of the ObjectWriter > > in serialize.py. > > Why 'unexplained'? 'persistent_id' is called from the Pickler instance > being used in ObjectWriter._dump(). It is called for each and every > single object reachable from the main object, due to the way Pickler > works (I believe). Maybe persistent_id can be analysed and optimized > for the most common cases? > If you look at the profiler stats I posted earlier you would have noticed that there were about 1.3 million calls to persistent_id while only 20 000 objects were persisted. So if it is being called for each object I would expect a figure closer to 20 000, not 1.3 million. What am I missing? -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote: Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? Yup. Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested, however, the implementation is simple enough. :) If it would, then of course we should contribute documentation and a test to the Python source tree. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
> Despite this change there are still a huge number > of unexplained calls to the 'persistent_id' method of the ObjectWriter > in serialize.py. Why 'unexplained'? 'persistent_id' is called from the Pickler instance being used in ObjectWriter._dump(). It is called for each and every single object reachable from the main object, due to the way Pickler works (I believe). Maybe persistent_id can be analysed and optimized for the most common cases? -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:47 -0400, David Binger wrote: > On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote: > > > Thanks for the explanation. > > The actual insertion is very fast. Your benchmark is dominated by > the time to serialize the changes due to an insertion. > > You should usually have just 2 instances to serialize per insertion of > a new instance: the instance itself and the b-node that points to the > instance. An insertion may also cause changes in 2 or several b-nodes, > but those cases are less likely. > > Serializing your simple instances is probably fast, but serializing > the b-nodes appears to be taking much more time, and probably accounts > for the large number of calls to persistent_id. B-Nodes with higher > branching factors will have more parts to serialize and they will be > slower. If you can cut the b-node branching factor in half, I bet your > benchmark will run almost twice as fast. I increased the 'DEFAULT_MAX_BUCKET_SIZE' from 30 to 300 and DEFAULT_MAX_BTREE_SIZE from 250 to 2500 and it didn't make any noticeable difference. Despite this change there are still a huge number of unexplained calls to the 'persistent_id' method of the ObjectWriter in serialize.py. Running fsdump on the Data.fs confirmed that there are only 2 instances serialized per insertion, one OOBucket and an instance of the persistent class used for testing. So thus far the only tweak that made a significant difference was increasing the DB cache size from 400 to 10. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
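The fsdump check described above can be reproduced along these lines. A sketch, assuming the fsdump helper that ships with ZODB 3 (the exact signature may differ between releases):

    # Dump every transaction record in the FileStorage; for this
    # benchmark each insertion should show up as one OOBucket record
    # plus one record for the persistent test class.
    from ZODB.fsdump import fsdump

    fsdump('Data.fs')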
Re: [ZODB-Dev] Re: ZODB Benchmarks
Laurence Rowe wrote: Essentially you end up with a solution very similar to QueueCatalog but with the queue being searchable. The pain is then in modifying all of the indexes to search the queue in addition to their standard data structures. In many applications it is acceptable to have a catalog (or other data structure) that is slightly out of date. In those cases, you can ignore the queued data. -- Benji York Senior Software Engineer Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Fri, 2007-11-02 at 16:00 +0000, Laurence Rowe wrote: > Matt Hamilton wrote: > > David Binger mems-exchange.org> writes: > > > >> > >> On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: > >> > >>> Lots of people don't do nightly packs, I'm pretty sure such a process > >>> needs to be completely automatic. The question is whether doing it in > >>> a separate process in the background, or every X transactions, or every > >>> X seconds, or something. > >> Okay, perhaps the trigger should be the depth of the small-bucket tree. > > > > That may just end up causing delays periodically in transactions... ie > > delays > > that the user sees, as opposed to doing it via another thread or something. > > But > > then as only one thread would be doing this at a time it might not be too > > bad. > > > > -Matt > > ClockServer sections can now be specified in zope.conf. If you specify > them with a period of say 10 mins (or even 2) then the queue should > never get too large, and the linear search time is not a problem as n is > small. > > Essentially you end up with a solution very similar to QueueCatalog but > with the queue being searchable. > > The pain is then in modifying all of the indexes to search the queue in > addition to their standard data structures. I don't think that you need to modify the indexes at all. You simply pass the query arguments to the queue in the exact same way that you apply the arguments to individual indexes. I think with a little enhancement to QueueCatalog one should be able to pull this off. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. Okay, perhaps the trigger should be the depth of the small-bucket tree. That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. -Matt ClockServer sections can now be specified in zope.conf. If you specify them with a period of say 10 mins (or even 2) then the queue should never get too large, and the linear search time is not a problem as n is small. Essentially you end up with a solution very similar to QueueCatalog but with the queue being searchable. The pain is then in modifying all of the indexes to search the queue in addition to their standard data structures. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
This is the 'batch' or 'distribute' pattern that crops up in many fields. The best path is normally to understand what the conflicts are, and where the time is spent. If, in this case, much time is spent in the preamble, and the actual inserts are quick, then diving down one time through the security layers and stuffing in 10 items is clearly better than 10 preambles, one for each insert. The other truism is that all optimisation is for a single case. There may be different answers for different cases. Ideally a single parameter would be enough to tune the system for different cases. Good luck with the outcome, Roche; I'm excited to see some figures. --r. On 2 Nov 2007, at 15:24, David Binger wrote: On Nov 2, 2007, at 10:58 AM, Lennart Regebro wrote: It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, Maybe, but maybe that just causes big buckets to get invalidated in all of the clients over and over again, when we could accomplish the same objective in one invalidation by waiting longer and executing a bigger consolidation. although certainly the best thing would be to test all kinds of solutions and see. No doubt about that. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev Russ Ferriday [EMAIL PROTECTED] office: +44 118 3217026 mobile: +44 7789 338868 skype: ferriday ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 10:58 AM, Lennart Regebro wrote: It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, Maybe, but maybe that just causes big buckets to get invalidated in all of the clients over and over again, when we could accomplish the same objective in one invalidation by waiting longer and executing a bigger consolidation. although certainly the best thing would be to test all kinds of solutions and see. No doubt about that. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, David Binger <[EMAIL PROTECTED]> wrote: > > But wouldn't then all other threads get a conflict? > > If they are trying to do insertions at the same time as the > consolidation, yes. > This data structure won't stop insertion conflicts, the intent is to > make them > less frequent. But still, that does mean that in practice all writing threads will stand still during consolidation, because if they do anything they will get a conflict. And this whole issue only arises if you have loads of conflicts, almost all the time, because you have many writes. It seems to me having one thread doing a background consolidation one transaction at a time seems a better way to go, although certainly the best thing would be to test all kinds of solutions and see. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 10:18 AM, Christian Theune wrote: Wouldn't a queue be a good data structure to do that? IIRC ZC already wrote a queue that doesn't conflict: http://svn.zope.de/zope.org/zc.queue/trunk/src/zc/queue/queue.txt If you store key/value pairs in the queue, you can do a step-by-step migration from the queue to the btree. I guess that was the original proposal mentioned by Matt. The bad thing about a Queue for this purpose is that searches are linear time, so you can't wait as long before consolidating. That might be okay, though, if you intend to run consolidations continuously. Probably this should be encapsulated into a new data structure that looks btree-like and has an additional `consolidate` method. That sounds proper. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Hi, On Friday, 2007-11-02, at 09:56 -0400, David Binger wrote: > On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote: > > > On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > >> That may just end up causing delays periodically in > >> transactions... ie delays > >> that the user sees, as opposed to doing it via another thread or > >> something. But > >> then as only one thread would be doing this at a time it might not > >> be too bad. > > > > But wouldn't then all other threads get a conflict? > > If they are trying to do insertions at the same time as the > consolidation, yes. > This data structure won't stop insertion conflicts, the intent is to > make them > less frequent. Hmm. Wouldn't a queue be a good data structure to do that? IIRC ZC already wrote a queue that doesn't conflict: http://svn.zope.de/zope.org/zc.queue/trunk/src/zc/queue/queue.txt If you store key/value pairs in the queue, you can do a step-by-step migration from the queue to the btree. Probably this should be encapsulated into a new data structure that looks btree-like and has an additional `consolidate` method. Calling the `consolidate` method would have to happen from the application that uses this data structure. Two issues I can think of immediately: - General: We need an efficient way to find all data structures that need reconciliation; maybe a ZODB-wide index of all objects that require reconciliation would be nice. - With Zope and ZEO: which Zope server is responsible for actually performing the reconciliation? One of the Zope servers that is marked in the zope.conf? Or maybe the ZEO server? Christian -- gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 - fax +49 345 122 9889 1 - zope and plone consulting and development ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
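Christian's queue-plus-consolidate idea might be sketched like this. Illustrative only; the zc.queue API used here (Queue, put, pull, iteration) is assumed from the package's documentation and should be checked against the linked queue.txt:

    import persistent
    import zc.queue
    from BTrees.OOBTree import OOBTree

    class QueuedBTree(persistent.Persistent):
        """BTree-like mapping that buffers writes in a low-conflict queue."""

        def __init__(self):
            self._tree = OOBTree()
            self._queue = zc.queue.Queue()

        def __setitem__(self, key, value):
            # Writes touch only the queue, so concurrent inserts do
            # not fight over BTree buckets.
            self._queue.put((key, value))

        def __getitem__(self, key):
            # Reads overlay the queue on the tree; newest entry wins.
            for k, v in reversed(list(self._queue)):
                if k == key:
                    return v
            return self._tree[key]

        def consolidate(self):
            # Run periodically (e.g. from a ClockServer task) in its
            # own transaction.
            while len(self._queue):
                key, value = self._queue.pull()
                self._tree[key] = value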
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 8:39 AM, Lennart Regebro wrote: On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. But wouldn't then all other threads get a conflict? If they are trying to do insertions at the same time as the consolidation, yes. This data structure won't stop insertion conflicts; the intent is to make them less frequent. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > That may just end up causing delays periodically in transactions... ie delays > that the user sees, as opposed to doing it via another thread or something. > But > then as only one thread would be doing this at a time it might not be too bad. But wouldn't then all other threads get a conflict? -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. Okay, perhaps the trigger should be the depth of the small-bucket tree. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
David Binger mems-exchange.org> writes: > > > On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: > > > Lots of people don't do nightly packs, I'm pretty sure such a process > > needs to be completely automatic. The question is whether doing it in > > a separate process in the background, or every X transactions, or every > > X seconds, or something. > > Okay, perhaps the trigger should be the depth of the small-bucket tree. That may just end up causing delays periodically in transactions... ie delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/2/07, David Binger <[EMAIL PROTECTED]> wrote: > I think that option would work. I think it would suffice to do a > "Big.update(Small); Small.clear()" operation before a nightly pack. Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 2, 2007, at 5:48 AM, Lennart Regebro wrote: On 11/1/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. Would it be possible to not "occasionally" consolidate, but actually do it ongoing, but just one process, thereby always inserting just one transaction into the normal BTree at a time? Or does that cause troubles? I think that option would work. I think it would suffice to do a "Big.update(Small); Small.clear()" operation before a nightly pack. It might invalidate every bucket in every cache, but BTrees are designed to perform reasonably without a cache. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
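A minimal sketch of that consolidation step, run in its own transaction; the names Big and Small follow the message above, and the retry handling is an assumption rather than part of the proposal:

    import transaction
    from ZODB.POSException import ConflictError

    def consolidate(big, small, retries=3):
        for _ in range(retries):
            try:
                big.update(small)  # one bulk update, one invalidation wave
                small.clear()
                transaction.commit()
                return
            except ConflictError:
                transaction.abort()
        raise RuntimeError('consolidation kept conflicting; try again later')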
Re: [ZODB-Dev] Re: ZODB Benchmarks
On 11/1/07, Matt Hamilton <[EMAIL PROTECTED]> wrote: > An interesting idea. Surely we need the opposite though, and that is an > additional BTree with a very large bucket size, as we want to minimize the > chance of a bucket split when inserting? Then we occasionally consolidate and > move the items in the original BTree with the regular bucket size/branch > factor. Would it be possible to not "occasionally" consolidate, but actually do it continuously, in just one process, thereby always inserting just one transaction into the normal BTree at a time? Or does that cause trouble? //In way over my head. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
Quick note... Smaller buckets, fewer conflicts, more overhead on reading and writing. Larger buckets, more conflicts, less overhead on reading and writing. One bucket ... constant conflicts. I'd bet that the additional tree with tiny buckets would be best. Transfer them into the normal tree once the overhead starts rising. How to transfer them? You would not want a single transaction to take the hit for the whole transfer, so have a low and high water mark. When hitting HWM, transfer only until LWM is reached. Or, just focus on transferring some items out of a single tree, to avoid the cost of tree rebalancing on the additional tree at the same time as rebalancing on the main tree. Sounds like a fun project, --r. On 1 Nov 2007, at 21:00, David Binger wrote: On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. You may be right about that. Conflict resolution makes it harder for me to predict which way is better. If you don't have conflict resolution for insertions, then I think the smaller buckets are definitely better for avoiding conflicts. In either case, smaller buckets reduce the size and serialization time of the insertion transactions, and that alone *might* be a reason to favor them. I think I'd still bet on smaller buckets, but tests would expose the trade-offs. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev Russ Ferriday [EMAIL PROTECTED] office: +44 118 3217026 mobile: +44 7789 338868 skype: ferriday ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
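The high and low water mark transfer might look like the following. A sketch only; the thresholds are arbitrary assumptions, and draining by minKey is one possible choice, not part of Russ's note:

    import transaction

    HIGH_WATER = 5000  # start draining when the small tree reaches this
    LOW_WATER = 1000   # stop once it is back down to this

    def maybe_transfer(big, small):
        if len(small) < HIGH_WATER:
            return
        # Drain only part of the backlog so that no single transaction
        # takes the hit for the whole transfer.
        while len(small) > LOW_WATER:
            key = small.minKey()
            big[key] = small[key]
            del small[key]
        transaction.commit()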
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. You may be right about that. Conflict resolution makes it harder for me to predict which way is better. If you don't have conflict resolution for insertions, then I think the smaller buckets are definitely better for avoiding conflicts. In either case, smaller buckets reduce the size and serialization time of the insertion transactions, and that alone *might* be a reason to favor them. I think I'd still bet on smaller buckets, but tests would expose the trade-offs. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 4:25 PM, Matt Hamilton wrote: David Binger mems-exchange.org> writes: On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/ branch factor. maybe. haven't thought it through, but worth thinking about. idle thought I should probably not share: you could use a Bucket directly for that--it will never split at all, and has the conflict resolution behavior. (strangely, I'm not idle at all, but rather overwhelmingly busy ;-) ) Gary ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
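Gary's aside could be sketched as below, assuming an OOBucket on the write side. Since a lone Bucket never splits, inserts into it conflict only at the whole-bucket level, where BTree conflict resolution applies:

    from BTrees.OOBTree import OOBucket, OOBTree

    small = OOBucket()  # flat write side: never splits
    big = OOBTree()     # consolidated read side

    small['key-1'] = 'value-1'  # writes go to the bucket

    def lookup(key):
        # Reads overlay the bucket on the tree.
        if small.has_key(key):
            return small[key]
        return big[key]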
[ZODB-Dev] Re: ZODB Benchmarks
David Binger mems-exchange.org> writes: > > > On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: > > > Ie we perhaps look at a catalog data structure > > in which writes are initially done to some kind of queue then moved > > to the > > BTrees at a later point. > > A suggestion: use a pair of BTrees, one with a high branching factor > (bucket size) > and one with a very low branching factor. Force all writes into the > tree with little > buckets. Make every search look in both trees. Consolidate > occasionally. An interesting idea. Surely we need the opposite though, and that is an additional BTree with a very large bucket size, as we want to minimize the chance of a bucket split when inserting? Then we occasionally consolidate and move the items in the original BTree with the regular bucket size/branch factor. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:47 -0400, David Binger wrote: > On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote: > > > Thanks for the explanation. > > The actual insertion is very fast. Your benchmark is dominated by > the time to serialize the changes due to an insertion. > > You should usually have just 2 instances to serialize per insertion of > a new instance: the instance itself and the b-node that points to the > instance. An insertion may also cause changes in 2 or several b-nodes, > but those cases are less likely. > > Serializing your simple instances is probably fast, but serializing > the b-nodes appears to be taking much more time, and probably accounts > for the large number of calls to persistent_id. B-Nodes with higher > branching factors will have more parts to serialize and they will be > slower. If you can cut the b-node branching factor in half, I bet your > benchmark will run almost twice as fast. I guess you are referring to the B-Tree bucket size? This is not really configurable and one will have to recompile the C code once you modify it. For OOBTree the max bucket size is 30. I'll see what effect it has on the test nevertheless. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Nov 1, 2007, at 7:05 AM, Matt Hamilton wrote: Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. A suggestion: use a pair of BTrees, one with a high branching factor (bucket size) and one with a very low branching factor. Force all writes into the tree with little buckets. Make every search look in both trees. Consolidate occasionally. It seems like this would be a good way to push the conflict rate down while still providing fast item access when the cache is cold. The little bucket tree seems better than an ordinary queue for this purpose because it can be searched faster, and you can let it grow as much as you like between consolidations. Also the size of the transaction on each insert will be pretty small. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Laurence Rowe lrowe.co.uk> writes: > So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for > inserts. Instead of inserting into the (B-Tree based) data files at > every transaction commit it writes a record to the WAL. This does not > require traversal of the B-Tree and has O(1) time complexity. The > penalty for this is that read operations become more complex, they must > look first in the WAL and overlay those results with the main index. The > WAL is never allowed to get too large, or its in-memory index would > become too big. This is sort of what I proposed at the performance BOF at the Plone Conf specifically for the ZCatalog. Ie we perhaps look at a catalog data structure in which writes are initially done to some kind of queue then moved to the BTrees at a later point. One thing to be very wary of with all of this. As has been shown, the conflict errors are relative to the size of the Btree; this is due to the probability of a bucket needing to be split. We need to be very sure that real life use cases are what we think they are. Ie. in a large running site some of the BTrees will already be quite large and so conflicts might not be such an issue (they are an issue so something is not right here). Or we might find for instance that one of the catalog indexes, eg. something like a FieldIndex might only have a small vocabulary (e.g. the author of a piece of content) but referenced by every document. In that case you would have an index with very few keys and N values (where N is the number of documents) and an unindex of N keys each with a very small number of values, hence a small btree, hence large chance of collisions. -Matt ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Oct 31, 2007, at 7:35 AM, Roché Compaan wrote:

> Thanks for the explanation.

The actual insertion is very fast. Your benchmark is dominated by
the time to serialize the changes due to an insertion.

You should usually have just 2 instances to serialize per insertion of
a new instance: the instance itself and the b-node that points to the
instance. An insertion may also cause changes in two or more b-nodes,
but those cases are less likely.

Serializing your simple instances is probably fast, but serializing
the b-nodes appears to be taking much more time, and probably accounts
for the large number of calls to persistent_id. B-Nodes with higher
branching factors will have more parts to serialize and they will be
slower. If you can cut the b-node branching factor in half, I bet your
benchmark will run almost twice as fast.

I think the default branching factor is large in ZODB because that can
be good for fast reading. It is bad if you want fast writing. If you
want fast writing, use a small branching factor.
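The scaling is easy to demonstrate outside ZODB. In this hedged sketch,
plain dicts stand in for pickled object state; a real bucket pickles
one reference per key and per value, each triggering a persistent_id
call, so the serialization work grows with the branching factor:

import pickle
import timeit

instance_state = {'attr': 'x' * 1024}                      # one simple instance
small_bucket = dict((i, 'oid-%d' % i) for i in range(15))  # half-size bucket
big_bucket = dict((i, 'oid-%d' % i) for i in range(30))    # full OOBTree bucket

for name, state in [('instance', instance_state),
                    ('bucket/15', small_bucket),
                    ('bucket/30', big_bucket)]:
    t = timeit.timeit(lambda: pickle.dumps(state), number=20000)
    print('%-10s %.3fs' % (name, t))  # absolute times are machine-dependent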
Re: [ZODB-Dev] Re: ZODB Benchmarks
On Wed, 2007-10-31 at 10:00 +0000, Laurence Rowe wrote:
> It looks like ZODB performance in your test has the same O(log n)
> performance as PostgreSQL checkpoints (the periodic drops in your
> graph). This should come as no surprise. B-Trees have a theoretical
> Search/Insert/Delete time complexity equal to the height of the tree,
> which is (up to) log(n).
>
> So why is PostgreSQL so much faster? It's using a Write-Ahead Log for
> inserts. Instead of inserting into the (B-Tree based) data files at
> every transaction commit it writes a record to the WAL. This does not
> require traversal of the B-Tree and has O(1) time complexity. The
> penalty for this is that read operations become more complex: they must
> look first in the WAL and overlay those results with the main index. The
> WAL is never allowed to get too large, or its in-memory index would
> become too big.

Thanks for the explanation.

After some profiling I noticed that there are millions of OID lookups
in the index. Increasing the cache size from 400 to 100000 led to more
acceptable levels of performance degradation. I'll post some results
later on.

Some profiling also showed that there is a huge number of calls to the
persistent_id method of the ObjectWriter: persisting 10000 objects
leads to 1338046 calls to persistent_id. This seems to have quite a bit
of overhead. Profile results attached.

> If you are going to have this number of records -- in a single B-Tree --
> then use a relational database. It's what they're optimised for.

The point of the benchmark is to determine what "this number of
records" means and to deduce best practice when working with the ZODB.
I would much rather tell a developer to use multiple B-Trees if he
wants to store this number of records than tell them to use a
relational database. Telling a ZODB programmer to use a relational
database is an insult ;-)

One of the tests that I want to try out next is to insert records
concurrently into different B-Trees.

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za

Tue Oct 30 20:28:04 2007    /tmp/profile-1.dat

         6108977 function calls (6108973 primitive calls) in 57.280 CPU seconds

   Ordered by: cumulative time
   List reduced from 232 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   57.280   57.280 profile_zodb.py:70(run)
        1    0.000    0.000   57.280   57.280 <string>:1(?)
        1    0.260    0.260   57.280   57.280 profile_zodb.py:24(_btrees_insert)
        1    0.000    0.000   57.280   57.280 profile:0(run())
     1001    0.030    0.000   51.060    0.051 _manager.py:88(commit)
     1001    0.040    0.000   50.990    0.051 _transaction.py:365(commit)
     1001    0.110    0.000   50.730    0.051 _transaction.py:486(_commitResources)
     1001    0.020    0.000   48.060    0.048 Connection.py:496(commit)
     1001    0.220    0.000   48.040    0.048 Connection.py:512(_commit)
     9889    0.940    0.000   47.340    0.005 Connection.py:561(_store_objects)
    20372    0.480    0.000   39.790    0.002 serialize.py:381(serialize)
    20372    0.500    0.000   38.950    0.002 serialize.py:409(_dump)
    40750    7.790    0.000   38.020    0.001 :0(dump)
  1338046   17.560    0.000   30.230    0.000 serialize.py:184(persistent_id)
  2177223    9.150    0.000    9.150    0.000 :0(isinstance)
    20373    1.550    0.000    5.240    0.000 FileStorage.py:631(store)
     2964    0.050    0.000    4.980    0.002 Connection.py:749(setstate)
     2964    0.100    0.000    4.930    0.002 Connection.py:769(_setstate)
     2964    0.080    0.000    4.180    0.001 serialize.py:603(setGhostState)
     2964    0.030    0.000    4.100    0.001 serialize.py:593(getState)
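Those figures line up with David Binger's explanation earlier in the
thread. A quick sanity check (my own arithmetic, assuming the run
inserted 10000 objects):

serializations = 20372        # serialize.py:381(serialize) above
inserts = 10000
persistent_id_calls = 1338046

# ~2 serializations per insert: the new instance plus the bucket
# that points to it.
print(float(serializations) / inserts)              # ~2.04

# ~66 persistent_id calls per serialization: about what a bucket
# holding ~30 keys plus ~30 object references would produce.
print(float(persistent_id_calls) / serializations)  # ~65.7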
Re: [ZODB-Dev] Re: ZODB Benchmarks
I think someone proposed to have something just like a WAL in ZODB.
That could be an interesting optimization.

-- 
Sidnei da Silva
Enfold Systems                http://enfoldsystems.com
Fax +1 832 201 8856
Office +1 713 942 2377 Ext 214
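In outline, such a WAL could look like the sketch below (names and
structure are hypothetical; a real implementation would use an
append-only file replayed on restart, with checkpointing kept off the
commit path):

from BTrees.OOBTree import OOBTree

class WALIndex:
    # Sketch of a write-ahead-log overlay in front of a BTree index.
    def __init__(self):
        self.tree = OOBTree()   # the big, read-optimised index
        self.wal = {}           # in-memory overlay of recent writes

    def insert(self, key, value):
        self.wal[key] = value   # O(1) at commit time: no tree traversal

    def lookup(self, key):
        if key in self.wal:     # reads consult the WAL first...
            return self.wal[key]
        return self.tree[key]   # ...then fall back to the main index

    def checkpoint(self):
        # Drain the WAL into the tree before it grows too large.
        self.tree.update(self.wal)
        self.wal.clear()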
[ZODB-Dev] Re: ZODB Benchmarks
It looks like ZODB performance in your test has the same O(log n)
performance as PostgreSQL checkpoints (the periodic drops in your
graph). This should come as no surprise. B-Trees have a theoretical
Search/Insert/Delete time complexity equal to the height of the tree,
which is (up to) log(n).

So why is PostgreSQL so much faster? It's using a Write-Ahead Log for
inserts. Instead of inserting into the (B-Tree based) data files at
every transaction commit it writes a record to the WAL. This does not
require traversal of the B-Tree and has O(1) time complexity. The
penalty for this is that read operations become more complex: they must
look first in the WAL and overlay those results with the main index. The
WAL is never allowed to get too large, or its in-memory index would
become too big.

If you are going to have this number of records -- in a single B-Tree --
then use a relational database. It's what they're optimised for.

Laurence

Roché Compaan wrote:
> Well, I finally realised that ZODB benchmarks are not going to fall
> from the sky, so compelled by a project that needs to scale to very
> large numbers and a general desire to have real numbers, I started to
> write some benchmarks.
>
> My first goal was to get a baseline and test performance for the most
> basic operations like inserts and lookups. The first test tests BTree
> performance (OOBTree to be specific) and inserts instances of a
> persistent class into a BTree. Each instance has a single attribute
> that is 1K in size. The test tries out different commit intervals:
> the first iteration commits every 10 inserts, the second iteration
> commits every 100 inserts and the last one commits every 1000
> inserts. I don't have results for the second and third iterations
> since the first iteration takes a couple of hours to complete and I'm
> still waiting for the results.
>
> The results so far are worrying in that performance deteriorates
> logarithmically. The test kicks off with a bang at close to 750
> inserts per second, but after 1 million objects the insert rate drops
> to 260 inserts per second, and at 10 million objects the rate is not
> even 60 inserts per second. Why?
>
> In an attempt to determine if this drop in performance is normal I
> created a test with Postgres, purely to observe transaction rate and
> not to compare it with the ZODB. In Postgres the transaction rate
> hovers around 2700 inserts throughout the test. There are periodic
> drops, but I guess these are times when Postgres flushes to disc. I
> was hoping to have a consistent transaction rate in the ZODB too. See
> the attached image for the comparison. I also attach csv files of the
> data collected by both tests.
>
> During the last Plone conference I started a project called
> zodbbench, available here:
>
> https://svn.plone.org/svn/collective/collective.zodbbench
>
> The tests are written as unit tests and are run with a testrunner
> script. The project uses buildout to make it easy to get going.
> Unfortunately installing it with buildout on some systems seems to
> lead to weird import errors that I can't explain, so I would
> appreciate it if somebody with buildout fu can look at it.
> What I would appreciate more, though, is an explanation of the drop
> in performance or, alternatively, why the test is insane ;-)