Re: [ZODB-Dev] Making ZODB / ZEO faster
On Mon, Dec 7, 2009 at 1:46 PM, Erik Dahl wrote: > Biggest DBs we see are in the 10GB range. Most are significantly smaller > (100s of MB) so I don't think size is an issue. We typically run big > persistent caches so I don't think there is much help to be had there. It > would only make reads faster anyway right? Yes, but reads typically dominate performance -- of course each application is different. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Making ZODB / ZEO faster
Biggest DBs we see are in the 10GB range. Most are significantly smaller (100s of MB) so I don't think size is an issue. We typically run big persistent caches so I don't think there is much help to be had there. It would only make reads faster anyway, right? -EAD On Dec 7, 2009, at 11:21 AM, Jim Fulton wrote: > On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl wrote: >> Guys, >> >> Thanks for all the great feedback. Still processing it, but here are >> some things we will try. >> >> We will also >> look at tuning our current ZEO setup. Last time I looked there was >> only the invalidation queue. I poked around a bit for tuning docs >> and >> didn't see any. Can someone point me to them? > > The biggest problem ZODB has in general is docs. > > I'd consider tuning your ZEO client caches and using > persistent caches. > > I'm still curious how big your database is. > >> Our slow loading object was a persistent with a regular list inside >> of >> the main pickle. Objects that the list pointed to were persistent, >> which I believe means that they will load separately. > > Yes. > > Jim > > -- > Jim Fulton
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl wrote: > Guys, > > Thanks for all the great feedback. Still processing it, but here are > some things we will try. > > We will also > look at tuning our current ZEO setup. Last time I looked there was > only the invalidation queue. I poked around a bit for tuning docs and > didn't see any. Can someone point me to them? The biggest problem ZODB has in general is docs. I'd consider tuning your ZEO client caches and using persistent caches. I'm still curious how big your database is. > Our slow loading object was a persistent with a regular list inside of > the main pickle. Objects that the list pointed to were persistent, > which I believe means that they will load separately. Yes. Jim -- Jim Fulton
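For reference, the persistent client caches Jim mentions are enabled in ZConfig by giving the cache a name (`client`) and a directory (`var`). A minimal sketch; the server address, cache size, and paths below are illustrative values, not recommendations:

```
<zodb>
  <zeoclient>
    server localhost:8100
    # size of the ZEO client cache; raise it well above the default
    cache-size 512MB
    # giving the cache a name makes it persistent across restarts
    client zeo1
    # directory where the persistent cache file is written
    var /var/zeo
  </zeoclient>
</zodb>
```

Each client process needs a distinct cache name, or their cache files will collide.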
Re: [ZODB-Dev] Making ZODB / ZEO faster
Guys, Thanks for all the great feedback. Still processing it, but here are some things we will try. RelStorage - we'll try it in our app context to see whether it helps / hurts. Will report back results. Quick tests show some improvement. We will also look at tuning our current ZEO setup. Last time I looked there was only the invalidation queue. I poked around a bit for tuning docs and didn't see any. Can someone point me to them? zc.catalogqueue - we are adding fullish indexing to our objects (they aren't text pages so it's not truly a "full" index). Hopefully moving indexing out to a separate process will keep the impact of the new index low and help with our current conflict issues. Our slow loading object was a persistent with a regular list inside of the main pickle. Objects that the list pointed to were persistent, which I believe means that they will load separately. In general we have tried to make our persistent objects reasonably large to lower the number of load round trips. I haven't actually checked its size yet but it will be interesting to see. -EAD On Dec 4, 2009, at 6:21 PM, Shane Hathaway wrote: > Jim Fulton wrote: >> I find this a bit confusing. For the warm numbers, it looks like >> ZEO didn't >> utilize a persistent cache, which explains why the ZEO numbers are >> the >> same for hot and cold. Is that right? > > Yes. It is currently difficult to set up ZEO caches, which I > consider an issue with this early version of zodbshootout. > zodbshootout does include a sample test configuration that turns on > a ZEO cache, but it's not possible to run that configuration with a > concurrency level greater than 1. > >> What poll interval are you using for relstorage in the tests? >> Assuming an application gets reasonable cache hit rates, I don't >> see any >> meaningful difference between ZEO and relstorage in these numbers. > > You are entitled to your opinion. :-) Personally, I have observed a > huge improvement for many operations.
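The queue-based indexing Erik describes can be sketched in plain Python. This shows the pattern only, with hypothetical names (`IndexQueue`, `notify`, `process`); zc.catalogqueue's real API differs:

```python
from collections import deque

# Writers append an oid to a queue object instead of touching the shared
# index B-trees, so they no longer conflict on index buckets; a single
# dedicated indexing client drains the queue later.
class IndexQueue:
    def __init__(self):
        self._queue = deque()

    def notify(self, oid):
        # Called inside the writing transaction; cheap, no index access.
        self._queue.append(oid)

    def process(self, index, load, batch=100):
        # Called periodically by the one indexing process.
        done = 0
        while self._queue and done < batch:
            oid = self._queue.popleft()
            index[oid] = load(oid)  # the only writer of the index
            done += 1
        return done

queue = IndexQueue()
index = {}
for oid in range(5):
    queue.notify(oid)
processed = queue.process(index, load=lambda oid: "doc-%d" % oid)
```

The trade-off is that the index lags writes by one queue-draining interval, which is usually acceptable for catalog-style indexes.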
> Second, does the test still write and then read roughly the same amount of data as before? >>> That is a command line option. The chart on the web page shows >>> reading and >>> writing 1000 small persistent objects per transaction, >> Which is why I consider this benchmark largely moot. The database is >> small enough >> to fit in the server's disk cache. Even the slowest access times are >> on the order of 0.5 >> milliseconds. Disk accesses are typically measured in 10s of >> milliseconds. With magnetic >> disks, for databases substantially larger than the server's RAM, the >> network component >> of loading objects will be noise compared to the disk access. > > That's why I think solid state disk is already a major win, > economically, for large ZODB setups. The FusionIO cards in > particular are likely to be at least as reliable as any disk. It's > time to change the way we think about seek time. > > Shane
Re: [ZODB-Dev] Making ZODB / ZEO faster
Jim Fulton wrote: > I find this a bit confusing. For the warm numbers, it looks like ZEO didn't > utilize a persistent cache, which explains why the ZEO numbers are the > same for hot and cold. Is that right? Yes. It is currently difficult to set up ZEO caches, which I consider an issue with this early version of zodbshootout. zodbshootout does include a sample test configuration that turns on a ZEO cache, but it's not possible to run that configuration with a concurrency level greater than 1. > What poll interval are you using for relstorage in the tests? > > Assuming an application gets reasonable cache hit rates, I don't see any > meaningful difference between ZEO and relstorage in these numbers. You are entitled to your opinion. :-) Personally, I have observed a huge improvement for many operations. >>> Second, does the test still write and then read roughly the same >>> amount of data as before? >> That is a command line option. The chart on the web page shows reading and >> writing 1000 small persistent objects per transaction, > > Which is why I consider this benchmark largely moot. The database is > small enough > to fit in the server's disk cache. Even the slowest access times are > on the order of 0.5 > milliseconds. Disk accesses are typically measured in 10s of > milliseconds. With magnetic > disks, for databases substantially larger than the server's RAM, the > network component > of loading objects will be noise compared to the disk access. That's why I think solid state disk is already a major win, economically, for large ZODB setups. The FusionIO cards in particular are likely to be at least as reliable as any disk. It's time to change the way we think about seek time. Shane
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Fri, Dec 4, 2009 at 3:41 PM, Shane Hathaway wrote: > Jim Fulton wrote: >> >> On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway >> wrote: >>> >>> http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/ >> >> I won't take the time now to analyze the new test, although I will ask >> a couple of questions: >> >> First, in your results, you show cold, warm and hot numbers. Do these >> correspond to my cold, hot and steamin numbers? > > zodbshootout still produces steamin numbers, but I didn't include them on > the web page because they would dominate the chart. I added the "warm" test > after your speedtest modifications. My cold, hot, and steamin numbers > correspond with your cold, hot, and steamin numbers. The steamin numbers are > about the same for RelStorage and ZEO. See this page for a chart that > includes steamin numbers: > > http://pypi.python.org/pypi/zodbshootout#interpreting-the-results I find this a bit confusing. For the warm numbers, it looks like ZEO didn't utilize a persistent cache, which explains why the ZEO numbers are the same for hot and cold. Is that right? What poll interval are you using for relstorage in the tests? Assuming an application gets reasonable cache hit rates, I don't see any meaningful difference between ZEO and relstorage in these numbers. >> Second, does the test still write and then read roughly the same >> amount of data as before? > > That is a command line option. The chart on the web page shows reading and > writing 1000 small persistent objects per transaction, Which is why I consider this benchmark largely moot. The database is small enough to fit in the server's disk cache. Even the slowest access times are on the order of 0.5 milliseconds. Disk accesses are typically measured in 10s of milliseconds. With magnetic disks, for databases substantially larger than the server's RAM, the network component of loading objects will be noise compared to the disk access.
The only real difference between relstorage and ZEO is in cold numbers for databases in RAM. This measures the raw low-level server network performance. MySQL and Postgres servers are obviously faster than ZEO at this level, but networking is a small component of the overall workload. Jim -- Jim Fulton
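Jim's numbers make the arithmetic easy to check. A back-of-envelope sketch, using his ~15 ms disk figure and a network round trip in his 600-1000 microsecond range (both values taken from this thread):

```python
# Once a load misses the server's RAM, disk seek time dwarfs the network
# round trip, so the ZEO-vs-RelStorage network difference becomes noise.
disk_seek_ms = 15.0    # per-load disk time from Jim's 500GB test
network_rtt_ms = 0.8   # ZEO round trip, within the 600-1000 microsecond range

total_ms = disk_seek_ms + network_rtt_ms
network_share = network_rtt_ms / total_ms  # roughly 5% of a cold load

# The benchmark's in-RAM case is different: even its slowest access is
# about 0.5 ms, so there the network/server component is all that is
# being measured.
print("network share of a cold load: %.1f%%" % (network_share * 100))
```

Which is exactly why a benchmark whose database fits in the server's disk cache measures the server and network, not the storage.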
Re: [ZODB-Dev] Making ZODB / ZEO faster
Jim Fulton wrote: > On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway wrote: >> http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/ > > I won't take the time now to analyze the new test, although I will ask > a couple of questions: > > First, in your results, you show cold, warm and hot numbers. Do these > correspond to my cold, hot and steamin numbers? zodbshootout still produces steamin numbers, but I didn't include them on the web page because they would dominate the chart. I added the "warm" test after your speedtest modifications. My cold, hot, and steamin numbers correspond with your cold, hot, and steamin numbers. The steamin numbers are about the same for RelStorage and ZEO. See this page for a chart that includes steamin numbers: http://pypi.python.org/pypi/zodbshootout#interpreting-the-results > Second, does the test still write and then read roughly the same > amount of data as before? That is a command line option. The chart on the web page shows reading and writing 1000 small persistent objects per transaction, and the object count is corrected now. (OTOH, someone could claim the object count is off by one for some of the tests, and if that turns out to be significant, I can correct that.) Shane
Re: [ZODB-Dev] Making ZODB / ZEO faster
Erik Dahl wrote: > I haven't dove into the relstorage yet but have heard it will perform > better. Not sure I understand why though. Isn't it just putting > pickles into a single table with an index on the oid? (or oid / serial). Yes. In theory, ZEO should be about the same speed, but all the measurements I've done suggest ZEO has excessively high internal latency. I suspect ZEO uses Python sockets inefficiently, or that the GIL may be getting in the way. I know FileStorage isn't the problem, since bare FileStorage can read or write about 20,000 objects per second on my laptop. Jim is working on a newer version of ZEO, so maybe that will close the performance gap. For now, all the people who have told me they are using RelStorage seem to be happy with the improved speed. Shane
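Erik's mental model ("pickles into a single table with an index on the oid") can be illustrated with a toy table using stdlib sqlite3. This is not RelStorage's actual schema, which additionally tracks transaction metadata, checksums, and history; it only shows the shape of the idea:

```python
import pickle
import sqlite3

# Toy "pickles in a table keyed by oid" layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state ("
    " zoid INTEGER PRIMARY KEY,"  # the primary key is the index on the oid
    " tid INTEGER NOT NULL,"      # transaction id that wrote this state
    " state BLOB NOT NULL)"       # the object's pickle
)

def store(oid, tid, obj):
    conn.execute(
        "REPLACE INTO object_state (zoid, tid, state) VALUES (?, ?, ?)",
        (oid, tid, pickle.dumps(obj)),
    )

def load(oid):
    row = conn.execute(
        "SELECT state FROM object_state WHERE zoid = ?", (oid,)
    ).fetchone()
    return pickle.loads(row[0])

store(1, 1000, {"title": "hello"})
```

Since both ZEO and a relational backend ultimately serve pickles by oid, any speed difference comes from server internals and network handling, which is Shane's point above.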
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway wrote: > Jim Fulton wrote: >> >> On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl wrote: >> ... >>> >>> I haven't dove into the relstorage yet but have heard it will perform >>> better. >> >> It doesn't. See: >> >> https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html > > Since you posted those numbers, I have worked hard to solve the performance > issues you identified. I have improved both RelStorage and the performance > test, which is now a separate project (released on PyPI) named zodbshootout. > Your numbers may apply to RelStorage 1.2, but not 1.4. > > See: > > http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/ I won't take the time now to analyze the new test, although I will ask a couple of questions: First, in your results, you show cold, warm and hot numbers. Do these correspond to my cold, hot and steamin numbers? Second, does the test still write and then read roughly the same amount of data as before? Jim -- Jim Fulton
Re: [ZODB-Dev] Making ZODB / ZEO faster
Jim Fulton wrote: > On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl wrote: > ... >> I haven't dove into the relstorage yet but have heard it will perform >> better. > > It doesn't. See: > > https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html Since you posted those numbers, I have worked hard to solve the performance issues you identified. I have improved both RelStorage and the performance test, which is now a separate project (released on PyPI) named zodbshootout. Your numbers may apply to RelStorage 1.2, but not 1.4. See: http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/ Shane
Re: [ZODB-Dev] Making ZODB / ZEO faster
Jim Fulton wrote: > On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl wrote: > ... >> I haven't dove into the relstorage yet but have heard it will perform >> better. > > It doesn't. See: > > https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html Jim, That's a little sweeping and some might see you as a little biased in this ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl wrote: > Guys, > > We have a product written in python using ZODB/ZEO and I would like to > improve the speed of the database in general. Things that I have seen > that I would like to improve, some I understand and some not. How have you "seen" these? Do you have evidence that they are affecting your applications specifically? > 1. Loading of largish (but not too large; the object had a list with around > 20K references in it) pickles can be very slow. Ok, what size pickle? > I'm not 100% sure; is there a way to get the pickle size of a zodb > persistent object? I lamely tried to pickle one of our persistent > objects and of course it blew up with max recursion because it went > beyond the normal bounds of a zodb persistent pickle. There must be a > way to do this though, right? You can get the pickle, and thus the pickle size, for an object fairly straightforwardly: p, s = ob._p_jar.db().storage.load(ob._p_oid, '') print len(p) Pickling is actually pretty fast. By list, do you mean a Python list (or persistent list)? When such an object is loaded, the database will have to instantiate all those objects. That might be time consuming. In general, I'd avoid large lists or persistent lists, especially if they are updated a lot. If you try to access those objects all at once, then those objects will have to be loaded from the database, and that can be very time consuming. > 2. Writing lots of objects. I know that zodb wasn't written for this > type of use case but we have backed into it. While some relational databases have greater transaction throughput, you can get fair write throughput with ZODB. > We can have many (~30 is > that a lot?) That's on the order of what we have. > zodb clients and as a result large numbers of cache > invalidations can be sent when a write occurs. Could invalidation > performance / cache refresh be an issue? I don't think so. Invalidations aren't that large and are sent asynchronously. > 3. DB hot spots.
Of course we see conflict errors when there are lots > of writes to the db from different clients that touch the same > object. We haven't done a bunch of optimization work here but I'm > thinking of moving all indexing out to a separate client/process that > reads off a queue to find objects to index. I'm guessing the indexes > are a hotspot (haven't tested this out much, though I guess B-trees' > buckets should alleviate this problem some). (Is there a persistent > queue around?) Hotspots are an issue in any database. It is generally worth application refactoring to avoid them. :) > Anyway these are some things that come to mind when I think of > performance issues. I have the thought that many could be made better > with faster ZEO I/O. Does this seem like a good assumption? If so > what could we do to make ZEO faster? There are 2 major issues with ZEO that I'm aware of: - Each object load requires a round trip. If you have a list of 20,000 objects and you want to iterate over them, that will require 20,000 separate object load requests, each requiring a network round trip. We've discussed schemes to give the database hints to pre-fetch objects, thus avoiding serial round trips. - ZEO servers are currently single threaded. This can hurt a lot if you have many clients and prevents you from taking advantage of multi-spindle storage systems. I'll note that if you have a large database, a substantial amount of time can be spent doing disk IO. In a recent test we performed with a 500GB database, disk access accounted for ~15 milliseconds of a ~16 millisecond object load request. IOW, disk access times swamped network access times, which in my measurements were on the order of 600-1000 microseconds. > Questions: > > * We use a filestorage; are there faster ones? Not that I'm aware of. > Can this be a bottleneck? Not in and of itself AFAIK. Your disk configuration can matter a lot. How big is your database file?
The biggest issues with file storage for large databases are: - packing can take a long time and can affect database performance a great deal. - shut down and start up can take a long time to write and read an index file. God help you if you have to open a large database without an up-to-date index. This is why I am (yet again) exploring a BerkeleyDB-based storage implementation. > * Is the ZEO protocol inefficient? No, or at least not enough to matter. > * Is the ZEO server just plain slow? To some degree, yes, mainly because it is single threaded. See my jim-thready-zeo2 branch. I suspect that Python's GIL will always put ZEO at a disadvantage relative to non-Python-based database servers. You can help by testing this. :) > Thoughts I have that may have no impact. > > * rewrite ZEO or parts of it in C > * write a C based storage I don't think either of these will help in any significant way. Jim -- Jim Fulton
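Jim's `load()` snippet is the authoritative way to get a stored record's size. For a feel of why a 20K-entry list makes a large record either way, here is a stand-alone sketch using plain pickle, where a small tuple stands in for a persistent reference (the `("oid", i)` encoding is purely illustrative, not ZODB's actual reference format):

```python
import pickle

# A Persistent object's record is its own pickle: sub-objects are either
# inlined (plain Python values) or stored as small persistent references.
inline_list = list(range(20000))               # plain data, fully inlined
ref_list = [("oid", i) for i in range(20000)]  # stand-in for 20K references

inline_size = len(pickle.dumps(inline_list))
ref_size = len(pickle.dumps(ref_list))

# Either way the parent record runs to tens of kilobytes, and in the
# reference case each of the 20K referents still costs its own load
# round trip when accessed.
```

This is the double cost Jim describes: one large pickle to load and instantiate, then (for persistent referents) one network round trip per item touched.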
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl wrote: ... > I haven't dove into the relstorage yet but have heard it will perform > better. It doesn't. See: https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html -- Jim Fulton
Re: [ZODB-Dev] Making ZODB / ZEO faster
Right, I have looked at the nosql stuff a good bit. MongoDB is interesting but has no transactions, and their sharding is alpha code. CouchDB has transactions but no sharding at all. HBase has what I want but I can't get my brain around how to use the wacky schema. Unfortunately, this isn't a new project, so if we could get more out of what we have that would be great, but clearly swapping out the back end may become necessary. I haven't dove into the relstorage yet but have heard it will perform better. Not sure I understand why though. Isn't it just putting pickles into a single table with an index on the oid? (or oid / serial). -EAD On Dec 4, 2009, at 10:15 AM, Andreas Jung wrote: > A project - at least an enterprise-level project - requires a > careful > choice of tools and backends. The ZODB is not the golden bullet for > all > and everything. Depending on the data model and the project needs you > have to > look at relational databases or NoSQL databases as alternatives. > > And as you wrote: ZODB-based applications perform badly in heavy-write > scenarios...you may raise the limits by using RelStorage, but perhaps > other backends might be the better choice - something one must > consider > when starting a new project. > > -aj > > On 04.12.09 15:41, Erik Dahl wrote: >> Guys, >> >> We have a product written in python using ZODB/ZEO and I would like >> to >> improve the speed of the database in general. Things that I have seen >> that I would like to improve, some I understand and some not. >> >> 1. Loading of largish (but not too large; the object had a list with >> around >> 20K references in it) pickles can be very slow. Ok, what size pickle? >> I'm not 100% sure; is there a way to get the pickle size of a zodb >> persistent object? I lamely tried to pickle one of our persistent >> objects and of course it blew up with max recursion because it went >> beyond the normal bounds of a zodb persistent pickle. There must >> be a >> way to do this though, right? >> >> 2.
Writing lots of objects. I know that zodb wasn't written for this >> type of use case but we have backed into it. We can have many (~30 >> is >> that a lot?) zodb clients and as a result large numbers of cache >> invalidations can be sent when a write occurs. Could invalidation >> performance / cache refresh be an issue? >> >> 3. DB hot spots. Of course we see conflict errors when there are >> lots >> of writes to the db from different clients that touch the same >> object. We haven't done a bunch of optimization work here but I'm >> thinking of moving all indexing out to a separate client/process that >> reads off a queue to find objects to index. I'm guessing the indexes >> are a hotspot (haven't tested this out much, though I guess B-trees' >> buckets should alleviate this problem some). (Is there a persistent >> queue around?) >> >> Anyway these are some things that come to mind when I think of >> performance issues. I have the thought that many could be made >> better >> with faster ZEO I/O. Does this seem like a good assumption? If so >> what could we do to make ZEO faster? >> >> Questions: >> >> * We use a filestorage; are there faster ones? Can this be a >> bottleneck? >> * Is the ZEO protocol inefficient? >> * Is the ZEO server just plain slow? >> >> Thoughts I have that may have no impact. >> >> * rewrite ZEO or parts of it in C >> * write a C based storage >> >> Others? >> >> -EAD > > > -- > ZOPYX Ltd. & Co KG \ zopyx group > Charlottenstr.
37/1 \ The full-service network for your > D-72070 Tübingen \ Python, Zope and Plone projects > www.zopyx.com, i...@zopyx.com \ www.zopyxgroup.com > > E-Publishing, Python, Zope & Plone development, Consulting > > > ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
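Erik's closing question above -- isn't RelStorage essentially storing each pickle in a table keyed by oid/serial? -- is roughly right as a mental model. Here is a minimal sketch of that layout, using sqlite3 purely for illustration; the table and column names are made up for this example, not RelStorage's actual schema:

```python
# Sketch: per-object pickles in a relational table keyed by (oid, serial).
# Illustrative only -- not RelStorage's real schema.
import pickle
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE object_state (
        oid     INTEGER NOT NULL,
        serial  INTEGER NOT NULL,   -- transaction id / revision
        state   BLOB    NOT NULL,   -- the object's pickle
        PRIMARY KEY (oid, serial)
    )
""")

def store(oid, serial, obj):
    """Store one revision of an object's state as a pickle."""
    conn.execute("INSERT INTO object_state VALUES (?, ?, ?)",
                 (oid, serial, pickle.dumps(obj)))

def load(oid):
    """Load the most recent revision of an object."""
    row = conn.execute(
        "SELECT state FROM object_state WHERE oid = ? "
        "ORDER BY serial DESC LIMIT 1", (oid,)).fetchone()
    return pickle.loads(row[0])

store(1, 100, {'title': 'first'})
store(1, 101, {'title': 'second'})
print(load(1))  # -> {'title': 'second'}
```

With this layout a load is a single indexed lookup, and keeping older (oid, serial) rows around is what gives a multi-revision storage its history.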
Re: [ZODB-Dev] Making ZODB / ZEO faster
A project - at least an enterprise-level project - requires a careful choice of tools and backends. The ZODB is not a silver bullet for everything. Depending on the data model and the project needs, you have to look at relational databases or NoSQL databases as alternatives. And as you wrote: ZODB-based applications perform badly in heavy-write scenarios. You may raise the limits by using RelStorage, but perhaps other backends might be the better choice - something one must consider when starting a new project. -aj On 04.12.09 15:41, Erik Dahl wrote: > Guys, > > We have a product written in Python using ZODB/ZEO, and I would like to > improve the speed of the database in general. Here are some things I have seen > that I would like to improve; some I understand and some I don't. > > 1. Loading largish pickles can be very slow (not too large - one object had a list with around > 20K references in it). What size pickle, though? I'm not 100% sure - is there a way to get the pickle size of a ZODB > persistent object? I naively tried to pickle one of our persistent > objects, and of course it blew up with max recursion because it went > beyond the normal bounds of a ZODB persistent pickle. There must be a > way to do this, though, right? > > 2. Writing lots of objects. I know that ZODB wasn't written for this > type of use case, but we have backed into it. We can have many (~30 - is > that a lot?) ZODB clients, and as a result large numbers of cache > invalidations can be sent when a write occurs. Could invalidation > performance / cache refresh be an issue? > > 3. DB hot spots. Of course we see conflict errors when there are lots > of writes to the db from different clients that touch the same > object. We haven't done a bunch of optimization work here, but I'm > thinking of moving all indexing out to a separate client/process that > reads off a queue to find objects to index.
I'm guessing the indexes > are a hot spot (haven't tested this much, though I guess B-tree > buckets should alleviate this problem some). (Is there a persistent > queue around?) > > Anyway, these are some things that come to mind when I think of > performance issues. I have the thought that many could be made better > with faster ZEO I/O. Does this seem like a good assumption? If so, > what could we do to make ZEO faster? > > Questions: > > * We use a FileStorage; are there faster ones? Can this be a bottleneck? > * Is the ZEO protocol inefficient? > * Is the ZEO server just plain slow? > > Thoughts I have that may have no impact: > > * rewrite ZEO or parts of it in C > * write a C-based storage > > Others? > > -EAD > -- ZOPYX Ltd. & Co KG \ zopyx group Charlottenstr. 37/1 \ The full-service network for your D-72070 Tübingen \ Python, Zope and Plone projects www.zopyx.com, i...@zopyx.com \ www.zopyxgroup.com E-Publishing, Python, Zope & Plone development, Consulting
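On Erik's question 1: ZODB avoids the recursion he hit by pickling each Persistent object separately, replacing any referenced Persistent sub-objects with small oid references via pickle's persistent_id hook. The sketch below uses that same stdlib mechanism to measure a "state pickle" the way a storage would; the Persistent class and the oid strings here are stand-ins for illustration, not the real persistent package:

```python
# Sketch: measure an object's state-pickle size the way a ZODB storage
# stores it -- other "persistent" objects become references, not copies.
# `Persistent` below is a stand-in class, not persistent.Persistent.
import io
import pickle

class Persistent:
    """Stand-in for a persistent base class."""

class Child(Persistent):
    pass

class Parent(Persistent):
    def __init__(self, children):
        self.children = children

class RefPickler(pickle.Pickler):
    """Pickle one root object; reduce every *other* Persistent
    instance to a small fake-oid reference instead of inlining it."""
    def __init__(self, file, root):
        super().__init__(file, protocol=2)
        self._root = root

    def persistent_id(self, obj):
        if isinstance(obj, Persistent) and obj is not self._root:
            return 'oid-%d' % id(obj)   # fake oid for illustration
        return None                     # pickle everything else normally

def state_pickle_size(obj):
    buf = io.BytesIO()
    RefPickler(buf, obj).dump(obj)
    return buf.tell()

parent = Parent([Child() for _ in range(20000)])
size = state_pickle_size(parent)
# Even as mere references, 20K children still cost ~25 bytes each,
# so this one state pickle runs to a few hundred KB.
```

That per-reference cost is consistent with the slow loads described above; a BTree spreads the same references over many small bucket pickles instead of one big one.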
Re: [ZODB-Dev] Making ZODB / ZEO faster
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl wrote: > I'm guessing the indexes > are a hotspot (haven't tested this out much though I guess b-tree's > buckets should alleviate this problem some). (is there a persistent > queue around?) Cataloging certainly can be a hot spot. Check out zc.catalogqueue. -Fred -- Fred L. Drake, Jr. "Chaos is the score upon which reality is written." --Henry Miller
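The pattern behind zc.catalogqueue -- write transactions only record *what* changed, and a single consumer drains the queue and applies the index updates -- can be sketched with stdlib primitives. This is a schematic of the architecture only, not zc.catalogqueue's API; a real ZODB deployment would use a conflict-reducing persistent queue (e.g. zc.queue) and transactions rather than threads:

```python
# Sketch: decouple indexing from writes via a queue, so concurrent
# writers never mutate the hot index structure directly.
import queue
import threading

index = {}                  # stand-in for a catalog / BTree index
to_index = queue.Queue()    # stand-in for a persistent queue

def writer(oid, word):
    # A write transaction only enqueues the change; no index conflict.
    to_index.put((oid, word))

def indexer():
    # Single consumer: the only code path that mutates the index.
    while True:
        item = to_index.get()
        if item is None:    # sentinel: shut down
            break
        oid, word = item
        index.setdefault(word, set()).add(oid)
        to_index.task_done()

t = threading.Thread(target=indexer)
t.start()
for oid, word in [(1, 'zodb'), (2, 'zeo'), (3, 'zodb')]:
    writer(oid, word)
to_index.put(None)
t.join()
print(sorted(index['zodb']))  # -> [1, 3]
```

Because only one process touches the index, the conflict errors Erik describes in point 3 move out of the write path entirely, at the cost of the index lagging slightly behind the writes.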