Re: [ZODB-Dev] How to check for setting the same values on persistent objects?
On Thu, May 5, 2011 at 6:27 PM, Alexandre Garel alex.ga...@tarentis.com wrote:
> > I'm assuming doing a general check for old == new is not safe, as it
> > might not be implemented correctly for all objects and doing the
> > comparison might be expensive.
>
> I know very few of the ZODB internals, but in Python old == new does not
> mean old is new.

Sure, but we aren't interested in object identity here. We want to know whether something close to cPickle.dumps(old_data, 1) == cPickle.dumps(new_data, 1) holds, for which old_data == new_data is an approximation, but likely not correct in all cases. Checking for identity would only work for ints, interned strings and a very few other things.

> I don't know exactly how ZODB retrieves a particular object, but I assume
> it does this using _p_oid. So for persistent classes you could check
> old._p_oid == new._p_oid. For strings and ints you can of course use
> old is new.

The _p_oid of the object stays the same; it's the data it represents that might change.

Hanno
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
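The pickle-level check Hanno describes can be sketched like this (a rough illustration using Python's pickle module in place of cPickle; state_changed is a hypothetical helper, not a ZODB API):

```python
import pickle

def state_changed(old_state, new_state):
    # Compare the serialized forms rather than relying on __eq__,
    # which may be unimplemented, wrong, or expensive for some types.
    return pickle.dumps(old_state, 1) != pickle.dumps(new_state, 1)

# Equal simple states serialize identically, so no write would be needed:
print(state_changed({'title': 'ZODB'}, {'title': 'ZODB'}))  # False
print(state_changed({'title': 'ZODB'}, {'title': 'ZEO'}))   # True
```

Note the caveat implied in the thread: equal objects are only guaranteed to pickle identically for simple, deterministically-ordered state, which is why this is an approximation rather than a general-purpose check.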
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
Hello.

> Is your storage server CPU bound?

load average: 1.47, 1.34, 1.20

mpstat:

11:40:13 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle  intr/s
11:40:13 AM  all   5.82   0.00  0.69     1.11  0.01   0.12    0.00  92.25  548.52

I guess it's not very high, for an 8-core machine.

> Starting with ZODB 3.10, ZEO storage servers are multi-threaded. They have
> a thread for each client. We have a storage server that has run at 120%
> CPU on a 4-core box. Also, if you use zc.FileStorage, packing is mostly
> done in a separate process.

I didn't know this package even existed. I will give it a try, but in any case our problems don't seem to be related to packing.

> A major source of slowdown *can* be disk access times. How's IO wait on
> your server?

As you can see from the mpstat snapshot above, just around 1%. I have checked iostat as well, and the number of transactions per second seems to be very low considering the maximum allowed by the hardware.

> Look at how your application is using data. If you have requests that have
> to load a lot of data, maybe you can refactor your application to load
> fewer.

A quick check with nethogs shows network usage oscillating between 100 and 200 KB/s. But I guess that if I were loading an excessive amount of data, this value would be higher, no?

Thanks a lot for your answers.

Pedro

--
José Pedro Ferreira
Software Developer, Indico Project
http://indico-software.org
CERN - European Organization for Nuclear Research
1211 Geneve 23, Switzerland
IT-UDS-AVC
Office: 513-R-0042
Tel. +41227677159
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
Hello,

> Query fewer objects from the database. Make sure you don't store lots of
> tiny persistent objects in the database; I'd aim for storing data in
> chunks of 8-32kb, or use blobs for larger objects. Remember that ZODB is a
> key/value storage for the most part. Model your data accordingly.

That's hard to do for a project that is already 8 or 9 years old. As you can see in the attached file, we've got many cases that fall outside your limits. I've noticed, for instance, that pages that involve the loading of 200 MaKaC.review.Abstract objects have an awful performance record (maybe because we then load, for each object, a handful of other referenced persistent objects).

> Before you do that, you might consider switching to RelStorage, which
> already has a memcached caching layer in addition to the connection
> caches.

But isn't RelStorage supposed to be slower than FileStorage/ZEO?

> But remember that throwing more caches at the problem isn't a solution.
> It's likely the way you store or query the data from the database that's
> not optimal.

I agree, many things could be improved regarding the data structures we use. However, it is also true that we have a large number of objects that are rarely changed, and that there is no need to fetch from the DB if we can keep them in memory.

> The ZEO server needs almost no CPU power, except for garbage collection
> and packing. During normal operations the CPU speed should be irrelevant.

Yes, it is consistent with our load average history. It slightly increases during DB packing, but otherwise stays around 1.

Thanks a lot for your answers.

Cheers,

Pedro

--
José Pedro Ferreira
Software Developer, Indico Project
http://indico-software.org
CERN - European Organization for Nuclear Research
1211 Geneve 23, Switzerland
IT-UDS-AVC
Office: 513-R-0042
Tel. +41227677159

[attached: per-class storage statistics]

Module.ClassName                            | Percentage      | Min   | Max    | Size     | Instances
--------------------------------------------|-----------------|-------|--------|----------|----------
MaKaC.webinterface.displayMgr.SystemLink    | 13.7365089336%  | 165B  | 551B   | 1753.1Mb | 7108834
MaKaC.conference.Contribution               | 7.33041346279%  | 954B  | 12.4Kb | 935.5Mb  | 719579
MaKaC.accessControl.AccessController        | 5.33470736765%  | 159B  | 1.7Kb  | 680.8Mb  | 2699638
MaKaC.conference.Conference                 | 4.28939314997%  | 1.7Kb | 75.6Kb | 547.4Mb  | 165507
MaKaC.common.log.ActionLogItem              | 4.17241315243%  | 215B  | 9.9Kb  | 532.5Mb  | 1298923
BTrees.OOBTree.OOBucket                     | 3.35268430987%  | 51B   | 1.2Mb  | 427.9Mb  | 237004
MaKaC.common.Counter.Counter                | 3.23618867689%  | 59B   | 70B    | 413.0Mb  | 7339189
MaKaC.conference.LocalFile                  | 2.95354653864%  | 337B  | 7.7Kb  | 376.9Mb  | 994222
MaKaC.registration.PersonalDataFormItem     | 1.99248960658%  | 116B  | 162B   | 254.3Mb  | 1951216
MaKaC.conference.ContributionParticipation  | 1.52190679944%  | 184B  | 1.9Kb  | 194.2Mb  | 739925
MaKaC.common.indexes.EmailIndex             | 1.3809146%      | 2.1Mb | 2.1Mb  | 174.4Mb  | 92
MaKaC.registration.RegistrationForm         | 1.30462205201%  | 824B  | 6.1Kb  | 166.5Mb  | 162642
MaKaC.conference.Slides                     | 1.13426433984%  | 291B  | 1.8Kb  | 144.8Mb  | 386587
persistent.mapping.PersistentMapping        | 1.12364719117%  | 57B   | 12.4Kb | 143.4Mb  | 847891
MaKaC.review.AbstractMgr                    | 1.10508750233%  | 729B  | 6.6Kb  | 141.0Mb  | 162620
MaKaC.conference.ContribStatusSch           | 0.903904932477% | 170B  | 195B   | 115.4Mb  | 648183
MaKaC.common.log.EmailLogItem               | 0.898982684087% | 295B  | 16.0Kb | 114.7Mb  | 75793
MaKaC.contributionReviewing.Review          | 0.887749184867% | 453B  | 453B   | 113.3Mb  | 262267
MaKaC.conference.ReportNumberHolder         | 0.878300608667% | 117B  | 251B   | 112.1Mb  | 984754
MaKaC.registration.AccommodationType        | 0.876478211284% | 142B  | 438B   | 111.9Mb  | 488509
MaKaC.schedule.ContribSchEntry              | 0.873561973452% |
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 6, 2011 at 2:22 PM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
> That's hard to do for a project that is already 8 or 9 years old. As you
> can see in the attached file, we've got many cases that fall outside your
> limits. I've noticed, for instance, that pages that involve the loading of
> 200 MaKaC.review.Abstract objects have an awful performance record (maybe
> because we then load, for each object, a handful of other referenced
> persistent objects).

I'd expect load times per persistent object to vary between 0.1 and 10ms. Over a network connection, while sometimes hitting the disk, I'd expect to see an average of 1ms. If you get something awful like an Oracle real application cluster, with virtualization, storage area networks and different data centers involved, you are lucky to see 10ms. If you don't have any of the data in a cache and load hundreds of objects, you very quickly get into the range of one to multiple seconds to load the data. If you need to load more than 1000 objects from the database to render a page, your database schema sucks (tm) ;-)

> But isn't RelStorage supposed to be slower than FileStorage/ZEO?

The benchmarks vary on this a little bit, but for read performance they are basically the same. You can use the same tricks, like SSDs or bigger OS disk caches, to speed up both of them. RelStorage has simpler (freely available) clustering solutions via the native database, and supports things like the memcached cache.

Hanno
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 6, 2011 at 6:19 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
> > Hello. Is your storage server CPU bound?
>
> load average: 1.47, 1.34, 1.20
>
> mpstat:
>
> 11:40:13 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle  intr/s
> 11:40:13 AM  all   5.82   0.00  0.69     1.11  0.01   0.12    0.00  92.25  548.52
>
> I guess it's not very high, for an 8-core machine.

Nope. None of these stats are high.

> > Starting with ZODB 3.10, ZEO storage servers are multi-threaded. They
> > have a thread for each client. We have a storage server that has run at
> > 120% CPU on a 4-core box. Also, if you use zc.FileStorage, packing is
> > mostly done in a separate process.
>
> I didn't know this package even existed. I will give it a try, but in any
> case our problems don't seem to be related to packing.

Nope. I doubt your problems even relate to ZEO. :) But it illustrates that storage servers can benefit from multiple processors.

> > A major source of slowdown *can* be disk access times. How's IO wait on
> > your server?
>
> As you can see from the mpstat snapshot above, just around 1%. I have
> checked iostat as well, and the number of transactions per second seems to
> be very low considering the maximum allowed by the hardware.
>
> > Look at how your application is using data. If you have requests that
> > have to load a lot of data, maybe you can refactor your application to
> > load fewer.
>
> A quick check with nethogs shows network usage oscillating between 100 and
> 200 KB/s. But I guess that if I were loading an excessive amount of data,
> this value would be higher, no?

Right. I'm skeptical that you have a storage problem. What makes you think you have a storage problem? :)

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [ZODB-Dev] How to check for setting the same values on persistent objects?
On Wed, May 4, 2011 at 5:53 AM, Hanno Schlichting ha...@hannosch.eu wrote:
> Hi. I tried to analyze the overhead of changing content in Plone a bit. It
> turns out we write back a lot of persistent objects to the database, even
> though the actual values of these objects haven't changed. Digging deeper,
> I tried to understand what happens here:
>
> 1. persistent.__setattr__ will always set _p_changed to True and thus
>    cause the object to be written back.
> 2. Some BTree buckets define the VALUE_SAME macro. If the macro is
>    available and the new value is the same as the old, the change is
>    ignored.
> 3. The VALUE_SAME macro is only defined for the int, long and float value
>    variants, but not the object-based ones.
> 4. All code in Products.ZCatalog does explicit comparisons of the old and
>    new value and ignores non-value-changes. I haven't seen any other code
>    doing this.
>
> I'm assuming doing a general check for old == new is not safe, as it might
> not be implemented correctly for all objects and doing the comparison
> might be expensive. But I'm still curious if we could do something about
> this. Some ideas:
>
> 1. Encourage everyone to do the old == new check in all application code
>    before setting attributes on persistent objects.
>    Pros: This works today; you know what type of values you are dealing
>    with and can be certain when to apply this; you might be able to avoid
>    some computation if you store multiple values based on the same input
>    data.
>    Cons: It clutters all code.

-1 as suggested, but it might be worth asking if there should be changes to the infrastructure that encourages lots of spurious attribute updates.

> 2. Create new persistent base classes which do the checking in their
>    __setattr__ methods.
>    Pros: A lot less cluttering in the application code.
>    Cons: All applications would need to use the new base classes.
>    Developers might not understand the difference between the variants and
>    use the checking versions even though they store data which isn't cheap
>    to compare.

-1. This feels like adding a solution to some other solution. :)

> 2a. Create new base classes and do type checking for built-in types.
>    Pros: Safer to use than always doing value comparisons.
>    Cons: Still separate base classes, plus the overhead of doing type
>    checks.

Ditto.

> 3. Compare object state at the level of the pickled binary data.
>    This would need to work at the level of the ZODB connection. When doing
>    savepoints or commits, the registered objects flagged as _p_changed
>    would be checked before being added to the modified list. In order to
>    do this, we need to get the old value of the object, either by loading
>    it again from the database or by keeping a cache of the non-modified
>    state of all objects. The latter could be done in
>    persistent.__setattr__, where we add the pristine state of an object
>    into a separate cache before doing any changes to it. This probably
>    should be a cache with an upper limit, so we avoid running out of
>    memory for connections that change a lot of objects. The cache would
>    only need to hold the binary data and not unpickle it.
>    Pros: On the level of the binary data, the comparison is rather cheap
>    and safe to do.
>    Cons: We either add more database reads or complex change tracking; the
>    change tracking would require more memory for keeping a copy of the
>    pristine object. Interactions between ghosted objects and the new cache
>    could be fragile.

There are also possible subtle consistency issues. If an application assigns the same value to a variable and some other transaction assigns a different value, should the two conflict? Arguably so.

> 4. Compare the binary data on the server side.
>    Pros: We can get to the old state rather quickly and only need to deal
>    with binary string data.
>    Cons: We make all write operations slower by adding additional read
>    overhead, especially those which really do change data. This won't work
>    on RelStorage. We only save disk space and cache invalidations, but
>    still do the bulk of the work and send data over the network.
>
> I probably missed some approaches here. None of the approaches feels like
> a good solution to me. Doing it server side (4) is a bad idea in my book.
> Option 3 seems to be the most transparent and safe version, but is also
> the most complicated to write, with all the interactions with other
> caches. It's also not clear what additional responsibilities this would
> introduce for subclasses of persistent which override various hooks. Maybe
> option one is the easiest here, but it would need some documentation about
> this being a best practice. Until now I didn't realize the implications of
> setting attributes to unchanged values.

I think the best approach is to revisit the application infrastructure that's causing all these spurious updates.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
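For illustration, option 2 above (value checking in __setattr__) could look roughly like this. This is a hypothetical sketch: a real version would subclass persistent.Persistent, so that skipping the write also avoids setting _p_changed; the _writes counter is only here to make the effect visible.

```python
_unset = object()

class ValueCheckingBase:
    """Sketch of idea 2 from the thread: ignore writes of an equal value."""

    def __init__(self):
        # Count real attribute writes, standing in for _p_changed markings.
        object.__setattr__(self, '_writes', 0)

    def __setattr__(self, name, value):
        old = self.__dict__.get(name, _unset)
        if old is not _unset and old == value:
            return  # same value: don't rewrite (or mark the object changed)
        object.__setattr__(self, '_writes', self._writes + 1)
        object.__setattr__(self, name, value)

obj = ValueCheckingBase()
obj.title = 'ZODB'
obj.title = 'ZODB'   # ignored: equal to the stored value
obj.title = 'ZEO'    # a real change
print(obj._writes)   # 2
```

The cons from the thread apply directly: the `old == value` comparison runs on every assignment, which is exactly the cost Hanno warns about for types that aren't cheap to compare.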
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 06, 2011 at 12:19:39PM +0200, Pedro Ferreira wrote:
> A quick check with nethogs shows network usage oscillating between 100 and
> 200 KB/s. But I guess that if I were loading an excessive amount of data,
> this value would be higher, no?

So throughput isn't the problem. I don't think you've mentioned what the latency is like between your ZEO server and clients. What does the ping time look like?

You mentioned pages that load lots of objects (200 objects, each referring to a handful of sub-objects). If they're not in the client cache, each of those will require a separate fetch from the ZEO server. As an example, let's say your average object size is only 1k, but due to network latency it takes 5ms per fetch. Let's say your slow page loads 1000 objects, and 100 of them are not in the cache. That's only 100k of data, but you've spent 500ms waiting on the network.

If this is indeed the problem, one symptom would be vastly better performance on warm pages, when all the needed objects are in the client cache.

Is this a Zope 2 app? Have you checked the control panel to see what the cache stats look like?

--
Paul Winkler
http://www.slinkp.com
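Paul's arithmetic is worth writing down explicitly (a back-of-the-envelope helper with made-up names, not a real profiler): each object missing from the client cache costs one synchronous round trip, so latency, not bandwidth, dominates.

```python
def network_wait_ms(cache_misses, round_trip_ms):
    # Each object not in the client cache costs one synchronous
    # round trip to the ZEO server, regardless of object size.
    return cache_misses * round_trip_ms

# The example from the mail: 100 misses at 5 ms each.
print(network_wait_ms(100, 5))  # 500 ms spent waiting, for only ~100k of data
```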
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On 06.05.2011 at 17:12, Paul Winkler sli...@gmail.com wrote:
> On Fri, May 06, 2011 at 12:19:39PM +0200, Pedro Ferreira wrote:
> > A quick check with nethogs shows network usage oscillating between 100
> > and 200 KB/s. But I guess that if I were loading an excessive amount of
> > data, this value would be higher, no?
>
> So throughput isn't the problem. I don't think you've mentioned what the
> latency is like between your ZEO server and clients. What does the ping
> time look like? You mentioned pages that load lots of objects (200
> objects, each referring to a handful of sub-objects). If they're not in
> the client cache, each of those will require a separate fetch from the ZEO
> server. As an example, let's say your average object size is only 1k, but
> due to network latency it takes 5ms per fetch. Let's say your slow page
> loads 1000 objects, and 100 of them are not in the cache. That's only 100k
> of data, but you've spent 500ms waiting on the network.

It would be cool if you could somehow give ZEO a hint to prefetch a certain set of objects along with their subobjects, and then return everything in one batch. This way you avoid all the round trips when you discover you want to retrieve a subobject.

-Matthias
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 06, 2011 at 05:19:25PM +0200, Matthias wrote:
> It would be cool if you could somehow give ZEO a hint to prefetch a
> certain set of objects along with their subobjects, and then return
> everything in one batch. This way you avoid all the round trips when you
> discover you want to retrieve a subobject.

I've seen this proposed several times over the years, but AFAIK nobody has implemented it yet. For example: http://www.mail-archive.com/zodb-dev@zope.org/msg04107.html

--
Paul Winkler
http://www.slinkp.com
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
> It would be cool if you could somehow give ZEO a hint to prefetch a
> certain set of objects along with their subobjects, and then return
> everything in one batch. This way you avoid all the round trips when you
> discover you want to retrieve a subobject.

+1

But I guess that could be tricky, as it's common to have references to parent objects, etc., which would have to be ignored. Batch fetching would be easier, though :)

Cheers,

Pedro
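The client-side half of such a batching API could be as simple as collecting the distinct oids to request in one round trip, which also handles the parent-reference concern: a shared back-reference is requested at most once. A hypothetical sketch (batch_oids and FakePersistent are made up; _p_oid is the standard persistent-object id attribute):

```python
def batch_oids(objects):
    # Deduplicate so shared back-references (e.g. to a common parent)
    # appear at most once in the batched load request.
    seen, batch = set(), []
    for obj in objects:
        oid = getattr(obj, '_p_oid', None)
        if oid is not None and oid not in seen:
            seen.add(oid)
            batch.append(oid)
    return batch

class FakePersistent:
    """Stand-in for a persistent object; real ones carry a _p_oid."""
    def __init__(self, oid):
        self._p_oid = oid

parent = FakePersistent(b'\x00\x01')
children = [FakePersistent(b'\x00\x02'), FakePersistent(b'\x00\x03')]
# Each child also references the parent; the duplicates are dropped:
oids = batch_oids([parent, children[0], parent, children[1], parent])
print(oids)  # the three distinct oids, in first-seen order
```

A real implementation would then hand this oid list to the server in a single request, instead of one loadEx-style round trip per object.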
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On 05/06/2011 06:22 AM, Pedro Ferreira wrote:
> But isn't RelStorage supposed to be slower than FileStorage/ZEO?

No, every measurement I've tried suggests RelStorage (with PostgreSQL or MySQL) is faster than ZEO on the same hardware. ZEO has certainly gotten faster lately, but RelStorage still seems to have the advantage, AFAICT. OTOH, the speed difference is not dramatic; for many apps it's not even noticeable.

> > But remember that throwing more caches at the problem isn't a solution.
> > It's likely the way you store or query the data from the database that's
> > not optimal.
>
> I agree, many things could be improved regarding the data structures we
> use. However, it is also true that we have a large number of objects that
> are rarely changed, and that there is no need to fetch from the DB if we
> can keep them in memory.

It sounds like you primarily need a bigger and faster cache. If you want to make minimal changes to your setup, try increasing the size of your ZEO cache and store the ZEO cache on either a RAM disk (try mount -t tmpfs none /some/path) or a solid state disk. Remember that seek time is 5-10 ms with spinning drives, so putting a ZEO cache on a spinning drive can actually kill performance.

Shane
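A corresponding client-side configuration might look roughly like this (a sketch only: the hostname, size and path are invented, and the option names are the zeoclient ones as I recall them, so check them against the schema of the ZEO version you actually run):

```
<zeoclient>
  # where the storage server lives (invented hostname/port)
  server zeoserver.example.org:8100

  # a larger client cache than the small default
  cache-size 500MB

  # enable a persistent cache file, and keep it on tmpfs or an SSD
  client main
  var /mnt/zeocache
</zeoclient>
```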
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 6, 2011 at 10:14 PM, Shane Hathaway sh...@hathawaymix.org wrote:
> From my experience, most people who want ZODB to be faster want Zope
> catalogs in particular to be faster. I don't think prefetching can make
> catalogs much faster, though.

I've spent a lot of time lately on making ZCatalog faster. The main trick there is to store data in smarter ways, load fewer objects in the first place, and minimize data sets as early as possible, so the cost of intersection() and union() gets lower. There's a lot more you can do about optimizing ZCatalog, but prefetching would indeed not help much. The only cases where you could do prefetching are the ones you don't want to do anyway, like loading an entire BTree or TreeSet because you need to do a len(tree) or actually iterate over the entire thing.

All that said, once you hit large datasets, it gets problematic to do catalog operations on each Zope client. At some point a centralized query approach, on the server side or via a web API, wins in terms of overall resource efficiency.

Hanno
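The "minimize data sets as early as possible" point can be illustrated with plain Python sets (ZCatalog actually uses BTree set operations, but the principle is the same: intersecting the smallest result sets first keeps every intermediate result small):

```python
def intersect_smallest_first(result_sets):
    # Order by size so each intersection works against the smallest
    # possible intermediate result, making later steps cheaper.
    ordered = sorted(result_sets, key=len)
    result = ordered[0]
    for s in ordered[1:]:
        result &= s
        if not result:
            break  # already empty: no point touching the big sets
    return result

hits = intersect_smallest_first(
    [set(range(1000)), {2, 3, 5}, set(range(0, 1000, 2))])
print(sorted(hits))  # [2]
```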
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On Fri, May 6, 2011 at 4:21 PM, Shane Hathaway sh...@hathawaymix.org wrote:
> On 05/06/2011 02:14 PM, Jim Fulton wrote:
> > > It sounds like you primarily need a bigger and faster cache. If you
> > > want to make minimal changes to your setup, try increasing the size of
> > > your ZEO cache and store the ZEO cache on either a RAM disk (try mount
> > > -t tmpfs none /some/path) or a solid state disk. Remember that seek
> > > time is 5-10 ms with spinning drives, so putting a ZEO cache on a
> > > spinning drive can actually kill performance.
> >
> > If this is on Linux and you have enough RAM, the data should be in the
> > disk cache anyway, so I don't see any benefit to a RAM disk.
>
> If there is memory pressure then Linux will evict some of the cache from
> RAM, causing the ZEO cache to be much slower than the ZEO server. I've
> seen that happen often.
>
> > If there is memory pressure and you take away RAM for a RAM disk, then
> > you're going to start swapping, which will give you other problems.
>
> mount -t tmpfs is an easy solution that has been widely available for a
> long time now. You can even allow non-root users to do it by changing
> /etc/fstab.

I tried running ZODB tests off of a RAM disk created that way and got lots of strange failures. :(

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On 05/06/2011 02:38 PM, Jim Fulton wrote:
> If there is memory pressure and you take away RAM for a RAM disk, then
> you're going to start swapping, which will give you other problems.

In my experience, Linux moves pieces of the ZEO cache out of RAM long before it starts swapping much.

> I tried running ZODB tests off of a RAM disk created that way and got lots
> of strange failures. :(

Hmm, ok. If that's still the case, then an SSD is a better option.

Shane
Re: [ZODB-Dev] Speeding up ZODB (was redis cache for RelStorage)
On 05/06/2011 02:14 PM, Shane Hathaway wrote:
> However, there is a different class of problems that prefetching could
> help solve. Let's say you have pages with a lot of little pieces on them,
> such as a comment page with a profile image for every comment. It would be
> useful to tell ZODB to load all of the objects in a list before accessing
> them. For example:
>
>     app._p_jar.load_all(comments)
>     app._p_jar.load_all([comment.profile for comment in comments])

FWIW, a better API name would be activate_all. Code would look like app._p_jar.activate_all(my_objects).

Shane
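A purely client-side stand-in for the proposed API could look like this (hypothetical sketch; _p_activate is the standard persistent-object method for loading a ghost's state, but each call here still costs its own round trip, which is exactly what a server-side activate_all would collapse into one batched request):

```python
def activate_all(objects):
    # Load each ghost's state now, rather than lazily on first
    # attribute access inside the page template.
    for obj in objects:
        obj._p_activate()

class FakeGhost:
    """Stand-in recording activation; real objects come from a ZODB connection."""
    def __init__(self):
        self.loaded = False
    def _p_activate(self):
        self.loaded = True

comments = [FakeGhost() for _ in range(3)]
activate_all(comments)
print(all(c.loaded for c in comments))  # True
```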