On Wed, May 4, 2011 at 5:53 AM, Hanno Schlichting <ha...@hannosch.eu> wrote:
> Hi.
>
> I tried to analyze the overhead of changing content in Plone a bit. It
> turns out we write back a lot of persistent objects to the database,
> even though the actual values of these objects haven't changed.
>
> Digging deeper I tried to understand what happens here:
>
> 1. persistent.__setattr__ will always set _p_changed to True and thus
> cause the object to be written back
> 2. Some BTree buckets define the "VALUE_SAME" macro. If the macro is
> available and the new value is the same as the old, the change is
> ignored
> 3. The VALUE_SAME macro is only defined for the int, long and float
> value variants but not the object based ones
> 4. All code in Products.ZCatalog does explicit comparisons of the old
> and new value and ignores non-value-changes. I haven't seen any other
> code doing this.
>
> I'm assuming doing a general check for "old == new" is not safe, as it
> might not be implemented correctly for all objects and doing the
> comparison might be expensive.
>
> But I'm still curious if we could do something about this. Some ideas:
>
> 1. Encourage everyone to do the old == new check in all application
> code before setting attributes on persistent objects.
>
> Pros: This works today, you know what type of values you are dealing
> with and can be certain when to apply this, you might be able to avoid
> some computation if you store multiple values based on the same input
> data
> Cons: It clutters all code
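
For illustration only, the "old == new" check from idea 1 might look
roughly like the sketch below. FakePersistent is a hypothetical stand-in
that merely mimics the behaviour described above (every attribute write
sets _p_changed); it is not the real persistent.Persistent class, and
set_if_changed is an invented helper name, not an existing API.

```python
class FakePersistent:
    """Hypothetical stand-in for a persistent object: any attribute write
    marks the object as changed, as described for persistent.__setattr__."""

    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Simplification: bookkeeping attributes (_p_*) do not flag a change.
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


_MISSING = object()  # sentinel for "attribute not set yet"


def set_if_changed(obj, name, value):
    """Assign only when the value differs; return True if a write happened."""
    old = getattr(obj, name, _MISSING)
    if old is not _MISSING and old == value:
        return False  # same value: skip the write, object stays unchanged
    setattr(obj, name, value)
    return True
```

This only makes sense where the caller knows that == is cheap and
correctly implemented for the values involved, which is the caveat
raised above.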

-1 as suggested, but it might be worth asking if there should be changes
to infrastructure that encourages lots of spurious attribute updates.

> 2. Create new persistent base classes which do the checking in their
> __setattr__ methods
>
> Pros: A lot less cluttering in the application code
> Cons: All applications would need to use the new base classes.
> Developers might not understand the difference between the variants
> and use the "checking" versions, even though they store data which
> isn't cheap to compare

-1. This feels like adding a solution to some other solution. :)

>
> 2.a. Create new base classes and do type checking for built-in types
>
> Pros: Safer to use than always doing value comparisons
> Cons: Still separate base classes and overhead of doing type checks

ditto

>
> 3. Compare object state at the level of the pickled binary data
>
> This would need to work at the level of the ZODB connection. When
> doing savepoints or commits, the registered objects flagged as
> _p_changed would be checked before being added to the modified list.
> In order to do this, we need to get the old value of the object,
> either by loading it again from the database or by keeping a cache of
> the non-modified state of all objects. The latter could be done in
> persistent.__setattr__, where we add the pristine state of an object
> into a separate cache before doing any changes to it. This probably
> should be a cache with an upper limit, so we avoid running out of
> memory for connections that change a lot of objects. The cache would
> only need to hold the binary data and not unpickle it.
>
> Pros: On the level of the binary data, the comparison is rather cheap
> and safe to do
> Cons: We either add more database reads or complex change tracking,
> the change tracking would require more memory for keeping a copy of
> the pristine object. Interactions with ghosted objects and the new
> cache could be fragile.

There are also possible subtle consistency issues.
If an application assigns the same value to a variable and some other
transaction assigns a different value, should the two conflict? Arguably so.

> 4. Compare the binary data on the server side
>
> Pros: We can get to the old state rather quickly and only need to deal
> with binary string data
> Cons: We make all write operations slower, by adding additional read
> overhead. Especially those which really do change data. This won't
> work on RelStorage. We only save disk space and cache invalidations,
> but still do the bulk of the work and send data over the network.
>
>
> I probably missed some approaches here. None of the approaches feels
> like a good solution to me. Doing it server side (4) is a bad idea in
> my book. Option 3 seems to be the most transparent and safe version,
> but is also the most complicated to write with all interactions to
> other caches. It's also not clear what additional responsibilities
> this would introduce for subclasses of persistent which override
> various hooks.
>
> Maybe option one is the easiest here, but it would need some
> documentation about this being a best practice. Until now I didn't
> realize the implications of setting attributes to unchanged values.

I think the best approach is to revisit the application infrastructure
that's causing all these spurious updates.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
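
As a rough illustration of option 3 from the quoted message (a bounded
cache of pristine pickled state, compared byte-for-byte before an object
is treated as modified), here is a minimal standalone sketch. The class
PristineStateCache and its methods are hypothetical names invented for
this example; nothing here is part of the ZODB connection API, and the
state is assumed to be a plain dict of picklable values.

```python
import pickle
from collections import OrderedDict


class PristineStateCache:
    """Hypothetical bounded cache of pickled pre-modification states."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._states = OrderedDict()  # oid -> pickled pristine state

    def remember(self, oid, state):
        """Record the pristine state once, before the first change."""
        if oid in self._states:
            return
        self._states[oid] = pickle.dumps(state, protocol=2)
        if len(self._states) > self.max_entries:
            self._states.popitem(last=False)  # evict the oldest entry

    def really_changed(self, oid, state):
        """Compare the current pickle against the pristine bytes."""
        pristine = self._states.get(oid)
        if pristine is None:
            return True  # no pristine copy (never seen or evicted): write
        return pickle.dumps(state, protocol=2) != pristine
```

Byte equality is a conservative test: two equal states can pickle to
different bytes, in which case the object would simply be written back
as it is today. The sketch does not address the consistency question
raised in the reply, namely whether a same-value assignment should
still conflict with another transaction's write.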