On 4 May 2011 10:53, Hanno Schlichting <ha...@hannosch.eu> wrote:
> Hi.
>
> I tried to analyze the overhead of changing content in Plone a bit. It
> turns out we write back a lot of persistent objects to the database,
> even though the actual values of these objects haven't changed.
>
> Digging deeper I tried to understand what happens here:
>
> 1. persistent.__setattr__ will always set _p_changed to True and thus
> cause the object to be written back
> 2. Some BTree buckets define the "VALUE_SAME" macro. If the macro is
> available and the new value is the same as the old, the change is
> ignored
> 3. The VALUE_SAME macro is only defined for the int, long and float
> value variants but not the object based ones
> 4. All code in Products.ZCatalog does explicit comparisons of the old
> and new value and ignores non-value-changes. I haven't seen any other
> code doing this.
>
> I'm assuming doing a general check for "old == new" is not safe, as it
> might not be implemented correctly for all objects and doing the
> comparison might be expensive.
>
> But I'm still curious if we could do something about this. Some ideas:
>
> 1. Encourage everyone to do the old == new check in all application
> code before setting attributes on persistent objects.
>
> Pros: This works today, you know what type of values you are dealing
> with and can be certain when to apply this, you might be able to avoid
> some computation if you store multiple values based on the same input
> data
> Cons: It clutters all code
>
> 2. Create new persistent base classes which do the checking in their
> __setattr__ methods
>
> Pros: A lot less cluttering in the application code
> Cons: All applications would need to use the new base classes.
> Developers might not understand the difference between the variants
> and use the "checking" versions, even though they store data which
> isn't cheap to compare
>
> 2.a. Create new base classes and do type checking for built-in types
>
> Pros: Safer to use than always doing value comparisons
> Cons: Still separate base classes and overhead of doing type checks
>
> 3. Compare object state at the level of the pickled binary data
>
> This would need to work at the level of the ZODB connection. When
> doing savepoints or commits, the registered objects flagged as
> _p_changed would be checked before being added to the modified list.
> In order to do this, we need to get the old value of the object,
> either by loading it again from the database or by keeping a cache of
> the non-modified state of all objects. The latter could be done in
> persistent.__setattr__, where we add the pristine state of an object
> into a separate cache before doing any changes to it. This probably
> should be a cache with an upper limit, so we avoid running out of
> memory for connections that change a lot of objects. The cache would
> only need to hold the binary data and not unpickle it.
>
> Pros: On the level of the binary data, the comparison is rather cheap
> and safe to do
> Cons: We either add more database reads or complex change tracking,
> the change tracking would require more memory for keeping a copy of
> the pristine object. Interactions with ghosted objects and the new
> cache could be fragile.
>
> 4. Compare the binary data on the server side
>
> Pros: We can get to the old state rather quickly and only need to deal
> with binary string data
> Cons: We make all write operations slower, by adding additional read
> overhead. Especially those which really do change data. This won't
> work on RelStorage. We only save disk space and cache invalidations,
> but still do the bulk of the work and send data over the network.
>
>
> I probably missed some approaches here. None of the approaches feels
> like a good solution to me. Doing it server side (4) is a bad idea in
> my book. Option 3 seems to be the most transparent and safe version,
> but is also the most complicated to write with all interactions to
> other caches. It's also not clear what additional responsibilities
> this would introduce for subclasses of persistent which override
> various hooks.
>
> Maybe option one is the easiest here, but it would need some
> documentation about this being a best practice. Until now I didn't
> realize the implications of setting attributes to unchanged values.
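
Options 1 and 2.a above could look roughly like the sketch below. This
is only an illustration under stated assumptions: FakePersistent is a
hypothetical stand-in that mimics how persistent.Persistent flags
_p_changed on attribute writes (so the example runs without ZODB), and
set_if_changed and CheckingPersistent are made-up names, not part of
ZODB or persistent.

```python
_MISSING = object()  # sentinel for "attribute not set yet"


class FakePersistent:
    """Hypothetical stand-in for persistent.Persistent (an assumption).

    Like the real class, any ordinary attribute write marks the object
    as changed; _p_* and _v_* attributes do not.
    """

    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith(("_p_", "_v_")):
            object.__setattr__(self, "_p_changed", True)


def set_if_changed(obj, name, value):
    """Option 1: do the old == new check in application code.

    Returns True if the attribute was written, False if it was skipped.
    """
    old = getattr(obj, name, _MISSING)
    if old is not _MISSING and old == value:
        return False
    setattr(obj, name, value)
    return True


class CheckingPersistent(FakePersistent):
    """Option 2.a: base class that skips no-op writes for built-in types.

    Only values whose exact type is known to be cheap and safe to
    compare are checked; everything else is always written.
    """

    _cheap_types = (int, float, str, bytes, bool, type(None))

    def __setattr__(self, name, value):
        if type(value) in self._cheap_types:
            old = self.__dict__.get(name, _MISSING)
            # Require the same exact type, so 1 -> 1.0 still counts as
            # a change (the pickled state would differ).
            if type(old) is type(value) and old == value:
                return
        super().__setattr__(name, value)
```

With the helper, set_if_changed(doc, "title", "a") leaves _p_changed
alone when doc.title is already "a". The checking base class gives the
same effect for plain assignments, at the cost of a type check on every
write.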

Persistent objects are also used as a cache, and in that case code
relies on an object being invalidated to ensure its _v_ attributes are
cleared. Comparing at the pickle level would break these caches.

I suspect that this is only really a problem for the catalogue. Content
objects will always change on the pickle level when they are
invalidated, as they will have their modification date updated. I
imagine you also see Archetypes doing bad things, as it tends to store
one persistent object per field, but that is just bad practice.

It would be interesting to see the performance impact of adding
newvalue != oldvalue checks on the catalogue data structures. This
would also prevent the unindex logic being called unnecessarily.

I don't think that the Dobbin requirement to explicitly check out and
check in objects would be very helpful either: the same
"if newvalue != oldvalue" logic would be required to avoid unnecessary
changes.

Laurence
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev