Hi.

I tried to analyze the overhead of changing content in Plone a bit. It turns out we write a lot of persistent objects back to the database, even though the actual values of these objects haven't changed.

Digging deeper, I tried to understand what happens here:

1. persistent.__setattr__ always sets _p_changed to True and thus causes the object to be written back.

2. Some BTree buckets define the "VALUE_SAME" macro. If the macro is available and the new value is the same as the old one, the change is ignored.

3. The VALUE_SAME macro is only defined for the int, long and float value variants, but not for the object-based ones.

4. All code in Products.ZCatalog explicitly compares the old and new value and ignores assignments that don't change the value. I haven't seen any other code doing this.

I'm assuming that a general check for "old == new" is not safe, as equality might not be implemented correctly for all objects and the comparison might be expensive.

But I'm still curious if we could do something about this. Some ideas:

1. Encourage everyone to do the old == new check in all application code before setting attributes on persistent objects.

   Pros: This works today. You know what type of values you are dealing with and can be certain when to apply the check. You might be able to avoid some computation if you store multiple values based on the same input data.

   Cons: It clutters all code.

2. Create new persistent base classes which do the checking in their __setattr__ methods.

   Pros: A lot less clutter in the application code.

   Cons: All applications would need to use the new base classes. Developers might not understand the difference between the variants and use the "checking" versions, even though they store data which isn't cheap to compare.

2.a. Create new base classes and do type checking for built-in types.

   Pros: Safer to use than always doing value comparisons.

   Cons: Still separate base classes, plus the overhead of doing type checks.

3. Compare object state at the level of the pickled binary data.

   This would need to work at the level of the ZODB connection. When doing savepoints or commits, the registered objects flagged as _p_changed would be checked before being added to the modified list.

   In order to do this, we need to get the old state of the object, either by loading it again from the database or by keeping a cache of the unmodified state of all objects. The latter could be done in persistent.__setattr__, where we add the pristine state of an object to a separate cache before making any changes to it. This should probably be a cache with an upper limit, so we avoid running out of memory for connections that change a lot of objects. The cache would only need to hold the binary data and would not need to unpickle it.

   Pros: At the level of the binary data, the comparison is rather cheap and safe to do.

   Cons: We either add more database reads or complex change tracking. The change tracking would require more memory for keeping a copy of the pristine object. Interactions between ghosted objects and the new cache could be fragile.

4. Compare the binary data on the server side.

   Pros: We can get to the old state rather quickly and only need to deal with binary string data.

   Cons: We make all write operations slower by adding read overhead, especially those which really do change data. This won't work on RelStorage. We only save disk space and cache invalidations, but still do the bulk of the work and send data over the network.

I probably missed some approaches here. None of the approaches feels like a good solution to me. Doing it server side (4) is a bad idea in my book. Option 3 seems to be the most transparent and safe one, but it is also the most complicated to write, given all the interactions with other caches. It's also not clear what additional responsibilities this would introduce for subclasses of persistent which override various hooks.

Maybe option 1 is the easiest here, but it would need some documentation about this being a best practice. Until now I didn't realize the implications of setting attributes to unchanged values.
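
As a rough illustration of options 1 and 2.a, here is a minimal Python sketch. It does not use ZODB: TrackedBase is a hypothetical stand-in that only mimics the "every assignment sets _p_changed" behaviour described above, and set_if_changed, CheckingBase and CHEAP_TYPES are made-up names, not existing APIs. A real version would presumably subclass the persistent base class instead and would have to deal with ghosts and the other _p_ attributes.

```python
class TrackedBase(object):
    """Stand-in for a persistent base class (assumption, not the real one).

    Every attribute assignment flags the object as changed, even if the
    new value equals the old one.
    """
    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith('_p_'):
            object.__setattr__(self, '_p_changed', True)


_MISSING = object()  # sentinel for "attribute not set yet"


def set_if_changed(obj, name, value):
    """Option 1: compare in application code before assigning.

    Returns True if the attribute was written, False if it was skipped.
    Only sensible where the caller knows that == is cheap and correct.
    """
    old = getattr(obj, name, _MISSING)
    if old is not _MISSING and old == value:
        return False
    setattr(obj, name, value)
    return True


# Types assumed to be cheap and safe to compare (a choice made for this sketch).
CHEAP_TYPES = (int, float, str, bytes, bool, type(None))


class CheckingBase(TrackedBase):
    """Option 2.a: skip no-op assignments, but only for built-in value types."""

    def __setattr__(self, name, value):
        if type(value) in CHEAP_TYPES:
            old = self.__dict__.get(name, _MISSING)
            # Require the same type, so that 1, 1.0 and True stay distinct.
            if type(old) is type(value) and old == value:
                return  # unchanged value: leave _p_changed alone
        super().__setattr__(name, value)


doc = CheckingBase()
doc.title = 'Front page'   # first assignment, object is flagged as changed
doc._p_changed = False     # pretend the transaction was committed
doc.title = 'Front page'   # same value again, assignment is skipped
print(doc._p_changed)
```

The type check means that a change of type is still written, and values of any other type, such as lists or custom objects, always take the normal path and mark the object as changed.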

Hanno

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev